PDPtool3.0 / Pattern Parser Documentation

PDPtool Example File Format

An example file is a text file (extension .pat) that defines the stimuli for a network. Which stimuli are appropriate depends on the type of network used, but every network type described in the handbook uses stimuli that can be specified in the example file format below. Before turning to the grammar used to parse these files into spatially and temporally localized patterns of activation, let's go over the basic concepts formalized in the parser.

Time Granularities

Example files can specify information to be used at different granularities. The pattern level specifies the activations to apply to a pool of units. The event level specifies a set of patterns thought of as occurring together as a single event. The episode level encapsulates one or more events that make up a semantically distinct example. In other words, if a sequence of events wouldn’t make sense when presented out of order, then the events should be defined as a single episode.

For many neural network models, each episode can be thought of as consisting of a single event. For example, backprop networks are generally trained with episodes each consisting of a single event, itself consisting of an input pattern and a target pattern.
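The three granularities nest naturally, which the following Python sketch illustrates. It is not PDPtool code: the class names, the pool names "input" and "output", the episode name, and the onset value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pattern:
    """Activations applied to one pool of units."""
    pool_name: str
    activations: List[float]
    clamp_type: str = "H"   # H (hard), S (soft), or T (target)
    duration: int = 1


@dataclass
class Event:
    """A set of patterns thought of as occurring together."""
    name: str
    onset: int
    patterns: List[Pattern] = field(default_factory=list)


@dataclass
class Episode:
    """One or more events forming a semantically distinct example."""
    name: str
    events: List[Event] = field(default_factory=list)
    weight: float = 1.0


# A backprop-style episode: a single event holding an input pattern
# and a target pattern (all names and values here are illustrative).
episode = Episode(
    name="example_01",
    events=[
        Event(
            name="1",
            onset=1,
            patterns=[
                Pattern("input", [0.0, 1.0], clamp_type="H"),
                Pattern("output", [1.0], clamp_type="T"),
            ],
        )
    ],
)
```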

Text Format

The file format is shown below. Whitespace matters only where omitting it would create ambiguity or cause two numbers or strings to run together. Line breaks, however, are required; if an additional line break is wanted for readability, use an ellipsis (i.e. “...”) as in standard MATLAB. In the format specification, elements in angle brackets denote values of specific types: <I > denotes an integer, <R > denotes a real number, and <S > denotes a string. A single string cannot contain any whitespace. All fields are optional except <R activations>; their usage and default behavior are explained after the format specification. Text starting with the MATLAB comment indicator % is explanatory only and does not appear in an actual file. The <R activations> field, the parentheses, and end must be included in the syntax.

%for each episode:
<S episode-name> <R weight>
       % for each time step where activation clamping is initialized: 
      [<S event-name> <I time-onset>] (<S clamp-type> <I duration> <S pool-name>) <R activations> 
      % if more pools are clamped at the given time onset, then separate them with a pipe (i.e. “|”)
end
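As a rough illustration of how the pool portion of an event line could be read, the Python sketch below splits a line on the pipe character and extracts the parenthesized fields and activations from each segment. It is not the PDPtool parser: the function name is hypothetical, it assumes all three parenthesized fields are written out, and it does not handle the optional event-name and time-onset prefix, ellipsis continuations, or lexicon symbols.

```python
import re
from typing import List, Tuple

# "(clamp-type duration pool-name) activations", with flexible whitespace.
SEGMENT = re.compile(r"\(\s*(\S+)\s+(\d+)\s+([^\s)]+)\s*\)\s*(.*)")

PoolSegment = Tuple[str, int, str, List[float]]


def parse_pool_segments(text: str) -> List[PoolSegment]:
    """Parse pipe-separated pool segments from one event line."""
    segments = []
    for part in text.split("|"):
        match = SEGMENT.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"malformed pool segment: {part!r}")
        clamp, duration, pool, values = match.groups()
        if clamp not in {"H", "S", "T"}:
            raise ValueError(f"unknown clamp type: {clamp!r}")
        activations = [float(v) for v in values.split()]
        if not activations:
            # The activations field is the one required field.
            raise ValueError(f"missing activations in: {part!r}")
        segments.append((clamp, int(duration), pool, activations))
    return segments


# Hypothetical pool names "input" and "output".
parsed = parse_pool_segments("(H 1 input) 1 0 | (T 1 output) 1")
```

Here `parsed` holds one tuple per clamped pool, in the order the pools appear on the line.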

Field Usage and Default Behavior

As discussed above, nearly all of the fields shown here are optional, but when a field can be omitted, and what effect omitting it has on the stimuli specified, requires additional explanation. Below, each field is listed with its usage requirements, followed by a pipe (i.e. “|”) and the default behavior if the field is omitted.

Field name | Special requirements | Default behavior
<S episode-name> | an end line is required | all episodes are one event long with numeric names
<R weight> | requires declaring the episode name | defaults to a value of one
<S event-name> | (none) | defaults to a numeric name
<I time-onset> | assumed to be unique and ordered | defaults to the previous event’s onset + 1
<S clamp-type> | value must be H (hard clamp), S (soft clamp), or T (target clamp) | first pool in event defaults to H, the rest to T
<I duration> | (none) | defaults to a value of one
<S pool-name> | must be identical to the pool’s name in your network | defaults to the first pool of the correct type
<R activations> | values must be separated by whitespace | required field
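The default rules for events and pools can be sketched in Python as follows. This is an illustration of the rules listed above, not PDPtool's implementation: the dictionary layout and function name are hypothetical, the onset of a first event with no explicit onset is assumed to be 1, and default event names are assumed to be 1-based indices. The pool-name default depends on the network, so it is left unresolved here.

```python
from typing import Any, Dict, List


def fill_event_defaults(events: List[Dict[str, Any]],
                        first_onset: int = 1) -> List[Dict[str, Any]]:
    """Fill omitted event and pool fields with their default values."""
    filled = []
    previous_onset = None
    for index, event in enumerate(events, start=1):
        onset = event.get("onset")
        if onset is None:
            # Default: previous event's onset + 1 (first_onset is an assumption).
            onset = first_onset if previous_onset is None else previous_onset + 1
        pools = []
        for position, pool in enumerate(event["pools"]):
            clamp = pool.get("clamp")
            if clamp is None:
                # Default: first pool in the event is H, the rest are T.
                clamp = "H" if position == 0 else "T"
            if clamp not in {"H", "S", "T"}:
                raise ValueError(f"unknown clamp type: {clamp!r}")
            duration = pool.get("duration")
            pools.append({
                "clamp": clamp,
                "duration": 1 if duration is None else duration,
                "pool": pool.get("pool"),  # network-dependent default, not resolved
                "activations": pool["activations"],
            })
        name = event.get("name")
        filled.append({
            "name": str(index) if name is None else name,  # numeric name (assumed 1-based)
            "onset": onset,
            "pools": pools,
        })
        previous_onset = onset
    return filled


events = [
    {"pools": [{"activations": [1, 0]}, {"activations": [1]}]},
    {"pools": [{"clamp": "S", "duration": 2, "activations": [0, 1]}]},
]
filled = fill_event_defaults(events)
```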

Lexicon Files and Special Cases

Some projects might involve patterns of activation best represented by symbols, such as in phonology where words stand in for their sonic properties. To accommodate such projects, a lexicon text file (.lex) can be created that specifies a list of symbols and the patterns of activation they stand for. Each symbol has its own line with whitespace separating the symbol name and activation definition.
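A lexicon of this kind can be read into a simple mapping from symbol to pattern. The Python sketch below assumes the activation definition is a whitespace-separated list of real numbers; the symbols "cat" and "dog", their patterns, and both function names are hypothetical. The second function shows one way a symbol appearing in an activations field could then be expanded.

```python
from typing import Dict, List


def parse_lexicon(text: str) -> Dict[str, List[float]]:
    """Map each symbol to its activation pattern, one symbol per line."""
    lexicon = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        symbol, values = fields[0], fields[1:]
        lexicon[symbol] = [float(v) for v in values]
    return lexicon


def expand_activations(field_text: str,
                       lexicon: Dict[str, List[float]]) -> List[float]:
    """Return the pattern for a lexicon symbol, or parse literal numbers."""
    tokens = field_text.split()
    if len(tokens) == 1 and tokens[0] in lexicon:
        return list(lexicon[tokens[0]])
    return [float(t) for t in tokens]


lexicon = parse_lexicon("cat 1 0 0\n\ndog 0 1 0\n")
pattern = expand_activations("dog", lexicon)
```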

When using a lexicon file, you may write the symbol name in your example file wherever an <R activations> field occurs, and the file environment will automatically replace the symbol name with the appropriate pattern of activation.

Some projects, such as the SRN project in the PDPtool handbook, require more than simple symbol substitution. In general, PDPtool doesn’t accommodate such cases and assumes that the user preprocesses the data into the appropriate format. However, for the SRN project there is a special mechanism in place that reads in the list of symbols and translates it into input-output pairs.
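This page does not describe how the SRN mechanism forms its pairs. As a purely illustrative preprocessing sketch, the Python snippet below assumes the common prediction-task convention in which each symbol serves as the input and its successor as the target; the function name and the symbol sequence are hypothetical, and PDPtool's actual behavior may differ.

```python
from typing import List, Sequence, Tuple


def successor_pairs(symbols: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each symbol with the one that follows it (assumed convention)."""
    return list(zip(symbols, symbols[1:]))


pairs = successor_pairs(["B", "T", "X", "E"])
```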


Author: Rachel Lee, Steven Hansen
