# Ginnungagap-Code

Files without an explicit license or source notice (e.g. data and config files, short helper scripts) are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/. Files with an explicit source notice are copyright their respective notices and are subject to the licenses the copyright holder places upon them. The RAM disk scripts are licensed under the GPL due to liability concerns.

## Visualization

An attempt to aid in visualizing the different metrics in actor space is provided by visualize.py using the metaphor of a rolling ball on a rubber sheet. Output is to EPS for compatibility with Springer's requirements for journal submissions. Requires matplotlib.

## Implementation

The implementation can be found in alday_ms_ibs_neuroinformatics.py. Extensive usage information can be invoked with the --help option. This particular variant of the implementation includes a special-case version of Stage 1 of the eADM designed to handle the experimental stimuli of the "Ginnungagap" (previously "Bonde") experiment. Supporting lexicographic and stimuli data are found in the .txt files.

For more information, examine the source code or read the paper. :-)

## Supplied Data

### EEG

The supplied data in ggitem_N400P600.tab was retrieved from the Ginnungagap-EEG repository at changeset 33fdf1b16d2c.

The supplied data set has four time windows: two each for N400 (300-500ms; 400-600ms) and Late Positivity/P600 (600-800ms; 700-900ms). These were chosen as the standard time windows for this effect and do not reflect any closer analysis of the experiment in question.

### Behavioural Data

The supplied data in behav.tab and rt_by_subj.tab was retrieved from the Ginnungagap_EEG repository at changeset 6e31e7fb9486.

behav.tab includes the entire tabulated behavioural data (reaction time and task accuracy). Single-trial RTs from this data do not work as a fixed factor in the current models -- the strong correlation/interaction with subject and item groupings leads to a matrix which is not positive definite. Although this data contains RTs from incorrect trials, ggmerge.py handles this without incident since it only matches trials present in the EEG data, and incorrect trials were already filtered out of the EEG data.
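The matching behaviour described above amounts to an inner join on trial identity. The following is a minimal sketch of that idea in Python, not the code in ggmerge.py; the column names `subj`, `item`, `mean` and `rt` and the sample rows are hypothetical.

```python
# Hypothetical rows; the real column names in behav.tab and the EEG table may differ.
eeg_rows = [
    {"subj": 1, "item": 10, "mean": -1.2},
    {"subj": 1, "item": 11, "mean": 0.4},
]
behav_rows = [
    {"subj": 1, "item": 10, "rt": 431.0},
    {"subj": 1, "item": 11, "rt": 502.5},
    {"subj": 1, "item": 12, "rt": 610.0},  # incorrect trial, absent from the EEG rows
]


def merge_on_trial(eeg, behav):
    """Keep only trials present in the EEG rows (inner join on subj and item)."""
    rt_by_trial = {(row["subj"], row["item"]): row["rt"] for row in behav}
    merged = []
    for row in eeg:
        key = (row["subj"], row["item"])
        if key in rt_by_trial:
            merged.append({**row, "rt": rt_by_trial[key]})
    return merged


merged = merge_on_trial(eeg_rows, behav_rows)
```

Because the join is driven by the EEG rows, the RT of the incorrect trial (item 12) never reaches the merged table.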

rt_by_subj.tab includes the averaged reaction time by subject for each condition (i.e. single-subject averages in EEG terminology).

gganova-behavdata.R is an adaptation of behav_anova.R from the Ginnungagap_EEG repository at changeset 6e31e7fb9486.

### Stage 2 Tests

The folders stage2test_localambiguity and stage2tests provide a collection of test constructions for Stage 2 and the weights used therein. The former contains data sets with a local case ambiguity at the first NP which is made unambiguous by the case of the second NP. The latter contains all possible combinations of unambiguous constructions.

### Sample Constructions

While the Stage 2 Tests are adapted from the learning data sets for another model and include desired state at each step (currently ignored) as well as the final assignment of ±DEP, the file sample_contructions.txt includes just a few simple constructions in their Stage 1 output / Stage 2 input form -- no cross checking or verifying is required. This and the corresponding batch and interactive modes in the implementation are useful for testing how the implementation handles particular constructions in the abstract.

## Statistical Analysis (with R)

*Please note that a number of the libraries used and supplied functions depend on versions of lme4 that still yield mer objects. Newer versions are scheduled to use merMod objects. In the future, it will be necessary to use lme4.0. For more information, see http://lme4.r-forge.r-project.org/.*

The default input files match the supplied data and defaults used in the implementation.

It is recommended that you only run the mixed model scripts on 64-bit systems with more than 4GB of RAM. (On OS X 10.5, this means explicitly invoking R64, as the default R is only 32-bit!) Some calculations will not run on 32-bit systems because of the small address space, and all of the scripts are memory bound for large portions of their run time.

### Without Learning

The R script ggmemsimple.R performs a simple comparison of the best performing models.

The R script ggmemambiguity.R performs the comparisons, splitting along the condition ambiguity.

ggmem*.R all depend on the output from ggmempreproc.sh (wrapper for ggmempreproc.R [itself dependent on ggmempreproc_common.R], ggmerge.py and trimlines.sh) being available. For machines with lots of RAM (> 4GB free), the option --map parallel (strict parallel execution) is recommended; for machines with multiple processors but less RAM, try --map lazy (lazy parallel execution). The strict serial default can be explicitly invoked with --map serial.
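To illustrate the difference between the three --map settings, here is a rough Python sketch of one plausible reading of "serial", "parallel" and "lazy". It is an assumption about the general technique, not the dispatch code in ggmerge.py; the function `apply_map` and its parameters are hypothetical, and a thread pool is used only to keep the example self-contained.

```python
from multiprocessing.pool import ThreadPool


def apply_map(func, items, mode="serial", procs=2):
    """Apply func to every item using the requested (hypothetical) strategy."""
    if mode not in ("serial", "parallel", "lazy"):
        raise ValueError("unknown mode: %r" % (mode,))
    if mode == "serial":
        # Plain loop in the calling thread: no concurrency at all.
        return [func(item) for item in items]
    with ThreadPool(procs) as pool:
        if mode == "parallel":
            # map() blocks until every result is available and returns a list.
            return pool.map(func, items)
        # imap() returns an iterator that yields results in order as they arrive.
        return list(pool.imap(func, items))


squares = apply_map(lambda x: x * x, range(5), mode="lazy")
```

All three modes return the same results in the same order; they differ only in how much work is in flight at once, which is why the choice is mainly a question of available RAM.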

#### Notes on Preprocessing

On account of some inefficiencies in R and in the preprocessing code, preprocessing can take several minutes even on a fast computer. The gains in efficiency added in recent revisions are limited by the increase in data for the parametric evaluation of time windows. To make it easier to experiment with the preprocessed data at a later point and eliminate the need to repeat this costly step, it was separated out.

Occasionally, the following error is produced by ggmerge.py:

    Exception RuntimeError: RuntimeError('cannot join current thread',) in <Finalize object, dead> ignored


This does not appear to have any effect -- I believe it comes from the program as a whole terminating following the call to close the thread pool.

The preprocessing code depends on ggmemwin.cfg to set the N400 and P600 time windows differentially for pronouns and nouns, which is basically a small chunk of assignment code executed directly by ggmempreproc.R:

    n400.pro="+300..+500"
    p600.pro="+600..+800"
    n400.noun="+400..+600"
    p600.noun="+700..+900"


The = assignment notation is preferred because it is also Python compatible, which will enable better forward compatibility if we move more of the preprocessing to pymerge.py. The values listed here are the values currently in use.

It is important to note that the sign is not optional and that all numeric values must be left zero padded to the same length. Furthermore, because of restrictions with the sampling rate used in the EEG experiment, avrretrieve returns intervals terminating at 48 and not 50.
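The format rules above (mandatory sign, equal-length zero-padded bounds) can be checked mechanically. The following Python sketch is an illustration only; `parse_window` and `parse_config` are hypothetical helpers and not part of the repository, and the padding check is applied within each window specification.

```python
import re

# A window looks like "+300..+500": signed start, two dots, signed end.
WINDOW_RE = re.compile(r'^([+-]\d+)\.\.([+-]\d+)$')
# An assignment looks like: n400.pro="+300..+500"
ASSIGN_RE = re.compile(r'^\s*([\w.]+)\s*=\s*"([^"]*)"\s*$')


def parse_window(spec):
    """Return (start_ms, end_ms) for a specification such as "+300..+500"."""
    match = WINDOW_RE.match(spec)
    if match is None:
        raise ValueError("window needs explicit signs: %r" % (spec,))
    start, end = match.groups()
    if len(start) != len(end):
        raise ValueError("bounds must be zero padded to the same length: %r" % (spec,))
    return int(start), int(end)


def parse_config(text):
    """Map each assigned name to its parsed window, skipping other lines."""
    windows = {}
    for line in text.splitlines():
        match = ASSIGN_RE.match(line)
        if match:
            windows[match.group(1)] = parse_window(match.group(2))
    return windows


config = parse_config('n400.pro="+300..+500"\np600.noun="+700..+900"\n')
```

A specification such as "300..500" (no sign) or "+50..+500" (unequal padding) raises a ValueError instead of being silently accepted.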

Assuming that the implementation output was saved to alday_ms_ibs_neuroinformatics_stimuli.txt.tab (the default), the script ggmempreproc.sh can be used to automate this workflow. Any command line arguments, e.g. --map, are simply passed through to ggmem.py.

#### RAM Diskcache

For Linux and OS X machines with lots of RAM, two helper scripts are provided for using a RAM disk to cache the .tab files.

ramdisk-start.sh creates and clears a RAM disk and moves all .tab files present to it, creating soft links in their original position so that the move is transparent for all scripts. WARNING: This will clear any RAM disk at the expected location.

ramdisk-stop.sh deletes the soft links for all the .tab files in the RAM disk and moves the files themselves back to their original position. Failure to run this script before unmounting your RAM disk will result in data loss. You can restore files from the repository and then regenerate the build products from them, but you will lose whatever changes you made as well as the time necessary to run the preprocessing steps.

Because ramdisk-start.sh only moves existing .tab files and clears any existing RAM disk, you will have to run ramdisk-stop.sh to save any changes to existing .tab files and then ramdisk-start.sh to add any new ones. A helper utility to refresh the disk is planned (or welcomed as a contribution).
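The move-and-link pattern behind the two scripts can be sketched in Python as follows. This is an illustration of the technique under stated assumptions, not a port of ramdisk-start.sh or ramdisk-stop.sh: the function names are hypothetical, and creating, clearing and mounting the RAM disk itself is deliberately left out.

```python
import os
import shutil


def cache_tab_files(work_dir, ram_dir):
    """Move *.tab files into ram_dir and leave symlinks at the old paths."""
    moved = []
    for name in sorted(os.listdir(work_dir)):
        src = os.path.join(work_dir, name)
        # Skip anything that is already a link so a second run is harmless.
        if name.endswith(".tab") and os.path.isfile(src) and not os.path.islink(src):
            dst = os.path.join(ram_dir, name)
            shutil.move(src, dst)
            os.symlink(dst, src)
            moved.append(name)
    return moved


def restore_tab_files(work_dir, ram_dir):
    """Replace each symlink in work_dir with the real file from ram_dir."""
    restored = []
    for name in sorted(os.listdir(ram_dir)):
        if not name.endswith(".tab"):
            continue
        link = os.path.join(work_dir, name)
        if os.path.islink(link):
            os.remove(link)
        shutil.move(os.path.join(ram_dir, name), link)
        restored.append(name)
    return restored
```

The sketch makes the data-loss risk concrete: between the two calls the only real copy of each file lives in `ram_dir`, so unmounting before restoring leaves dangling links.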

#### Figures and Calculations for Publication

A number of helper scripts are provided for preparing figures for publication and exporting them.

makewidetable.sh converts the LaTeX table environment to table*, which is the column-spanning variant in Springer journals. It also calls sigstars.awk.
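The environment conversion is a plain text substitution. A minimal Python sketch of the idea is given here; `widen_tables` is a hypothetical name, and the actual shell script may do this differently.

```python
import re


def widen_tables(latex):
    """Rewrite the table environment to its starred, column-spanning form."""
    # Only exact {table} delimiters match, so an existing {table*} is left alone.
    return re.sub(r'\\(begin|end)\{table\}', r'\\\1{table*}', latex)


source = "\\begin{table}[t]\n...\n\\end{table}"
widened = widen_tables(source)
```

Because already-starred environments do not match the pattern, running the substitution twice gives the same result as running it once.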

sigstars.awk adds significance stars for ANOVA output.
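As an illustration of what adding significance stars involves, here is a small Python sketch. The thresholds (0.05, 0.01, 0.001) are the common convention and an assumption here; sigstars.awk may use different cut-offs or symbols, and `significance_stars` is a hypothetical name.

```python
def significance_stars(p):
    """Map a p-value to a star string (thresholds are an assumed convention)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# p-values taken from the accuracy ANOVA in the sample run in this README
labels = [significance_stars(p) for p in (6.809031e-08, 5.420542e-03, 2.239859e-02, 5.534036e-01)]
```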

makefigures.sh produces the output necessary for publication and moves all the relevant files to the paper directory (assumed to be the sister to the code directory) -- it runs ggmempub.R (mixed models), gganova-memdata.R (classical ANOVA for EEG), gganova-behavdata.R (ANOVA for behavioral measures), makewidetable.sh (pretty up the ANOVA outputs), and visualize.py (see above). Afterwards it deletes the .bak intermediaries.

ggmempub.R does the actual mixed model analysis and generates the corresponding graphics and tables. Several command line options are available to speed up runs during testing and development; these can be accessed via ./ggmempub.R -h. Additionally, during interactive sessions you can set the parameters PRINT_MODELS, PRINT_ANOVA and MAKE_FIGS.

ggmempub_init*.R perform the start up and initialization tasks common to all the analyses for publication.

ggmempub_models*.R perform the calculations for each group of models. Please note that these scripts depend on a sensibly initialized environment. test.R is useful for this when running interactively.

test.R is useful during interactive testing. Start the interpreter, and source("test.R"). All the relevant variables will be initialized to sensible default values and all the helper functions will be loaded. This is especially useful for testing the individual ggmempub_models*.R files.

Here we show a sample run for preparing the figures for publication from scratch:

user@localhost ginnungagap-code $ ./alday_ms_ibs_neuroinformatics.py -e alday_ms_ibs_neuroinformatics_stimuli.txt --apriori
Entering batch mode.
Writing to alday_ms_ibs_neuroinformatics_stimuli.txt.tab ... Done.
user@localhost ginnungagap-code $ ./ggmempreproc.sh --map parallel --procs 3
R CMD BATCH --vanilla ggmempreproc.R
./trimlines.sh ggitem_N400P600.preproc.tab alday_ms_ibs_neuroinformatics_stimuli.txt.tab
./ggmerge.py "$@" ggitem_N400P600.preproc.tab alday_ms_ibs_neuroinformatics_stimuli.txt.tab comparison.data.tab
4 CPUs detected. Using 3 processes.
user@localhost ginnungagap-code $ ./makefigures.sh --procs=3

Attaching package: ‘lme4’

The following object is masked from ‘package:stats’:

AIC, BIC

Attaching package: ‘effects’

The following object is masked from ‘package:datasets’:

Titanic

[1] Window: N400
[1] Models without ambiguity
[1] lmer.dist
[1] lmer.sdiff
[1] lmer.signdist
[1] lmer.syndist
[1] lmer.synsdiff
[1] lmer.synsigndist
[1] Models with ambiguity as a predictor
[1] With interaction
[1] lmer.dist.ambiguity
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
[1] lmer.sdiff.ambiguity
[1] lmer.signdist.ambiguity
[1] lmer.syndist.ambiguity
[1] lmer.synsdiff.ambiguity
[1] lmer.synsigndist.ambiguity

[1] Compared to models without ambiguity

[1] Without interaction
[1] lmer.dist.ambiguity.no_int
[1] lmer.sdiff.ambiguity.no_int
[1] lmer.signdist.ambiguity.no_int
[1] lmer.syndist.ambiguity.no_int
[1] lmer.synsdiff.ambiguity.no_int
[1] lmer.synsigndist.ambiguity.no_int

[1] Compared to models without ambiguity

[1] Ambiguity: With vs. Without interaction

[1] Minimally adequate models
[1] Compared to syntactic models

[1] With interaction and reaction time
[1] lmer.dist.ambiguity.no_int.rt
[1] lmer.sdiff.ambiguity.rt
[1] lmer.signdist.ambiguity.rt

[1] Models divided up by ambiguity

[1] Window: N400: NP1: unambiguous
[1] lmer.dist
[1] lmer.sdiff
[1] lmer.signdist
[1] lmer.syndist
[1] lmer.synsdiff
[1] lmer.synsigndist

[1] Window: N400: NP1: ambiguous
[1] lmer.dist
[1] lmer.sdiff
[1] lmer.signdist
[1] lmer.syndist
[1] lmer.synsdiff
[1] lmer.synsigndist

[1] Window: P600
[1] Models without ambiguity
[1] lmer.dist
[1] lmer.sdiff
[1] lmer.signdist
[1] lmer.syndist
[1] lmer.synsdiff
[1] lmer.synsigndist
[1] Models with ambiguity as a predictor
[1] With interaction
[1] lmer.dist.ambiguity
[1] lmer.sdiff.ambiguity
[1] lmer.signdist.ambiguity
[1] lmer.syndist.ambiguity
[1] lmer.synsdiff.ambiguity
[1] lmer.synsigndist.ambiguity

[1] Compared to models without ambiguity

[1] Without interaction
[1] lmer.dist.ambiguity.no_int
[1] lmer.sdiff.ambiguity.no_int
[1] lmer.signdist.ambiguity.no_int
[1] lmer.syndist.ambiguity.no_int
[1] lmer.synsdiff.ambiguity.no_int
[1] lmer.synsigndist.ambiguity.no_int

[1] Compared to models without ambiguity

[1] Ambiguity: With vs. Without interaction

[1] Minimally adequate models
[1] Compared to syntactic models

[1] With interaction and reaction time
[1] lmer.dist.ambiguity.no_int.rt
[1] lmer.sdiff.ambiguity.rt
[1] lmer.signdist.ambiguity.rt

[1] Models divided up by ambiguity

[1] Window: P600: NP1: unambiguous
[1] lmer.dist
[1] lmer.sdiff
[1] lmer.signdist
[1] lmer.syndist
[1] lmer.synsdiff
[1] lmer.synsigndist

[1] Window: P600: NP1: ambiguous
[1] lmer.dist
[1] lmer.sdiff
[1] lmer.signdist
[1] lmer.syndist
[1] lmer.synsdiff
[1] lmer.synsigndist

There were 50 or more warnings (use warnings() to see the first 50)

Attaching package: ‘lme4’

The following object is masked from ‘package:stats’:

AIC, BIC

This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.

Attaching package: ‘ez’

The following object is masked from ‘package:plyr’:

progress_time

Hmisc library by Frank E Harrell Jr

Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview')
to see overall documentation.

NOTE:Hmisc no longer redefines [.factor to drop unused levels when
subsetting.  To get the old behavior of Hmisc type dropUnusedLevels().

Attaching package: ‘Hmisc’

The following object is masked from ‘package:survival’:

untangle.specials

The following object is masked from ‘package:xtable’:

label, label<-

The following object is masked from ‘package:plyr’:

is.discrete, summarize

The following object is masked from ‘package:car’:

recode

The following object is masked from ‘package:base’:

format.pval, round.POSIXt, trunc.POSIXt, units

[1] Windows:
[1] N400 P600
[1] ROIs:
[1] Left-Anterior   Left-Posterior  Midline         Right-Anterior
[5] Right-Posterior
[1] anovawin_N400
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Left-Anterior
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Left-Posterior
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Midline
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Right-Anterior
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Right-Posterior
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1] anovawin_P600
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Left-Anterior
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Left-Posterior
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Midline
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Right-Anterior
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1]
[1] ROI Right-Posterior
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.

Attaching package: ‘lme4’

The following object is masked from ‘package:stats’:

AIC, BIC

This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.

Attaching package: ‘ez’

The following object is masked from ‘package:plyr’:

progress_time

Hmisc library by Frank E Harrell Jr

Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview')
to see overall documentation.

NOTE:Hmisc no longer redefines [.factor to drop unused levels when
subsetting.  To get the old behavior of Hmisc type dropUnusedLevels().

Attaching package: ‘Hmisc’

The following object is masked from ‘package:survival’:

untangle.specials

The following object is masked from ‘package:xtable’:

label, label<-

The following object is masked from ‘package:plyr’:

is.discrete, summarize

The following object is masked from ‘package:car’:

recode

The following object is masked from ‘package:base’:

format.pval, round.POSIXt, trunc.POSIXt, units

Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1] "Accuracy"
$ANOVA
                                Effect DFn DFd          F            p p<.05          ges
2                            wordOrder   1  36 45.6865658 6.809031e-08     * 0.1275743964
3                            ambiguity   1  36 20.3921035 6.528868e-05     * 0.0546878016
4                              np1type   1  36 29.4290509 4.081266e-06     * 0.0263856710
5                              np2type   1  36  8.7587502 5.420542e-03     * 0.0088499081
6                  wordOrder:ambiguity   1  36 20.1477389 7.083665e-05     * 0.0457226845
7                    wordOrder:np1type   1  36  0.3579305 5.534036e-01       0.0002276356
8                    ambiguity:np1type   1  36  1.1313932 2.945613e-01       0.0006983432
9                    wordOrder:np2type   1  36  5.6944462 2.239859e-02     * 0.0083736063
10                   ambiguity:np2type   1  36 33.2819118 1.414628e-06     * 0.0226771097
11                     np1type:np2type   1  36  0.8249272 3.697838e-01       0.0005006522
12         wordOrder:ambiguity:np1type   1  36  1.7424517 1.951608e-01       0.0006201039
13         wordOrder:ambiguity:np2type   1  36 30.3248283 3.171752e-06     * 0.0255229285
14           wordOrder:np1type:np2type   1  36  4.6394270 3.801524e-02     * 0.0025865827
15           ambiguity:np1type:np2type   1  36  0.3378052 5.647204e-01       0.0001211227
16 wordOrder:ambiguity:np1type:np2type   1  36  0.2839334 5.974089e-01       0.0001305028

[1] "Reaction Time"
$ANOVA
                                Effect DFn DFd           F            p p<.05          ges
2                            wordOrder   1  36 14.18045044 5.937280e-04     * 3.263434e-03
3                            ambiguity   1  36  8.22904929 6.854827e-03     * 7.921217e-04
4                              np1type   1  36  3.62085798 6.508093e-02       3.788348e-04
5                              np2type   1  36 33.55675706 1.314788e-06     * 5.746344e-03
6                  wordOrder:ambiguity   1  36  8.14306505 7.123750e-03     * 6.096586e-04
7                    wordOrder:np1type   1  36  1.40902117 2.429930e-01       1.476509e-04
8                    ambiguity:np1type   1  36  8.45010498 6.212150e-03     * 4.438656e-04
9                    wordOrder:np2type   1  36  9.89103541 3.323163e-03     * 1.552125e-03
10                   ambiguity:np2type   1  36  6.82232626 1.304824e-02     * 4.992799e-04
11                     np1type:np2type   1  36  0.15466182 6.964390e-01       9.707596e-06
12         wordOrder:ambiguity:np1type   1  36  0.30249724 5.857176e-01       2.408821e-05
13         wordOrder:ambiguity:np2type   1  36  4.95165726 3.242000e-02     * 4.085559e-04
14           wordOrder:np1type:np2type   1  36  1.20223465 2.801563e-01       1.336334e-04
15           ambiguity:np1type:np2type   1  36  0.05472668 8.163589e-01       5.267089e-06
16 wordOrder:ambiguity:np1type:np2type   1  36  0.42487180 5.186562e-01       3.461528e-05

[1] "Reaction Time: Ambiguous"
$ANOVA
                     Effect DFn DFd          F            p p<.05          ges
2                 wordOrder   1  36 14.8980816 4.529389e-04     * 6.194222e-03
3                   np1type   1  36  0.0211497 8.851831e-01       2.393871e-06
4                   np2type   1  36 32.5284868 1.731664e-06     * 8.908190e-03
5         wordOrder:np1type   1  36  0.2423428 6.255080e-01       4.873699e-05
6         wordOrder:np2type   1  36 11.5796076 1.648620e-03     * 3.293272e-03
7           np1type:np2type   1  36  0.2174584 6.437930e-01       2.719067e-05
8 wordOrder:np1type:np2type   1  36  0.1990047 6.581971e-01       2.993453e-05

[1] "Reaction Time: Unambiguous"
$ANOVA
                     Effect DFn DFd            F            p p<.05          ges
2                 wordOrder   1  36  6.562216928 0.0147493779     * 1.144129e-03
3                   np1type   1  36  8.534666765 0.0059836449     * 1.776805e-03
4                   np2type   1  36 14.895627608 0.0004533549     * 3.110311e-03
5         wordOrder:np1type   1  36  1.910506031 0.1754234477       3.151138e-04
6         wordOrder:np2type   1  36  2.143167105 0.1518840598       3.995357e-04
7           np1type:np2type   1  36  0.003672654 0.9520110329       7.294680e-07
8 wordOrder:np1type:np2type   1  36  1.362580113 0.2507645442       3.294634e-04

[1] "A few more things ...."
[1] "object first ambiguous vs subject first ambiguous, np2 = noun"
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
Effect DFn DFd        F            p p<.05        ges
2 wordOrder   1  36 16.76674 0.0002284524     * 0.01668264
[1] mean OAXN: 472.710714285714
[1] mean SAXN: 438.638968880632
[1] "object first ambiguous vs subject first ambiguous, np2 = pronoun"
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
Effect DFn DFd        F         p p<.05          ges
2 wordOrder   1  36 1.550036 0.2211738       0.0005101109
[1] mean OAXP: 428.572572815534
[1] mean SAXP: 426.9079962808
[1] "unambiguous sentences, np1 noun  vs np1 pro"
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
Effect DFn DFd        F           p p<.05         ges
3 np1type   1  36 8.534667 0.005983645     * 0.001776805
[1] mean XUNX: 442.974065309986
[1] mean XUPX: 431.113630073801
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
Warning: Collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
[1] "Accuracy"
   wordOrder ambiguity np1type np2type  N      Mean         SD       FLSD
1          O         A       N       N 37 0.8648427 0.13428182 0.01435357
2          O         A       N       P 37 0.9317667 0.08022641 0.01435357
3          O         A       P       N 37 0.8883304 0.11164366 0.01435357
4          O         A       P       P 37 0.9382417 0.05647583 0.01435357
5          O         U       N       N 37 0.9490836 0.06295362 0.01435357
6          O         U       N       P 37 0.9427750 0.04235909 0.01435357
7          O         U       P       N 37 0.9846536 0.02436073 0.01435357
8          O         U       P       P 37 0.9611370 0.01853162 0.01435357
9          S         A       N       N 37 0.9680935 0.03975697 0.01435357
10         S         A       N       P 37 0.9665735 0.03513786 0.01435357
11         S         A       P       N 37 0.9846536 0.02436073 0.01435357
12         S         A       P       P 37 0.9845604 0.03740937 0.01435357
13         S         U       N       N 37 0.9719478 0.03566663 0.01435357
14         S         U       N       P 37 0.9674122 0.04209350 0.01435357
15         S         U       P       N 37 0.9836285 0.03128231 0.01435357
16         S         U       P       P 37 0.9909910 0.01867303 0.01435357
[1] "Reaction Time"
   wordOrder ambiguity np1type np2type  N     Mean       SD     FLSD
1          O         A       N       N 37 484.4486 197.4110 16.06999
2          O         A       N       P 37 435.3229 164.6778 16.06999
3          O         A       P       N 37 482.6158 184.6790 16.06999
4          O         A       P       P 37 433.6556 157.2673 16.06999
5          O         U       N       N 37 465.4350 167.9132 16.06999
6          O         U       N       P 37 437.6552 155.4856 16.06999
7          O         U       P       N 37 442.3987 161.1639 16.06999
8          O         U       P       P 37 424.9375 142.2733 16.06999
9          S         A       N       N 37 436.5023 147.1485 16.06999
10         S         A       N       P 37 427.9235 146.3425 16.06999
11         S         A       P       N 37 442.6901 149.0355 16.06999
12         S         A       P       P 37 427.2288 151.5664 16.06999
13         S         U       N       N 37 438.6704 151.1276 16.06999
14         S         U       N       P 37 433.6434 149.1170 16.06999
15         S         U       P       N 37 437.0519 151.3104 16.06999
16         S         U       P       P 37 420.6872 127.5365 16.06999
@Manual{lme4,
title = {lme4: Linear mixed-effects models using S4 classes},
author = {Douglas Bates and Martin Maechler and Ben Bolker},
year = {2013},
note = {R package version 0.999999-2},
url = {http://CRAN.R-project.org/package=lme4},
}


Alternatively, you can just run publish.sh (with the optional --map and --procs arguments).

### With Learning

Because the lme4 package uses a form of likelihood maximization for its fitting, we can use its output for adjusting the weights of different features in the implementation. However, the experiment for the supplied data does not manipulate the same features as are supposed in the eADM and so we have to be a bit creative: we can only acquire the weights for animacy, case, definiteness and person (person correlates very strongly with position!). We do this by first calculating the mixed effects model as given by lmer, and then examining the fixed effects portion:

    fix.lmer.imp.learn <- fixef(lmer.imp.learn)
    print(fix.lmer.imp.learn)


This gives us a list of the columns of the matrix in the order they are stored in. Using the positions of the factors animacy, case, definiteness and person, we can figure out which values they correspond to from the main diagonal of the variance-covariance matrix:

    print(diag(vcov(lmer.imp.learn)))


We can then input these into the model using the command line options for weights (which takes exactly these values, which are actually the inverse of the weights).
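The bookkeeping step, pairing each fixed-effect name with its entry on the diagonal and turning the result into command line options, can be sketched in Python as follows. The numbers and the option names (--animacy, --case, --definiteness, --person) are copied from the sample run further down in this README; the helper `weight_arguments` is hypothetical and not part of the repository.

```python
# Order of the fixed effects as printed by fixef() in the sample run.
fixed_effects = ["(Intercept)", "animacy", "case", "definiteness", "person"]
# Main diagonal of the variance-covariance matrix from the same run.
vcov_diagonal = [0.0525755387, 0.0665219138, 0.0002054454, 0.0652915333, 0.0673415503]


def weight_arguments(names, variances,
                     wanted=("animacy", "case", "definiteness", "person")):
    """Build command line options from the diagonal entries of the wanted factors."""
    by_name = dict(zip(names, variances))
    args = []
    for name in wanted:
        args += ["--" + name, "%.10f" % by_name[name]]
    return args


cli_args = weight_arguments(fixed_effects, vcov_diagonal)
# The options take the variances directly; the implied weights are their inverses.
implied_weights = {name: 1.0 / var for name, var in zip(fixed_effects, vcov_diagonal)}
```

The intercept is skipped because only the four feature weights are passed to the implementation.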

ggmem_learn.R performs these steps and outputs those values for the "lower half" of the data set (i.e. the data for test subject numbers <= 20). The models are fitted only for the N400 time window.

ggmem_test.R performs the myriad mixed model analyses of ggmem.R but only on the "upper half" of the data set (i.e. the data for test subject numbers > 20). You need to rerun the implementation with the weight corrections beforehand.
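The learn/test division is a simple split on subject number. A minimal Python sketch is shown here; the column name `subj` and the helper `split_by_subject` are assumptions for illustration, while the cut-off of 20 comes from the description above.

```python
def split_by_subject(rows, cutoff=20):
    """Return (learn, test): subjects <= cutoff go to learn, the rest to test."""
    learn = [row for row in rows if row["subj"] <= cutoff]
    test = [row for row in rows if row["subj"] > cutoff]
    return learn, test


# Hypothetical rows covering subjects 1 to 37
rows = [{"subj": subj, "mean": 0.0} for subj in range(1, 38)]
learn_rows, test_rows = split_by_subject(rows)
```

Splitting by subject rather than by trial keeps every subject's data entirely in one half, so the weights are never evaluated on subjects they were fitted on.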

Like ggmem.R, both ggmem_learn.R and ggmem_test.R depend on a preprocessing step: ggmempreproc_learn.R and ggmempreproc_test.R, respectively.

The scripts ggmem*_test.R perform the same actions as the corresponding scripts without _test.R (indeed they invoke a common core), but only perform the analysis on the upper half of the test subjects.

To make this workflow run more smoothly, there are a few helper scripts, ggmemlearn.sh and ggmemtest.sh.
1. The standard file output from the implementation (alday_ms_ibs_neuroinformatics_stimuli.txt.tab) is used by ggmemlearn.sh to run preprocessing on half of the data set and then display the fixed effects and main diagonal of the corresponding variance-covariance matrix.
2. With the weights collected from this output, we rerun the implementation with the output file weights.test.tab.
3. Then, we run ggmemtest.sh, which runs preprocessing on the test data as well as ggmem_test.R.

Here, we show a sample run:

user@localhost ginnungagap-code $ ./ggmemlearn.sh --map lazy
R CMD BATCH --vanilla ggmempreproc_learn.R
./trimlines.sh ggitem_N400P600.preproc.learn.tab alday_ms_ibs_neuroinformatics_stimuli.txt.tab
./ggmerge.py "$@" ggitem_N400P600.preproc.learn.tab alday_ms_ibs_neuroinformatics_stimuli.txt.tab comparison.data.learn.tab
4 CPUs detected. Using 2 processes.
R --vanilla --slave < ggmem_learn.R

Attaching package: ‘lme4’

The following object(s) are masked from ‘package:stats’:

AIC, BIC

[1] win N400
(Intercept)      animacy         case definiteness       person
1.081753113 -0.008519129 -0.122723624  0.036486532  0.560183210
[1] 0.0525755387 0.0665219138 0.0002054454 0.0652915333 0.0673415503

user@localhost ginnungagap-code $ ./alday_ms_ibs_neuroinformatics.py -e alday_ms_ibs_neuroinformatics_stimuli.txt weights.test.tab --apriori --animacy 0.0665219138 --case 0.0002054454 --definiteness 0.0652915333 --person 0.0673415503
Entering batch mode.
Writing to weights.test.tab ... Done.
user@localhost ginnungagap-code $ ./ggmemtest.sh --map lazy
R CMD BATCH --vanilla ggmempreproc_test.R
./trimlines.sh ggitem_N400P600.preproc.learn.tab weights.test.tab
./ggmerge.py "$@" ggitem_N400P600.preproc.learn.tab weights.test.tab comparison.data.test.tab
4 CPUs detected. Using 2 processes.
R --vanilla --slave < ggmem_test.R

Attaching package: ‘lme4’

The following object(s) are masked from ‘package:stats’:

AIC, BIC

[1] win: N400
Data: eeg.item.data.win
Models:
lmer.imp.dist: mean ~ dist + (1 | item) + (1 | subj)
lmer.imp.sdiff: mean ~ sdiff + (1 | item) + (1 | subj)
               Df    AIC    BIC logLik Chisq Chi Df Pr(>Chisq)
lmer.imp.dist   5 194870 194912 -97430
lmer.imp.sdiff  5 194883 194925 -97437     0      0          1