Yoav Artzi committed 4700157

# v1.4.1
- Cleaned up the use of generics throughout the system. Generic classes should now be compatible
- Better logging system - all logs are now public and can be controlled from outside SPF without editing SPF's code
- Javadocs for many generic classes
- Better abstraction in ExPlat's experiment hierarchy
- Better representations of situated data items

Files changed (95)

-# [_**UW SPF v1.4**_](http://yoavartzi.com/spf) - The University of Washington Semantic Parsing Framework v1.4
+# [_**UW SPF v1.4.1**_](http://yoavartzi.com/spf) - The University of Washington Semantic Parsing Framework v1.4.1
 
 **Developed and maintained by** [Yoav Artzi](http://yoavartzi.com)
 
 **Contributors:** [Luke Zettlemoyer](http://homes.cs.washington.edu/~lsz/), [Tom Kwiatkowski](http://homes.cs.washington.edu/~tomk/)
 
-## Projects Using SPF
-
-[Navi](http://yoavartzi.com/navi) 
-
-## Documentation
+## Documentations
 
 More coming soon … 
 
 In the meantime, see the ACL 2013 tutorial for general information about semantic parsing with CCGs. The slides are available [here](http://yoavartzi.com).
 
-### Running example experiments
-
-The framework contains an example experiment using the GeoQuery corpus. To use development fold zero for testing, and training on the other sets, use:
-``java -jar dist/spf-1.4.jar geoquery/experiments/template/dev.cross/dev.fold0.exp``  
-The log and output files are written to the experiment directory:
-``geoquery/experiments/template/dev.cross/``
-
-You can look at the .exp file and see how it defines arguments and how it includes them from other files. Another critical point of entry is the class ``edu.uw.cs.lil.tiny.geoquery.GeoMain``.
-
-### Working with the Code
-
-The code is divided into many projects that have dependencies between them. You can work with the code with any editor and build it with the accompanying ANT script. However, we recommend using Eclipse. Each of the directories is an Eclipse project and can easily be imported into Eclipse. To do so, select Import from the File menu and choose "Existing Projects into Workspace". The "Root Directory" should be the code directory and all projects should be selected by default. The dependencies will be imported automatically. To successfully build SPF in Eclipse you will need to set the classpath variable TINY_REPO to the code directory. To do so, go to Preferences -> Java -> Build Path -> Classpath Variables, add a new variable with the name TINY_REPO and a folder value that points to the code location.
-
-## Building
+# Building
 
 To compile SPF use: `ant dist`. The output JAR file will be in the `dist` directory. You can also download the compiled JAR file from the [Downloads](https://bitbucket.org/yoavartzi/spf/downloads) section.
 
 
 Artzi, Yoav and Zettlemoyer, Luke. "UW SPF: The University of Washington Semantic Parsing Framework." http://yoavartzi.com/spf.  2013.
 
-[**Bibtex:**](http://yoavartzi.com/pub/az-spf.2013.bib)
+**Bibtex:**
 
     @article{artzi2013uwspf,
         title={{UW SPF: The University of Washington Semantic Parsing Framework}},

VERSION_HISTORY.md

+# v1.4.1
+- Cleaned up the use of generics throughout the system. Generic classes should now be compatible
+- Better logging system - all logs are now public and can be controlled from outside SPF without editing SPF's code
+- Javadocs for many generic classes
+- Better abstraction in ExPlat's experiment hierarchy
+- Better representations of situated data items
 src.ccg.lexicon=ccg.lexicon/src
 src.ccg.lexicon.factored.lambda=ccg.lexicon.factored.lambda/src
 src.data=data/src
+src.data.situated=data.situated/src
 src.datasinglesentence=data.singlesentence/src
 src.explat=explat/src
 src.learn=learn/src
 src.learn.validation=learn.validation/src
 src.learn.situated=learn.situated/src
-src.learn.weakp=learn.weakp/src
 src.mr.lambda=mr.lambda/src
 src.mr.lambda.ccg=mr.lambda.ccg/src
 src.mr.lambda.exec.naive=mr.lambda.exec.naive/src
     </description>
 	<!-- set global properties for this build -->
 	<property file="build.properties" />
-	<property name="version" value="1.4" />
+	<property name="version" value="1.4.1" />
 	<property name="build" location="build" />
 	<property name="build.src" location="build.src" />
 	<property name="dist" location="dist" />
 			<fileset dir="${src.ccg.lexicon}" includes="**/*.java" />
 			<fileset dir="${src.ccg.lexicon.factored.lambda}" includes="**/*.java" />
 			<fileset dir="${src.data}" includes="**/*.java" />
+			<fileset dir="${src.data.situated}" includes="**/*.java" />
 			<fileset dir="${src.genlex.ccg}" includes="**/*.java" />
 			<fileset dir="${src.genlex.ccg.template}" includes="**/*.java" />
 			<fileset dir="${src.genlex.ccg.unification}" includes="**/*.java" />
 			<fileset dir="${src.explat}" includes="**/*.java" />
 			<fileset dir="${src.learn}" includes="**/*.java" />
 			<fileset dir="${src.learn.validation}" includes="**/*.java" />
-			<fileset dir="${src.learn.weakp}" includes="**/*.java" />
 			<fileset dir="${src.learn.simple}" includes="**/*.java" />
 			<fileset dir="${src.learn.situated}" includes="**/*.java" />
 			<fileset dir="${src.mr.lambda}" includes="**/*.java" />

data.singlesentence/src/edu/uw/cs/lil/tiny/data/singlesentence/SingleSentence.java

 	
 	@Override
 	public String toString() {
-		return new StringBuilder(sentence.toString()).append('\n')
+		return new StringBuilder(super.toString()).append('\n')
 				.append(semantics).toString();
 	}
 	

data.situated/.classpath

+<?xml version="1.0" encoding="UTF-8"?>
+<classpath>
+	<classpathentry kind="src" path="src"/>
+	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
+	<classpathentry kind="src" path="/data"/>
+	<classpathentry kind="src" path="/tinyutils"/>
+	<classpathentry kind="src" path="/javautils"/>
+	<classpathentry kind="output" path="bin"/>
+</classpath>

data.situated/.project

+<?xml version="1.0" encoding="UTF-8"?>
+<projectDescription>
+	<name>data.situated</name>
+	<comment></comment>
+	<projects>
+	</projects>
+	<buildSpec>
+		<buildCommand>
+			<name>org.eclipse.jdt.core.javabuilder</name>
+			<arguments>
+			</arguments>
+		</buildCommand>
+	</buildSpec>
+	<natures>
+		<nature>org.eclipse.jdt.core.javanature</nature>
+	</natures>
+</projectDescription>

data.situated/src/edu/uw/cs/lil/tiny/data/situated/ISituatedDataItem.java

+package edu.uw.cs.lil.tiny.data.situated;
+
+import edu.uw.cs.lil.tiny.data.IDataItem;
+import edu.uw.cs.utils.composites.Pair;
+
+/**
+ * Data item for language in a situated environment.
+ * 
+ * @author Yoav Artzi
+ * @param <LANG>
+ *            Type of language.
+ * @param <STATE>
+ *            Situated state.
+ */
+public interface ISituatedDataItem<LANG, STATE> extends
+		IDataItem<Pair<LANG, STATE>> {
+	
+}

data.situated/src/edu/uw/cs/lil/tiny/data/situated/sentence/SituatedSentence.java

+package edu.uw.cs.lil.tiny.data.situated.sentence;
+
+import java.util.List;
+
+import edu.uw.cs.lil.tiny.data.sentence.Sentence;
+import edu.uw.cs.lil.tiny.data.situated.ISituatedDataItem;
+import edu.uw.cs.utils.composites.Pair;
+
+/**
+ * A sentence situated in some kind of state.
+ * 
+ * @author Yoav Artzi
+ * @param <STATE>
+ *            Type of state/situation.
+ */
+public class SituatedSentence<STATE> implements
+		ISituatedDataItem<Sentence, STATE> {
+	
+	private final Pair<Sentence, STATE>	sample;
+	private final Sentence				sentence;
+	private final STATE					state;
+	
+	public SituatedSentence(Sentence sentence, STATE state) {
+		this.sentence = sentence;
+		this.state = state;
+		this.sample = Pair.of(sentence, state);
+	}
+	
+	@Override
+	public Pair<Sentence, STATE> getSample() {
+		return sample;
+	}
+	
+	public List<String> getTokens() {
+		return sentence.getTokens();
+	}
+	
+	@Override
+	public String toString() {
+		return sentence.toString() + " :: " + state.toString();
+	}
+	
+}

data/src/edu/uw/cs/lil/tiny/data/ILabeledDataItem.java

  * @see IDataItem
  */
 public interface ILabeledDataItem<SAMPLE, LABEL> extends
-		ILossDataItem<SAMPLE, LABEL> {
+		ILossDataItem<SAMPLE, LABEL>, IDataItem<SAMPLE> {
 	
 	LABEL getLabel();
 	
 	 * Compares a label to the gold standard, if one exists.
 	 * 
 	 * @param label
-	 * @return null if not gold standard exists.
 	 */
 	boolean isCorrect(LABEL label);
 }

data/src/edu/uw/cs/lil/tiny/data/sentence/Sentence.java

 	private final List<String>	tokens;
 	
 	public Sentence(List<String> tokens) {
-		this.tokens = ListUtils.map(tokens,
+		// Escape "%" characters, to avoid problems with logging and printing.
+		this.tokens = Collections.unmodifiableList(ListUtils.map(tokens,
 				new ListUtils.Mapper<String, String>() {
 					
 					@Override
 					public String process(String obj) {
 						return obj.replace("%", "%%");
 					}
-				});
+				}));
 		this.string = ListUtils.join(this.tokens, " ");
 	}
 	
 	public Sentence(String string) {
+		// Escape "%" characters, to avoid problems with logging and printing.
 		this.string = string.replace("%", "%%");
 		this.tokens = Collections.unmodifiableList(tokenize(this.string));
 	}
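
Note on the hunk above: the constructors double every "%" so that the sentence text can later be passed through a format-style call without being read as a format specifier. The following is a minimal, self-contained sketch of that idea using only `String.format` from the JDK. `PercentEscapeDemo`, `escape`, and `isSafeFormat` are hypothetical names chosen for illustration and are not part of SPF.

```java
import java.util.IllegalFormatException;

final class PercentEscapeDemo {

    /** Doubles every '%' so the text is safe to use as a format string. */
    static String escape(String text) {
        return text.replace("%", "%%");
    }

    /** Returns true if the text can be used as a format string with no arguments. */
    static boolean isSafeFormat(String text) {
        try {
            String.format(text);
            return true;
        } catch (IllegalFormatException e) {
            // A stray '%' is parsed as the start of a format specifier.
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "growth of 5%";
        System.out.println(isSafeFormat(raw));           // prints "false"
        System.out.println(isSafeFormat(escape(raw)));   // prints "true"
        System.out.println(String.format(escape(raw)));  // prints "growth of 5%"
    }
}
```

One consequence of escaping at construction time, as the diff does, is that the stored string contains "%%"; it reads as a single "%" only after it has gone through a formatter.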

explat/src/edu/uw/cs/lil/tiny/explat/DistributedExperiment.java

  * 
  * @author Yoav Artzi
  */
-public abstract class DistributedExperiment extends ParameterizedExperiment
-		implements IJobListener, ITinyExecutor {
-	private static final ILogger		LOG						= LoggerFactory
+public abstract class DistributedExperiment extends LoggedExperiment implements
+		IJobListener, ITinyExecutor {
+	public static final ILogger		LOG						= LoggerFactory
 																		.create(DistributedExperiment.class);
 	private final Set<String>			completedIds			= new HashSet<String>();
 	final private Object				completionSignalObject	= new Object();
 	
 	private boolean						running					= true;
 	
+	/** Run one job at a time. */
+	private final boolean				serial;
+	
 	private final long					startingTime			= System.currentTimeMillis();
 	
 	public DistributedExperiment(File initFile) throws IOException {
 		super(initFile, envParams);
 		
 		// //////////////////////////////////////////
+		// Set the serial flag
+		// //////////////////////////////////////////
+		this.serial = globalParams.getAsBoolean("serial");
+		
+		// //////////////////////////////////////////
 		// Create the executor
 		// //////////////////////////////////////////
 		this.executor = new TinyExecutorService(
 									.getDependencyIds())) {
 						executor.execute(queuedJob);
 						launchedIds.add(queuedJob.getId());
+						if (serial) {
+							break;
+						}
 					}
 				}
 			}
 				if (job.getDependencyIds().isEmpty()) {
 					executor.execute(job);
 					launchedIds.add(job.getId());
+					if (serial) {
+						break;
+					}
 				}
 			}
 		}
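
Note on the hunks above: when the `serial` flag is set, each launch loop stops after submitting one job. The sketch below illustrates that control flow in isolation, assuming a job is ready once all of its dependencies have completed. `SerialLaunchDemo`, `Job`, and `selectJobs` are hypothetical stand-ins and do not reproduce the `DistributedExperiment` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SerialLaunchDemo {

    /** Hypothetical job: an id plus the ids of the jobs it depends on. */
    static final class Job {
        final String id;
        final Set<String> dependencyIds;

        Job(String id, String... dependencyIds) {
            this.id = id;
            this.dependencyIds = new HashSet<String>(Arrays.asList(dependencyIds));
        }
    }

    /** Returns the ids of the jobs to launch now. In serial mode at most one is returned. */
    static List<String> selectJobs(List<Job> jobs, Set<String> completedIds,
            Set<String> launchedIds, boolean serial) {
        List<String> toLaunch = new ArrayList<String>();
        for (Job job : jobs) {
            if (!launchedIds.contains(job.id)
                    && completedIds.containsAll(job.dependencyIds)) {
                toLaunch.add(job.id);
                if (serial) {
                    break; // launch a single job and stop scanning
                }
            }
        }
        return toLaunch;
    }

    public static void main(String[] args) {
        List<Job> jobs = Arrays.asList(
                new Job("train"), new Job("eval"), new Job("report", "train"));
        Set<String> none = new HashSet<String>();
        System.out.println(selectJobs(jobs, none, none, false)); // prints "[train, eval]"
        System.out.println(selectJobs(jobs, none, none, true));  // prints "[train]"
    }
}
```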

explat/src/edu/uw/cs/lil/tiny/explat/LoggedExperiment.java

+/*******************************************************************************
+ * UW SPF - The University of Washington Semantic Parsing Framework
+ * <p>
+ * Copyright (C) 2013 Yoav Artzi
+ * <p>
+ * This program is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU General Public License as published by the Free Software
+ * Foundation; either version 2 of the License, or any later version.
+ * <p>
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+ * details.
+ * <p>
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ ******************************************************************************/
+package edu.uw.cs.lil.tiny.explat;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Map;
+
+import edu.uw.cs.utils.assertion.Assert;
+import edu.uw.cs.utils.composites.Pair;
+import edu.uw.cs.utils.log.ILogger;
+import edu.uw.cs.utils.log.Log;
+import edu.uw.cs.utils.log.LogLevel;
+import edu.uw.cs.utils.log.Logger;
+import edu.uw.cs.utils.log.LoggerFactory;
+
+/**
+ * Adds basic logging setup over {@link ParameterizedExperiment}.
+ * 
+ * @author Yoav Artzi
+ */
+public abstract class LoggedExperiment extends ParameterizedExperiment {
+	public static final ILogger	LOG	= LoggerFactory
+												.create(LoggedExperiment.class);
+	protected final File			outputDir;
+	
+	public LoggedExperiment(File file, Map<String, String> envParams)
+			throws IOException {
+		super(file, envParams);
+		
+		// TODO [yoav] Find a place to close the default log, if opened a
+		// stream for it
+		
+		// Output directory
+		this.outputDir = globalParams.contains("outputDir") ? globalParams
+				.getAsFile("outputDir") : null;
+		Assert.ifNull(outputDir);
+		// Create the directory, just to be on the safe side
+		outputDir.mkdir();
+		
+		// Init logging and output stream
+		final File globalLogFile = globalParams.contains("globalLog") ? globalParams
+				.getAsFile("globalLog") : null;
+		if (globalLogFile == null) {
+			Logger.DEFAULT_LOG = new Log(System.err);
+		} else {
+			Logger.DEFAULT_LOG = new Log(globalLogFile);
+		}
+		Logger.setSkipPrefix(true);
+		LogLevel.setLogLevel(LogLevel.INFO);
+		
+		// Log global parameters
+		LOG.info("Parameters:");
+		for (final Pair<String, String> param : globalParams) {
+			LOG.info("%s=%s", param.first(), param.second());
+		}
+		
+	}
+	
+}
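
Note on the logging changes in this commit: `LOG` fields become `public static final`, so client code can reach a class's logger directly instead of editing the class. The sketch below shows the same pattern with the JDK's `java.util.logging`, which is swapped in here for illustration. It does not use SPF's `ILogger` or `LogLevel`, and `PublicLoggerDemo` and `Parser` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class PublicLoggerDemo {

    /** Hypothetical library component that exposes its logger publicly. */
    static final class Parser {
        public static final Logger LOG = Logger.getLogger(Parser.class.getName());

        static void parse(String text) {
            // Emitted only if the logger's level allows FINE messages.
            LOG.fine("parsing: " + text);
        }
    }

    public static void main(String[] args) {
        // Client code adjusts verbosity without touching Parser's source.
        Parser.LOG.setLevel(Level.WARNING);
        Parser.parse("hello");
        System.out.println(Parser.LOG.isLoggable(Level.FINE));   // prints "false"
        System.out.println(Parser.LOG.isLoggable(Level.SEVERE)); // prints "true"
    }
}
```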

explat/src/edu/uw/cs/lil/tiny/explat/ParameterizedExperiment.java

 import jregex.Replacer;
 import jregex.Substitution;
 import jregex.TextBuffer;
-import edu.uw.cs.utils.assertion.Assert;
 import edu.uw.cs.utils.collections.ListUtils;
 import edu.uw.cs.utils.composites.Pair;
-import edu.uw.cs.utils.log.ILogger;
-import edu.uw.cs.utils.log.Log;
-import edu.uw.cs.utils.log.LogLevel;
-import edu.uw.cs.utils.log.Logger;
-import edu.uw.cs.utils.log.LoggerFactory;
 
 public abstract class ParameterizedExperiment implements IResourceRepository {
 	
 	private static final String			INCLUDE_DIRECTIVE			= "include";
 	private static final Pattern		LINE_REPEAT_PATTERN			= new Pattern(
 																			"\\[({var}\\w+)=({start}\\d+)-({end}\\d+)\\]\\s+({rest}.+)$");
-	private static final ILogger		LOG							= LoggerFactory
-																			.create(ParameterizedExperiment.class);
 	private static final Pattern		VAR_REF						= new Pattern(
 																			"%\\{({var}[\\w@]+)\\}");
 	private final Map<String, Object>	resources					= new HashMap<String, Object>();
 	protected final Parameters			globalParams;
 	protected final List<Parameters>	jobParams;
 	
-	protected final File				outputDir;
-	
 	protected final List<Parameters>	resourceParams;
 	
 	public ParameterizedExperiment(File file) throws IOException {
 						.put(entry.getKey(), entry.getValue());
 			}
 			
-			// TODO [yoav] Find a place to close the default log, if opened a
-			// stream for it
-			
-			// Output directory
-			this.outputDir = globalParams.contains("outputDir") ? globalParams
-					.getAsFile("outputDir") : null;
-			Assert.ifNull(outputDir);
-			// Create the directory, just to be on the safe side
-			outputDir.mkdir();
-			
-			// Init logging and output stream
-			final File globalLogFile = globalParams.contains("globalLog") ? globalParams
-					.getAsFile("globalLog") : null;
-			if (globalLogFile == null) {
-				Logger.DEFAULT_LOG = new Log(System.err);
-			} else {
-				Logger.DEFAULT_LOG = new Log(globalLogFile);
-			}
-			Logger.setSkipPrefix(true);
-			LogLevel.setLogLevel(LogLevel.INFO);
-			
-			// Log global parameters
-			LOG.info("Parameters:");
-			for (final Pair<String, String> param : globalParams) {
-				LOG.info("%s=%s", param.first(), param.second());
-			}
-			
 			// Read resources
 			this.resourceParams = readSectionLines(reader);
 			
 			// Read jobs
 			this.jobParams = readSectionLines(reader);
+			
 		} finally {
 			reader.close();
 		}

genlex.ccg.template/.classpath

 	<classpathentry kind="src" path="/parser.ccg"/>
 	<classpathentry kind="src" path="/tinyutils"/>
 	<classpathentry combineaccessrules="false" kind="src" path="/explat"/>
+	<classpathentry kind="src" path="/data.singlesentence"/>
 	<classpathentry kind="output" path="bin"/>
 </classpath>

genlex.ccg.template/src/edu/uw/cs/lil/tiny/genlex/ccg/template/TemplateGenlex.java

  * 
  * @author Yoav Artzi
  */
-public class TemplateGenlex
+public class TemplateGenlex<DI extends Sentence>
 		implements
-		ILexiconGenerator<Sentence, LogicalExpression, IModelImmutable<Sentence, LogicalExpression>> {
-	private static final ILogger				LOG	= LoggerFactory
+		ILexiconGenerator<DI, LogicalExpression, IModelImmutable<Sentence, LogicalExpression>> {
+	public static final ILogger					LOG	= LoggerFactory
 															.create(TemplateGenlex.class);
 	
 	private final int							maxTokens;
 	}
 	
 	@Override
-	public ILexicon<LogicalExpression> generate(Sentence dataItem,
+	public ILexicon<LogicalExpression> generate(DI dataItem,
 			IModelImmutable<Sentence, LogicalExpression> model,
 			ICategoryServices<LogicalExpression> categoryServices) {
 		final List<String> tokens = dataItem.getTokens();
 				ILexiconGenerator.GENLEX_LEXICAL_ORIGIN);
 	}
 	
-	public static class Builder {
+	public static class Builder<DI extends Sentence> {
 		protected final Set<LogicalConstant>	constants	= new HashSet<LogicalConstant>();
 		protected final int						maxTokens;
 		protected final Set<LexicalTemplate>	templates	= new HashSet<LexicalTemplate>();
 			this.maxTokens = maxTokens;
 		}
 		
-		public Builder addConstants(Iterable<LogicalConstant> constantCollection) {
+		public Builder<DI> addConstants(
+				Iterable<LogicalConstant> constantCollection) {
 			for (final LogicalConstant constant : constantCollection) {
 				constants.add(constant);
 			}
 			return this;
 		}
 		
-		public Builder addTemplate(LexicalTemplate template) {
+		public Builder<DI> addTemplate(LexicalTemplate template) {
 			templates.add(template);
 			return this;
 		}
 		
-		public Builder addTemplates(Iterable<LexicalTemplate> templateCollection) {
+		public Builder<DI> addTemplates(
+				Iterable<LexicalTemplate> templateCollection) {
 			for (final LexicalTemplate template : templateCollection) {
 				addTemplate(template);
 			}
 			return this;
 		}
 		
-		public Builder addTemplatesFromLexicon(
+		public Builder<DI> addTemplatesFromLexicon(
 				ILexicon<LogicalExpression> lexicon) {
 			final Collection<LexicalEntry<LogicalExpression>> lexicalEntries = lexicon
 					.toCollection();
 			return this;
 		}
 		
-		public Builder addTemplatesFromModel(
+		public Builder<DI> addTemplatesFromModel(
 				IModelImmutable<?, LogicalExpression> sourceModel) {
 			final Collection<LexicalEntry<LogicalExpression>> lexicalEntries = sourceModel
 					.getLexicon().toCollection();
 			return this;
 		}
 		
-		public TemplateGenlex build() {
-			return new TemplateGenlex(templates, createPotentialLists(),
+		public TemplateGenlex<DI> build() {
+			return new TemplateGenlex<DI>(templates, createPotentialLists(),
 					maxTokens);
 		}
 		

genlex.ccg.template/src/edu/uw/cs/lil/tiny/genlex/ccg/template/TemplateSupervisedGenlex.java

 import edu.uw.cs.lil.tiny.ccg.lexicon.factored.lambda.FactoredLexicon.FactoredLexicalEntry;
 import edu.uw.cs.lil.tiny.ccg.lexicon.factored.lambda.Lexeme;
 import edu.uw.cs.lil.tiny.ccg.lexicon.factored.lambda.LexicalTemplate;
-import edu.uw.cs.lil.tiny.data.ILabeledDataItem;
 import edu.uw.cs.lil.tiny.data.sentence.Sentence;
+import edu.uw.cs.lil.tiny.data.singlesentence.SingleSentence;
 import edu.uw.cs.lil.tiny.genlex.ccg.ILexiconGenerator;
 import edu.uw.cs.lil.tiny.mr.lambda.LogicLanguageServices;
 import edu.uw.cs.lil.tiny.mr.lambda.LogicalConstant;
  * 
  * @author Yoav Artzi
  */
-public class TemplateSupervisedGenlex
+public class TemplateSupervisedGenlex<DI extends SingleSentence>
 		implements
-		ILexiconGenerator<ILabeledDataItem<Sentence, LogicalExpression>, LogicalExpression, IModelImmutable<Sentence, LogicalExpression>> {
+		ILexiconGenerator<DI, LogicalExpression, IModelImmutable<Sentence, LogicalExpression>> {
+	public static final ILogger					LOG			= LoggerFactory
+																	.create(TemplateSupervisedGenlex.class);
+	
 	private static final List<LogicalConstant>	EMPTY_LIST	= Collections
 																	.emptyList();
-	
-	private static final ILogger				LOG			= LoggerFactory
-																	.create(TemplateSupervisedGenlex.class);
 	private final int							maxTokens;
 	private final Set<LexicalTemplate>			templates;
 	private final Set<List<Type>>				typeSignatures;
 	}
 	
 	@Override
-	public ILexicon<LogicalExpression> generate(
-			ILabeledDataItem<Sentence, LogicalExpression> dataItem,
+	public ILexicon<LogicalExpression> generate(DI dataItem,
 			IModelImmutable<Sentence, LogicalExpression> model,
 			ICategoryServices<LogicalExpression> categoryServices) {
 		final List<String> tokens = dataItem.getSample().getTokens();
 				ILexiconGenerator.GENLEX_LEXICAL_ORIGIN);
 	}
 	
-	public static class Builder {
+	public static class Builder<DI extends SingleSentence> {
 		protected final int						maxTokens;
 		protected final Set<LexicalTemplate>	templates	= new HashSet<LexicalTemplate>();
 		
 			this.maxTokens = maxTokens;
 		}
 		
-		public Builder addTemplate(LexicalTemplate template) {
+		public Builder<DI> addTemplate(LexicalTemplate template) {
 			templates.add(template);
 			return this;
 		}
 		
-		public Builder addTemplates(Iterable<LexicalTemplate> templateCollection) {
+		public Builder<DI> addTemplates(
+				Iterable<LexicalTemplate> templateCollection) {
 			for (final LexicalTemplate template : templateCollection) {
 				addTemplate(template);
 			}
 			return this;
 		}
 		
-		public Builder addTemplatesFromLexicon(
+		public Builder<DI> addTemplatesFromLexicon(
 				ILexicon<LogicalExpression> lexicon) {
 			final Collection<LexicalEntry<LogicalExpression>> lexicalEntries = lexicon
 					.toCollection();
 			return this;
 		}
 		
-		public Builder addTemplatesFromModel(
+		public Builder<DI> addTemplatesFromModel(
 				IModelImmutable<?, LogicalExpression> sourceModel) {
 			final Collection<LexicalEntry<LogicalExpression>> lexicalEntries = sourceModel
 					.getLexicon().toCollection();
 			return this;
 		}
 		
-		public TemplateSupervisedGenlex build() {
-			return new TemplateSupervisedGenlex(maxTokens, templates);
+		public TemplateSupervisedGenlex<DI> build() {
+			return new TemplateSupervisedGenlex<DI>(maxTokens, templates);
 		}
 		
 	}
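
Note on the generics changes above: each generator and its `Builder` gain a bounded type parameter (`DI extends Sentence` or `DI extends SingleSentence`), so a generator built for a subtype of the data item accepts that subtype in `generate`. The sketch below shows the pattern with simplified stand-in types. `Text`, `LabeledText`, and `Generator` are hypothetical, and the string-based "templates" are placeholders rather than SPF's `LexicalTemplate`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BoundedBuilderDemo {

    /** Stand-in for a sentence-like data item. */
    static class Text {
        private final List<String> tokens;

        Text(String... tokens) {
            this.tokens = Arrays.asList(tokens);
        }

        List<String> getTokens() {
            return tokens;
        }
    }

    /** A subtype that also carries a label. */
    static final class LabeledText extends Text {
        final String label;

        LabeledText(String label, String... tokens) {
            super(tokens);
            this.label = label;
        }
    }

    /** Generator parameterized over the data item type it accepts. */
    static final class Generator<DI extends Text> {
        private final int maxTokens;
        private final List<String> templates;

        private Generator(int maxTokens, List<String> templates) {
            this.maxTokens = maxTokens;
            this.templates = templates;
        }

        /** Pairs each template with each of the first maxTokens tokens. */
        List<String> generate(DI dataItem) {
            List<String> entries = new ArrayList<String>();
            List<String> tokens = dataItem.getTokens();
            int limit = Math.min(maxTokens, tokens.size());
            for (String template : templates) {
                for (String token : tokens.subList(0, limit)) {
                    entries.add(template + "(" + token + ")");
                }
            }
            return entries;
        }

        /** The builder carries the same bound, so build() returns a typed generator. */
        static final class Builder<DI extends Text> {
            private final int maxTokens;
            private final List<String> templates = new ArrayList<String>();

            Builder(int maxTokens) {
                this.maxTokens = maxTokens;
            }

            Builder<DI> addTemplate(String template) {
                templates.add(template);
                return this;
            }

            Generator<DI> build() {
                return new Generator<DI>(maxTokens, new ArrayList<String>(templates));
            }
        }
    }

    static List<String> demo() {
        Generator<LabeledText> generator = new Generator.Builder<LabeledText>(2)
                .addTemplate("N").addTemplate("NP").build();
        return generator.generate(
                new LabeledText("query", "largest", "state", "bordering"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "[N(largest), N(state), NP(largest), NP(state)]"
    }
}
```

Returning `Builder<DI>` from each `add...` method, as the diff does, keeps the type argument through a chain of calls; a raw `Builder` return type would lose it and produce unchecked warnings at `build()`.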

genlex.ccg.template/src/edu/uw/cs/lil/tiny/genlex/ccg/template/coarse/TemplateCoarseGenlex.java

 
 /**
  * Lexicon generator that uses a parser to do initial filtering of generated
- * lexical entries. The generation process is based on abstract lexical entries,
- * which is basically a set of all templates initialized with abstract
- * constants. Abstract constants have the most basic types only. The generation
- * process starts with generating all lexemes for the input sentence using
- * abstract constants. The set of abstract lexemes is combined into a factored
- * lexicon with all templates. Then the sentence is parsed using the current
- * model with the abstract temporary lexicon. This parse is not accurate
- * according to the model, since the model is unfamiliar the any of the abstract
- * constants and can't generate features over them. From this approximate parse,
- * all GENLEX entries that participate in complete parses are collected. Using
- * their tokens and templates, a new lexicon is generated using all possible
- * constants (from the ontology). This lexicon is returned.
+ * lexical entries. The generation process is based on under-specified lexical
+ * entries, which is basically a set of all templates initialized with
+ * under-specified constants. Under-specified constants have the most basic
+ * types only. The generation process starts with generating all lexemes for the
+ * input sentence using under-specified constants. The set of under-specified
+ * lexemes is combined into a factored lexicon with all templates. Then the
+ * sentence is parsed using the current model with the under-specified temporary
+ * lexicon. This parse is not accurate according to the model, since the model
+ * is unfamiliar with any of the under-specified constants and can't generate
+ * features over them. From this approximate parse, all GENLEX entries that
+ * participate in complete parses are collected. Using their tokens and
+ * templates, a new lexicon is generated using all possible constants (from the
+ * ontology). This lexicon is returned.
  * 
  * @author Yoav Artzi
+ * @param <DI>
+ *            Data item for generation.
  */
-public class TemplateCoarseGenlex
+public class TemplateCoarseGenlex<DI extends Sentence>
 		implements
-		ILexiconGenerator<Sentence, LogicalExpression, IModelImmutable<Sentence, LogicalExpression>> {
-	private static final ILogger								LOG	= LoggerFactory
+		ILexiconGenerator<DI, LogicalExpression, IModelImmutable<Sentence, LogicalExpression>> {
+	public static final ILogger									LOG	= LoggerFactory
 																			.create(TemplateCoarseGenlex.class);
 	
 	private final Set<List<LogicalConstant>>					abstractConstantSeqs;
 	}
 	
 	@Override
-	public ILexicon<LogicalExpression> generate(Sentence dataItem,
+	public ILexicon<LogicalExpression> generate(DI dataItem,
 			IModelImmutable<Sentence, LogicalExpression> model,
 			ICategoryServices<LogicalExpression> categoryServices) {
 		final List<String> tokens = dataItem.getTokens();
 		return lexicon;
 	}
 	
-	public static class Builder {
+	public static class Builder<DI extends Sentence> {
 		private static final String								CONST_SEED_NAME	= "absconst";
 		
 		protected final Set<LogicalConstant>					constants		= new HashSet<LogicalConstant>();
 							type);
 		}
 		
-		public Builder addConstants(Iterable<LogicalConstant> constantCollection) {
+		public Builder<DI> addConstants(
+				Iterable<LogicalConstant> constantCollection) {
 			for (final LogicalConstant constant : constantCollection) {
 				constants.add(constant);
 			}
 			return this;
 		}
 		
-		public Builder addTemplate(LexicalTemplate template) {
+		public Builder<DI> addTemplate(LexicalTemplate template) {
 			templates.add(template);
 			return this;
 		}
 		
-		public Builder addTemplates(Iterable<LexicalTemplate> templateCollection) {
+		public Builder<DI> addTemplates(
+				Iterable<LexicalTemplate> templateCollection) {
 			for (final LexicalTemplate template : templateCollection) {
 				addTemplate(template);
 			}
 			return this;
 		}
 		
-		public Builder addTemplatesFromModel(
+		public Builder<DI> addTemplatesFromModel(
 				IModelImmutable<?, LogicalExpression> sourceModel) {
 			final Collection<LexicalEntry<LogicalExpression>> lexicalEntries = sourceModel
 					.getLexicon().toCollection();
 			return this;
 		}
 		
-		public TemplateCoarseGenlex build() {
-			return new TemplateCoarseGenlex(templates, createPotentialLists(),
-					createAbstractLists(), maxTokens, parser, parsingBeam);
+		public TemplateCoarseGenlex<DI> build() {
+			return new TemplateCoarseGenlex<DI>(templates,
+					createPotentialLists(), createAbstractLists(), maxTokens,
+					parser, parsingBeam);
 		}
 		
 		protected Set<List<LogicalConstant>> createAbstractLists() {

genlex.ccg.template/src/edu/uw/cs/lil/tiny/genlex/ccg/template/resources/TemplateGenlexCreator.java

 package edu.uw.cs.lil.tiny.genlex.ccg.template.resources;
 
 import edu.uw.cs.lil.tiny.ccg.lexicon.ILexicon;
+import edu.uw.cs.lil.tiny.data.sentence.Sentence;
 import edu.uw.cs.lil.tiny.explat.IResourceRepository;
 import edu.uw.cs.lil.tiny.explat.ParameterizedExperiment.Parameters;
 import edu.uw.cs.lil.tiny.explat.resources.IResourceObjectCreator;
 import edu.uw.cs.lil.tiny.mr.lambda.LogicalExpression;
 import edu.uw.cs.lil.tiny.parser.ccg.model.IModelImmutable;
 
-public class TemplateGenlexCreator implements
-		IResourceObjectCreator<TemplateGenlex> {
+public class TemplateGenlexCreator<DI extends Sentence> implements
+		IResourceObjectCreator<TemplateGenlex<DI>> {
 	
 	private final String	type;
 	
 	
 	@SuppressWarnings("unchecked")
 	@Override
-	public TemplateGenlex create(Parameters params, IResourceRepository repo) {
-		final TemplateGenlex.Builder builder = new TemplateGenlex.Builder(
+	public TemplateGenlex<DI> create(Parameters params, IResourceRepository repo) {
+		final TemplateGenlex.Builder<DI> builder = new TemplateGenlex.Builder<DI>(
 				Integer.valueOf(params.get("maxTokens")));
 		
 		if (params.contains("templatesModel")) {

genlex.ccg.template/src/edu/uw/cs/lil/tiny/genlex/ccg/template/resources/TemplateSupervisedGenlexCreator.java

 package edu.uw.cs.lil.tiny.genlex.ccg.template.resources;
 
 import edu.uw.cs.lil.tiny.ccg.lexicon.ILexicon;
+import edu.uw.cs.lil.tiny.data.singlesentence.SingleSentence;
 import edu.uw.cs.lil.tiny.explat.IResourceRepository;
 import edu.uw.cs.lil.tiny.explat.ParameterizedExperiment.Parameters;
 import edu.uw.cs.lil.tiny.explat.resources.IResourceObjectCreator;
 import edu.uw.cs.lil.tiny.mr.lambda.LogicalExpression;
 import edu.uw.cs.lil.tiny.parser.ccg.model.IModelImmutable;
 
-public class TemplateSupervisedGenlexCreator implements
-		IResourceObjectCreator<TemplateSupervisedGenlex> {
+public class TemplateSupervisedGenlexCreator<DI extends SingleSentence>
+		implements IResourceObjectCreator<TemplateSupervisedGenlex<DI>> {
 	
 	private final String	type;
 	
 	
 	@SuppressWarnings("unchecked")
 	@Override
-	public TemplateSupervisedGenlex create(Parameters params,
+	public TemplateSupervisedGenlex<DI> create(Parameters params,
 			IResourceRepository repo) {
-		final TemplateSupervisedGenlex.Builder builder = new TemplateSupervisedGenlex.Builder(
+		final TemplateSupervisedGenlex.Builder<DI> builder = new TemplateSupervisedGenlex.Builder<DI>(
 				Integer.valueOf(params.get("maxTokens")));
 		
 		if (params.contains("templatesModel")) {

genlex.ccg.unification/.classpath

 	<classpathentry kind="src" path="/ccg.lexicon"/>
 	<classpathentry kind="src" path="/parser.ccg.cky"/>
 	<classpathentry kind="src" path="/ccg.lexicon.factored.lambda"/>
+	<classpathentry kind="src" path="/data.singlesentence"/>
 	<classpathentry kind="output" path="bin"/>
 </classpath>

genlex.ccg.unification/src/edu/uw/cs/lil/tiny/genlex/ccg/unification/UnificationGenlex.java

 import edu.uw.cs.lil.tiny.data.ILabeledDataItem;
 import edu.uw.cs.lil.tiny.data.ILossDataItem;
 import edu.uw.cs.lil.tiny.data.sentence.Sentence;
+import edu.uw.cs.lil.tiny.data.singlesentence.SingleSentence;
 import edu.uw.cs.lil.tiny.genlex.ccg.ILexiconGenerator;
 import edu.uw.cs.lil.tiny.genlex.ccg.unification.split.IUnificationSplitter;
 import edu.uw.cs.lil.tiny.genlex.ccg.unification.split.SplittingServices.SplittingPair;
  * @author Yoav Artzi
  * @author Luke Zettlemoyer
  */
-public class UnificationGenlex
+public class UnificationGenlex<DI extends SingleSentence>
 		implements
-		ILexiconGenerator<ILabeledDataItem<Sentence, LogicalExpression>, LogicalExpression, IModelImmutable<Sentence, LogicalExpression>> {
+		ILexiconGenerator<DI, LogicalExpression, IModelImmutable<Sentence, LogicalExpression>> {
 	
-	public static final String							SPLITTING_LEXICAL_ORIGIN	= "splitting";
-	private static final ILogger						LOG							= LoggerFactory
+	public static final ILogger							LOG							= LoggerFactory
 																							.create(UnificationGenlex.class);
-	
+	public static final String							SPLITTING_LEXICAL_ORIGIN	= "splitting";
 	private final boolean								conservative;
 	
 	private final AbstractCKYParser<LogicalExpression>	parser;
 	}
 	
 	@Override
-	public ILexicon<LogicalExpression> generate(
-			final ILabeledDataItem<Sentence, LogicalExpression> dataItem,
+	public ILexicon<LogicalExpression> generate(final DI dataItem,
 			IModelImmutable<Sentence, LogicalExpression> model,
 			ICategoryServices<LogicalExpression> categoryServices) {
 		
 		// Parse the sentence with pruner
 		final CKYParserOutput<LogicalExpression> parserOutput = parser.parse(
-				dataItem, createPruningFilter(dataItem),
+				dataItem.getSample(), createPruningFilter(dataItem),
 				model.createDataItemModel(dataItem.getSample()));
 		LOG.info("Lexical generation parsing time %f",
 				parserOutput.getParsingTime() / 1000.0);
 		}
 		
 	}
-	
 }

genlex.ccg.unification/src/edu/uw/cs/lil/tiny/genlex/ccg/unification/resources/UnificationGenlexCreator.java

  ******************************************************************************/
 package edu.uw.cs.lil.tiny.genlex.ccg.unification.resources;
 
+import edu.uw.cs.lil.tiny.data.singlesentence.SingleSentence;
 import edu.uw.cs.lil.tiny.explat.IResourceRepository;
 import edu.uw.cs.lil.tiny.explat.ParameterizedExperiment;
 import edu.uw.cs.lil.tiny.explat.ParameterizedExperiment.Parameters;
 import edu.uw.cs.lil.tiny.mr.lambda.LogicalExpression;
 import edu.uw.cs.lil.tiny.parser.ccg.cky.AbstractCKYParser;
 
-public class UnificationGenlexCreator implements
-		IResourceObjectCreator<UnificationGenlex> {
+public class UnificationGenlexCreator<DI extends SingleSentence> implements
+		IResourceObjectCreator<UnificationGenlex<DI>> {
 	
 	private final String	type;
 	
 	
 	@SuppressWarnings("unchecked")
 	@Override
-	public UnificationGenlex create(Parameters params, IResourceRepository repo) {
-		return new UnificationGenlex(
+	public UnificationGenlex<DI> create(Parameters params,
+			IResourceRepository repo) {
+		return new UnificationGenlex<DI>(
 				(AbstractCKYParser<LogicalExpression>) repo
 						.getResource(ParameterizedExperiment.PARSER_RESOURCE),
 				(IUnificationSplitter) repo.getResource(params.get("splitter")),

genlex.ccg.unification/src/edu/uw/cs/lil/tiny/genlex/ccg/unification/split/MakeApplicationSplits.java

  */
 public class MakeApplicationSplits {
 	
-	private static final ILogger	LOG	= LoggerFactory
+	public static final ILogger	LOG	= LoggerFactory
 												.create(MakeApplicationSplits.class);
 	
 	private MakeApplicationSplits() {

genlex.ccg.unification/src/edu/uw/cs/lil/tiny/genlex/ccg/unification/split/MakeCompositionSplits.java

  */
 public class MakeCompositionSplits implements ILogicalExpressionVisitor {
 	
-	private static final ILogger						LOG		= LoggerFactory
+	public static final ILogger						LOG		= LoggerFactory
 																		.create(MakeCompositionSplits.class
 																				.getName());
 	

genlex.ccg/src/edu/uw/cs/lil/tiny/genlex/ccg/ILexiconGenerator.java

  * Lexical entries generator.
  * 
  * @author Yoav Artzi
- * @param <SAMPLE>
- *            Type of sample data item.
+ * @param <DI>
+ *            Type of sample data item for generation.
  * @param <MR>
  *            Type of meaning representation.
+ * @param <MODEL>
+ *            Inference model.
  */
-public interface ILexiconGenerator<SAMPLE extends IDataItem<?>, MR, MODEL extends IModelImmutable<?, ?>> {
+public interface ILexiconGenerator<DI extends IDataItem<?>, MR, MODEL extends IModelImmutable<?, ?>> {
 	public static final String	GENLEX_LEXICAL_ORIGIN	= "genlex";
 	
-	ILexicon<MR> generate(SAMPLE dataItem, MODEL model,
+	ILexicon<MR> generate(DI dataItem, MODEL model,
 			ICategoryServices<MR> categoryServices);
 }
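
Review note on the `SAMPLE` to `DI` rename: the generator is now typed over the whole data item, while parsers and models in this commit receive only `dataItem.getSample()`. The following is a minimal sketch of that split, not SPF code. `DataItem`, `LabeledSentence`, `LexiconGenerator` and `PairingGenerator` are hypothetical stand-ins, and the interface is reduced to two type parameters for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a data item interface: wraps a sample.
interface DataItem<SAMPLE> {
    SAMPLE getSample();
}

// Hypothetical stand-in for a labeled single-sentence data item.
final class LabeledSentence implements DataItem<String> {
    private final String sentence;
    private final String label;

    LabeledSentence(String sentence, String label) {
        this.sentence = sentence;
        this.label = label;
    }

    @Override
    public String getSample() {
        return sentence;
    }

    String getLabel() {
        return label;
    }
}

// Same shape as a generator typed over the data item (DI), not over the sample.
interface LexiconGenerator<DI extends DataItem<?>, MR> {
    List<MR> generate(DI dataItem);
}

// Bound to a concrete data item type: it may read the label,
// but it tokenizes only the sample obtained through getSample().
final class PairingGenerator<DI extends LabeledSentence>
        implements LexiconGenerator<DI, String> {
    @Override
    public List<String> generate(DI dataItem) {
        final List<String> entries = new ArrayList<String>();
        for (final String token : dataItem.getSample().split(" ")) {
            entries.add(token + " :- " + dataItem.getLabel());
        }
        return entries;
    }
}
```

Binding `DI` on the implementing class, as `UnificationGenlex<DI extends SingleSentence>` does in this commit, lets callers pass subclasses of the data item without casts.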

geoquery/src/edu/uw/cs/lil/tiny/geoquery/GeoExperiment.java

 import edu.uw.cs.lil.tiny.ccg.lexicon.factored.lambda.FactoredLexicon;
 import edu.uw.cs.lil.tiny.ccg.lexicon.factored.lambda.FactoredLexicon.FactoredLexicalEntry;
 import edu.uw.cs.lil.tiny.ccg.lexicon.factored.lambda.FactoredLexiconServices;
-import edu.uw.cs.lil.tiny.data.IDataItem;
 import edu.uw.cs.lil.tiny.data.sentence.Sentence;
 import edu.uw.cs.lil.tiny.data.singlesentence.SingleSentence;
 import edu.uw.cs.lil.tiny.explat.DistributedExperiment;
 import edu.uw.cs.utils.log.LoggerFactory;
 
 public class GeoExperiment extends DistributedExperiment {
-	private static final ILogger					LOG	= LoggerFactory
+	public static final ILogger						LOG	= LoggerFactory
 																.create(GeoExperiment.class);
 	
 	private final LogicalExpressionCategoryServices	categoryServices;
 				.get("tester"));
 		
 		// The model to use
-		final Model<IDataItem<Sentence>, LogicalExpression> model = getResource(params
+		final Model<Sentence, LogicalExpression> model = getResource(params
 				.get("model"));
 		
 		// Create and return the job
 	@SuppressWarnings("unchecked")
 	private Job createTrainJob(Parameters params) throws FileNotFoundException {
 		// The model to use
-		final Model<IDataItem<Sentence>, LogicalExpression> model = (Model<IDataItem<Sentence>, LogicalExpression>) getResource(params
+		final Model<Sentence, LogicalExpression> model = (Model<Sentence, LogicalExpression>) getResource(params
 				.get("model"));
 		
 		// The learning
-		final ILearner<Sentence, SingleSentence, LogicalExpression, Model<IDataItem<Sentence>, LogicalExpression>> learner = (ILearner<Sentence, SingleSentence, LogicalExpression, Model<IDataItem<Sentence>, LogicalExpression>>) getResource(params
+		final ILearner<Sentence, SingleSentence, Model<Sentence, LogicalExpression>> learner = (ILearner<Sentence, SingleSentence, Model<Sentence, LogicalExpression>>) getResource(params
 				.get("learner"));
 		
 		return new Job(params.get("id"), new HashSet<String>(

geoquery/src/edu/uw/cs/lil/tiny/geoquery/GeoMain.java

  * @author Yoav Artzi
  */
 public class GeoMain {
-	private static final ILogger	LOG	= LoggerFactory.create(GeoMain.class);
+	public static final ILogger	LOG	= LoggerFactory.create(GeoMain.class);
 	
 	public static void main(String[] args) {
 		if (args.length < 1) {
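
Review note on the `private` to `public` change of the `LOG` fields: the commit message says logs can now be controlled from outside SPF. The `ILogger` API is not visible in this diff, so the sketch below uses `java.util.logging` as a stand-in to show the general idea. `Component` and `LogControlSketch` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical component exposing its logger as a public constant,
// analogous to the public LOG fields introduced in this commit.
final class Component {
    public static final Logger LOG = Logger.getLogger(Component.class.getName());

    private Component() {
    }

    // Reports whether INFO messages would currently be emitted.
    static boolean wouldLogInfo() {
        return LOG.isLoggable(Level.INFO);
    }
}

// Hypothetical client code that lives outside the component.
final class LogControlSketch {
    private LogControlSketch() {
    }

    // Raises the threshold without touching the component's source.
    static void silenceInfo() {
        Component.LOG.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        silenceInfo();
        System.out.println("INFO enabled: " + Component.wouldLogInfo());
    }
}
```

With a `private` logger field, the same adjustment would require editing the component or going through a global configuration.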

geoquery/src/edu/uw/cs/lil/tiny/geoquery/GeoResourceRepo.java

 		registerResourceCreator(new LogicalExpressionCoordinationFeatureSetCreator<Sentence>());
 		registerResourceCreator(new FactoredLexiconCreator());
 		registerResourceCreator(new SingleSentenceDatasetCreator());
-		registerResourceCreator(new TemplateSupervisedGenlexCreator());
+		registerResourceCreator(new TemplateSupervisedGenlexCreator<SingleSentence>());
 		registerResourceCreator(new SingleSentenceDatasetCreator());
 		registerResourceCreator(new ValidationPerceptronCreator<Sentence, SingleSentence, LogicalExpression>());
 		registerResourceCreator(new ValidationStocGradCreator<Sentence, SingleSentence, LogicalExpression>());
 		registerResourceCreator(new LabeledValidatorCreator<SingleSentence, LogicalExpression>());
 		registerResourceCreator(new TesterCreator<Sentence, LogicalExpression>());
 		registerResourceCreator(new LexiconModelInitCreator<Sentence, LogicalExpression>());
-		registerResourceCreator(new UnificationGenlexCreator());
+		registerResourceCreator(new UnificationGenlexCreator<SingleSentence>());
 		registerResourceCreator(new SplitterCreator());
 		registerResourceCreator(new UnificationModelInitCreator());
 		registerResourceCreator(new LexemeCooccurrenceScorerCreator());

learn.simple/.classpath

 	<classpathentry kind="src" path="/tinyutils"/>
 	<classpathentry combineaccessrules="false" kind="src" path="/parser.ccg.joint"/>
 	<classpathentry combineaccessrules="false" kind="src" path="/ccg.lexicon"/>
+	<classpathentry kind="src" path="/data.singlesentence"/>
+	<classpathentry kind="src" path="/mr.lambda"/>
 	<classpathentry kind="output" path="bin"/>
 </classpath>

learn.simple/src/edu/uw/cs/lil/tiny/learn/simple/SimplePerceptron.java

 import java.util.LinkedList;
 import java.util.List;
 
-import edu.uw.cs.lil.tiny.data.IDataItem;
-import edu.uw.cs.lil.tiny.data.ILabeledDataItem;
 import edu.uw.cs.lil.tiny.data.collection.IDataCollection;
+import edu.uw.cs.lil.tiny.data.sentence.Sentence;
+import edu.uw.cs.lil.tiny.data.singlesentence.SingleSentence;
 import edu.uw.cs.lil.tiny.learn.ILearner;
+import edu.uw.cs.lil.tiny.mr.lambda.LogicalExpression;
 import edu.uw.cs.lil.tiny.parser.IParse;
 import edu.uw.cs.lil.tiny.parser.IParser;
 import edu.uw.cs.lil.tiny.parser.IParserOutput;
  * @param <DI>
  * @param <MR>
  */
-public class SimplePerceptron<DI extends ILabeledDataItem<LANG, MR>, LANG, MR>
-		implements ILearner<LANG, DI, MR, Model<IDataItem<LANG>, MR>> {
-	private static final ILogger		LOG	= LoggerFactory
-													.create(SimplePerceptron.class);
+public class SimplePerceptron implements
+		ILearner<Sentence, SingleSentence, Model<Sentence, LogicalExpression>> {
+	public static final ILogger							LOG	= LoggerFactory
+																	.create(SimplePerceptron.class);
 	
-	private final int					numIterations;
-	private final IParser<LANG, MR>		parser;
-	private final IDataCollection<DI>	trainingData;
+	private final int									numIterations;
+	private final IParser<Sentence, LogicalExpression>	parser;
+	private final IDataCollection<SingleSentence>		trainingData;
 	
 	public SimplePerceptron(int numIterations,
-			IDataCollection<DI> trainingData, IParser<LANG, MR> parser) {
+			IDataCollection<SingleSentence> trainingData,
+			IParser<Sentence, LogicalExpression> parser) {
 		this.numIterations = numIterations;
 		this.trainingData = trainingData;
 		this.parser = parser;
 	}
 	
 	@Override
-	public void train(Model<IDataItem<LANG>, MR> model) {
+	public void train(Model<Sentence, LogicalExpression> model) {
 		for (int iterationNumber = 0; iterationNumber < numIterations; ++iterationNumber) {
 			// Training iteration, go over all training samples
 			LOG.info("=========================");
 			LOG.info("=========================");
 			int itemCounter = -1;
 			
-			for (final DI dataItem : trainingData) {
+			for (final SingleSentence dataItem : trainingData) {
 				final long startTime = System.currentTimeMillis();
 				
 				LOG.info("%d : ================== [%d]", ++itemCounter,
 				LOG.info("Sample type: %s", dataItem.getClass().getSimpleName());
 				LOG.info("%s", dataItem);
 				
-				final IDataItemModel<MR> dataItemModel = model
-						.createDataItemModel(dataItem);
-				final IParserOutput<MR> parserOutput = parser.parse(dataItem,
-						dataItemModel);
-				final List<? extends IParse<MR>> bestParses = parserOutput
+				final IDataItemModel<LogicalExpression> dataItemModel = model
+						.createDataItemModel(dataItem.getSample());
+				final IParserOutput<LogicalExpression> parserOutput = parser
+						.parse(dataItem.getSample(), dataItemModel);
+				final List<? extends IParse<LogicalExpression>> bestParses = parserOutput
 						.getBestParses();
 				
 				// Correct parse
-				final List<? extends IParse<MR>> correctParses = parserOutput
-						.getMaxParses(new IFilter<MR>() {
+				final List<? extends IParse<LogicalExpression>> correctParses = parserOutput
+						.getMaxParses(new IFilter<LogicalExpression>() {
 							
 							@Override
-							public boolean isValid(MR e) {
+							public boolean isValid(LogicalExpression e) {
 								return dataItem.getLabel().equals(e);
 							}
 						});
 				
 				// Violating parses
-				final List<IParse<MR>> violatingBadParses = new LinkedList<IParse<MR>>();
-				for (final IParse<MR> parse : bestParses) {
+				final List<IParse<LogicalExpression>> violatingBadParses = new LinkedList<IParse<LogicalExpression>>();
+				for (final IParse<LogicalExpression> parse : bestParses) {
 					if (!dataItem.isCorrect(parse.getSemantics())) {
 						violatingBadParses.add(parse);
 						LOG.info("Bad parse: %s", parse.getSemantics());
 					final IHashVector update = HashVectorFactory.create();
 					
 					// Positive update
-					for (final IParse<MR> parse : correctParses) {
+					for (final IParse<LogicalExpression> parse : correctParses) {
 						parse.getAverageMaxFeatureVector().addTimesInto(
 								(1.0 / correctParses.size()), update);
 					}
 					
 					// Negative update
-					for (final IParse<MR> parse : violatingBadParses) {
+					for (final IParse<LogicalExpression> parse : violatingBadParses) {
 						parse.getAverageMaxFeatureVector().addTimesInto(
 								-1.0 * (1.0 / violatingBadParses.size()),
 								update);
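
Review note on the update step in this hunk: each correct parse contributes its feature vector scaled by `1 / |correct|`, and each violating parse contributes its vector scaled by `-1 / |violating|`. The following is a minimal sketch of that arithmetic using plain maps instead of `IHashVector`. `PerceptronUpdateSketch` and its methods are hypothetical, and the surrounding control flow elided from the hunk is not reproduced.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PerceptronUpdateSketch {
    private PerceptronUpdateSketch() {
    }

    // Adds scale * source into target, feature by feature.
    static void addTimesInto(Map<String, Double> source, double scale,
            Map<String, Double> target) {
        for (final Map.Entry<String, Double> entry : source.entrySet()) {
            final Double current = target.get(entry.getKey());
            target.put(entry.getKey(), (current == null ? 0.0 : current)
                    + scale * entry.getValue());
        }
    }

    // Averages the correct parses positively and the violating parses negatively.
    static Map<String, Double> computeUpdate(
            List<Map<String, Double>> correct,
            List<Map<String, Double>> violating) {
        final Map<String, Double> update = new HashMap<String, Double>();
        for (final Map<String, Double> features : correct) {
            addTimesInto(features, 1.0 / correct.size(), update);
        }
        for (final Map<String, Double> features : violating) {
            addTimesInto(features, -1.0 / violating.size(), update);
        }
        return update;
    }
}
```

For example, one correct parse with features `{a: 1.0}` and two violating parses with `{a: 1.0, b: 2.0}` and `{b: 2.0}` give an update of `a = 1.0 - 0.5 = 0.5` and `b = -1.0 - 1.0 = -2.0`.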

learn.situated/.classpath

 	<classpathentry kind="src" path="/explat"/>
 	<classpathentry combineaccessrules="false" kind="src" path="/ccg.lexicon"/>
 	<classpathentry kind="src" path="/genlex.ccg"/>
+	<classpathentry kind="src" path="/data.situated"/>
 	<classpathentry kind="output" path="bin"/>
 </classpath>

learn.situated/src/edu/uw/cs/lil/tiny/learn/situated/AbstractSituatedLearner.java

 import edu.uw.cs.lil.tiny.ccg.lexicon.ILexicon;
 import edu.uw.cs.lil.tiny.ccg.lexicon.LexicalEntry;
 import edu.uw.cs.lil.tiny.ccg.lexicon.LexicalEntry.Origin;
-import edu.uw.cs.lil.tiny.data.IDataItem;
+import edu.uw.cs.lil.tiny.data.ILabeledDataItem;
 import edu.uw.cs.lil.tiny.data.collection.IDataCollection;
-import edu.uw.cs.lil.tiny.data.sentence.Sentence;
+import edu.uw.cs.lil.tiny.data.situated.sentence.SituatedSentence;
 import edu.uw.cs.lil.tiny.genlex.ccg.ILexiconGenerator;
 import edu.uw.cs.lil.tiny.learn.ILearner;
 import edu.uw.cs.lil.tiny.learn.OnlineLearningStats;
  *            Type of execution step.
  * @param <ERESULT>
  *            Type of execution result.
+ * @param <DI>
+ *            Data item used for learning.
  */
-public abstract class AbstractSituatedLearner<STATE, MR, ESTEP, ERESULT, DI extends IDataItem<Pair<Sentence, STATE>>>
+public abstract class AbstractSituatedLearner<STATE, MR, ESTEP, ERESULT, DI extends ILabeledDataItem<SituatedSentence<STATE>, ?>>
 		implements
-		ILearner<Pair<Sentence, STATE>, DI, MR, JointModel<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP>> {
-	private static final ILogger																						LOG	= LoggerFactory
-																																	.create(AbstractSituatedLearner.class);
-	private final ICategoryServices<MR>																					categoryServices;
+		ILearner<SituatedSentence<STATE>, DI, JointModel<SituatedSentence<STATE>, MR, ESTEP>> {
+	public static final ILogger																			LOG	= LoggerFactory
+																													.create(AbstractSituatedLearner.class);
+	private final ICategoryServices<MR>																	categoryServices;
 	
 	/**
 	 * Number of training epochs.
 	 */
-	private final int																									epochs;
+	private final int																					epochs;
 	
 	/**
 	 * GENLEX procedure. If 'null' skip lexical induction.
 	 */
-	private final ILexiconGenerator<DI, MR, IJointModelImmutable<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP>>	genlex;
+	private final ILexiconGenerator<DI, MR, IJointModelImmutable<SituatedSentence<STATE>, MR, ESTEP>>	genlex;
 	
 	/**
 	 * Parser beam size for lexical generation.
 	 */
-	private final int																									lexiconGenerationBeamSize;
+	private final int																					lexiconGenerationBeamSize;
 	
 	/**
 	 * Max sentence length to process. If longer, skip.
 	 */
-	private final int																									maxSentenceLength;
+	private final int																					maxSentenceLength;
 	
 	/**
 	 * Training data.
 	 */
-	private final IDataCollection<DI>																					trainingData;
+	private final IDataCollection<DI>																	trainingData;
 	
 	/**
 	 * Mapping of training data samples to their gold labels.
 	 */
-	private final Map<DI, Pair<MR, ERESULT>>																			trainingDataDebug;
+	private final Map<DI, Pair<MR, ERESULT>>															trainingDataDebug;
 	
 	/**
 	 * Joint parser for inference.
 	 */
-	protected final IJointParser<Sentence, STATE, MR, ESTEP, ERESULT>													parser;
+	protected final IJointParser<SituatedSentence<STATE>, MR, ESTEP, ERESULT>							parser;
 	/**
 	 * Parser output logger.
 	 */
-	protected final IJointOutputLogger<MR, ESTEP, ERESULT>																parserOutputLogger;
+	protected final IJointOutputLogger<MR, ESTEP, ERESULT>												parserOutputLogger;
 	/**
 	 * Learning statistics.
 	 */
-	protected final OnlineLearningStats																					stats;
+	protected final OnlineLearningStats																	stats;
 	
 	protected AbstractSituatedLearner(
 			int numIterations,
 			Map<DI, Pair<MR, ERESULT>> trainingDataDebug,
 			int maxSentenceLength,
 			int lexiconGenerationBeamSize,
-			IJointParser<Sentence, STATE, MR, ESTEP, ERESULT> parser,
+			IJointParser<SituatedSentence<STATE>, MR, ESTEP, ERESULT> parser,
 			IJointOutputLogger<MR, ESTEP, ERESULT> parserOutputLogger,
 			ICategoryServices<MR> categoryServices,
-			ILexiconGenerator<DI, MR, IJointModelImmutable<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP>> genlex) {
+			ILexiconGenerator<DI, MR, IJointModelImmutable<SituatedSentence<STATE>, MR, ESTEP>> genlex) {
 		this.epochs = numIterations;
 		this.trainingData = trainingData;
 		this.trainingDataDebug = trainingDataDebug;
 	}
 	
 	@Override
-	public void train(
-			JointModel<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP> model) {
+	public void train(JointModel<SituatedSentence<STATE>, MR, ESTEP> model) {
 		// Epochs
 		for (int epochNumber = 0; epochNumber < epochs; ++epochNumber) {
 			// Training epoch, iterate over all training samples
 				LOG.info("%s", dataItem);
 				
 				// Skip sample, if over the length limit
-				if (dataItem.getSample().first().getTokens().size() > maxSentenceLength) {
+				if (dataItem.getSample().getTokens().size() > maxSentenceLength) {
 					LOG.warn("Training sample too long, skipping");
 					continue;
 				}
 				
 				// Sample data item model
 				final IJointDataItemModel<MR, ESTEP> dataItemModel = model
-						.createJointDataItemModel(dataItem);
+						.createJointDataItemModel(dataItem.getSample());
 				
 				// ///////////////////////////
 				// Step I: Generate a large number of potential lexical entries,
 		}
 	}
 	
-	private void lexicalInduction(
-			final DI dataItem,
+	private void lexicalInduction(final DI dataItem,
 			IJointDataItemModel<MR, ESTEP> dataItemModel,
-			JointModel<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP> model,
+			JointModel<SituatedSentence<STATE>, MR, ESTEP> model,
 			int dataItemNumber, int epochNumber) {
 		// Generate lexical entries
 		final ILexicon<MR> generatedLexicon = genlex.generate(dataItem, model,
 			
 			// Parse with generated lexicon
 			final IJointOutput<MR, ERESULT> generateLexiconParserOutput = parser
-					.parse(dataItem, dataItemModel, false, generatedLexicon,
-							lexiconGenerationBeamSize);
+					.parse(dataItem.getSample(), dataItemModel, false,
+							generatedLexicon, lexiconGenerationBeamSize);
 			
 			// Log lexical generation parsing time
 			final long genTime = System.currentTimeMillis() - genStartTime;
 	/**
 	 * Parameter update method.
 	 */
-	protected abstract void parameterUpdate(
-			DI dataItem,
+	protected abstract void parameterUpdate(DI dataItem,
 			IJointDataItemModel<MR, ESTEP> dataItemModel,
-			JointModel<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP> model,
+			JointModel<SituatedSentence<STATE>, MR, ESTEP> model,
 			int itemCounter, int epochNumber);
 	
 	abstract protected boolean validate(DI dataItem,

learn.situated/src/edu/uw/cs/lil/tiny/learn/situated/perceptron/SituatedValidationPerceptron.java

 import java.util.Map;
 
 import edu.uw.cs.lil.tiny.ccg.categories.ICategoryServices;
-import edu.uw.cs.lil.tiny.data.IDataItem;
+import edu.uw.cs.lil.tiny.data.ILabeledDataItem;
 import edu.uw.cs.lil.tiny.data.collection.IDataCollection;
-import edu.uw.cs.lil.tiny.data.sentence.Sentence;
+import edu.uw.cs.lil.tiny.data.situated.sentence.SituatedSentence;
 import edu.uw.cs.lil.tiny.data.utils.IValidator;
 import edu.uw.cs.lil.tiny.genlex.ccg.ILexiconGenerator;
 import edu.uw.cs.lil.tiny.learn.PerceptronServices;
  *            Type of execution step.
  * @param <ERESULT>
  *            Type of execution result.
+ * @param <DI>
+ *            Training data item.
  */
-public class SituatedValidationPerceptron<STATE, MR, ESTEP, ERESULT, DI extends IDataItem<Pair<Sentence, STATE>>>
+public class SituatedValidationPerceptron<STATE, MR, ESTEP, ERESULT, DI extends ILabeledDataItem<SituatedSentence<STATE>, ?>>
 		extends AbstractSituatedLearner<STATE, MR, ESTEP, ERESULT, DI> {
-	private static final ILogger					LOG	= LoggerFactory
+	public static final ILogger						LOG	= LoggerFactory
 																.create(SituatedValidationPerceptron.class);
 	private final boolean							hardUpdates;
 	private final double							margin;
 			Map<DI, Pair<MR, ERESULT>> trainingDataDebug,
 			int maxSentenceLength,
 			int lexiconGenerationBeamSize,
-			IJointParser<Sentence, STATE, MR, ESTEP, ERESULT> parser,
+			IJointParser<SituatedSentence<STATE>, MR, ESTEP, ERESULT> parser,
 			boolean hardUpdates,
 			IJointOutputLogger<MR, ESTEP, ERESULT> parserOutputLogger,
 			IValidator<DI, Pair<MR, ERESULT>> validator,
 			ICategoryServices<MR> categoryServices,
-			ILexiconGenerator<DI, MR, IJointModelImmutable<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP>> genlex) {
+			ILexiconGenerator<DI, MR, IJointModelImmutable<SituatedSentence<STATE>, MR, ESTEP>> genlex) {
 		super(numIterations, trainingData, trainingDataDebug,
 				maxSentenceLength, lexiconGenerationBeamSize, parser,
 				parserOutputLogger, categoryServices, genlex);
 	}
 	
 	@Override
-	protected void parameterUpdate(
-			DI dataItem,
+	protected void parameterUpdate(DI dataItem,
 			IJointDataItemModel<MR, ESTEP> dataItemModel,
-			JointModel<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP> model,
+			JointModel<SituatedSentence<STATE>, MR, ESTEP> model,
 			int itemCounter, int epochNumber) {
 		
 		// Parse with current model
-		final IJointOutput<MR, ERESULT> parserOutput = parser.parse(dataItem,
-				dataItemModel);
+		final IJointOutput<MR, ERESULT> parserOutput = parser.parse(
+				dataItem.getSample(), dataItemModel);
 		stats.recordModelParsing(parserOutput.getInferenceTime());
 		parserOutputLogger.log(parserOutput, dataItemModel);
 		final List<? extends IJointParse<MR, ERESULT>> modelParses = parserOutput
 	 * 
 	 * @author Yoav Artzi
 	 */
-	public static class Builder<STATE, MR, ESTEP, ERESULT, DI extends IDataItem<Pair<Sentence, STATE>>> {
+	public static class Builder<STATE, MR, ESTEP, ERESULT, DI extends ILabeledDataItem<SituatedSentence<STATE>, ?>> {
 		
 		/**
 		 * Required for lexical induction.
 		 */
-		private ICategoryServices<MR>																				categoryServices			= null;
+		private ICategoryServices<MR>																categoryServices			= null;
 		
 		/**
 		 * GENLEX procedure. If 'null' skip lexical induction.
 		 */
-		private ILexiconGenerator<DI, MR, IJointModelImmutable<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP>>	genlex						= null;
+		private ILexiconGenerator<DI, MR, IJointModelImmutable<SituatedSentence<STATE>, MR, ESTEP>>	genlex						= null;
 		
 		/**
 		 * Use hard updates. Meaning: consider only highest-scored valid parses
 		 * for parameter updates, instead of all valid parses.
 		 */
-		private boolean																								hardUpdates					= false;
+		private boolean																				hardUpdates					= false;
 		
 		/**
 		 * Beam size to use when doing loss sensitive pruning with generated
 		 * lexicon.
 		 */
-		private int																									lexiconGenerationBeamSize	= 20;
+		private int																					lexiconGenerationBeamSize	= 20;
 		
 		/** Margin to scale the relative loss function */
-		private double																								margin						= 1.0;
+		private double																				margin						= 1.0;
 		
 		/**
 		 * Max sentence length. Sentence longer than this value will be skipped
 		 * during training
 		 */
-		private int																									maxSentenceLength			= Integer.MAX_VALUE;
+		private int																					maxSentenceLength			= Integer.MAX_VALUE;
 		/** Number of training iterations */
-		private int																									numIterations				= 4;
+		private int																					numIterations				= 4;
 		
-		private final IJointParser<Sentence, STATE, MR, ESTEP, ERESULT>												parser;
+		private final IJointParser<SituatedSentence<STATE>, MR, ESTEP, ERESULT>						parser;
 		
-		private IJointOutputLogger<MR, ESTEP, ERESULT>																parserOutputLogger			= new IJointOutputLogger<MR, ESTEP, ERESULT>() {
-																																					
-																																					public void log(
-																																							IJointOutput<MR, ERESULT> output,
-																																							IJointDataItemModel<MR, ESTEP> dataItemModel) {
-																																						// Stub
-																																						
-																																					}
-																																				};
+		private IJointOutputLogger<MR, ESTEP, ERESULT>												parserOutputLogger			= new IJointOutputLogger<MR, ESTEP, ERESULT>() {
+																																	
+																																	public void log(
+																																			IJointOutput<MR, ERESULT> output,
+																																			IJointDataItemModel<MR, ESTEP> dataItemModel) {
+																																		// Stub
+																																		
+																																	}
+																																};
 		
 		/** Training data */
-		private final IDataCollection<DI>																			trainingData;
+		private final IDataCollection<DI>															trainingData;
 		
 		/**
 		 * Mapping a subset of training samples into their gold label for debug.
 		 */
-		private Map<DI, Pair<MR, ERESULT>>																			trainingDataDebug			= new HashMap<DI, Pair<MR, ERESULT>>();
+		private Map<DI, Pair<MR, ERESULT>>															trainingDataDebug			= new HashMap<DI, Pair<MR, ERESULT>>();
 		
-		private final IValidator<DI, Pair<MR, ERESULT>>																validator;
+		private final IValidator<DI, Pair<MR, ERESULT>>												validator;
 		
-		public Builder(IDataCollection<DI> trainingData,
-				IJointParser<Sentence, STATE, MR, ESTEP, ERESULT> parser,
+		public Builder(
+				IDataCollection<DI> trainingData,
+				IJointParser<SituatedSentence<STATE>, MR, ESTEP, ERESULT> parser,
 				IValidator<DI, Pair<MR, ERESULT>> validator) {
 			this.trainingData = trainingData;
 			this.parser = parser;
 		}
 		
 		public Builder<STATE, MR, ESTEP, ERESULT, DI> setGenlex(
-				ILexiconGenerator<DI, MR, IJointModelImmutable<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP>> genlex,
+				ILexiconGenerator<DI, MR, IJointModelImmutable<SituatedSentence<STATE>, MR, ESTEP>> genlex,
 				ICategoryServices<MR> categoryServices) {
 			this.genlex = genlex;
 			this.categoryServices = categoryServices;

learn.situated/src/edu/uw/cs/lil/tiny/learn/situated/resources/SituatedValidationPerceptronCreator.java

 package edu.uw.cs.lil.tiny.learn.situated.resources;
 
 import edu.uw.cs.lil.tiny.ccg.categories.ICategoryServices;
-import edu.uw.cs.lil.tiny.data.IDataItem;
+import edu.uw.cs.lil.tiny.data.ILabeledDataItem;
 import edu.uw.cs.lil.tiny.data.collection.IDataCollection;
-import edu.uw.cs.lil.tiny.data.sentence.Sentence;
+import edu.uw.cs.lil.tiny.data.situated.sentence.SituatedSentence;
 import edu.uw.cs.lil.tiny.data.utils.IValidator;
 import edu.uw.cs.lil.tiny.explat.IResourceRepository;
 import edu.uw.cs.lil.tiny.explat.ParameterizedExperiment;
 import edu.uw.cs.lil.tiny.parser.joint.model.IJointModelImmutable;
 import edu.uw.cs.utils.composites.Pair;
 
-public class SituatedValidationPerceptronCreator<STATE, MR, ESTEP, ERESULT, DI extends IDataItem<Pair<Sentence, STATE>>>
+public class SituatedValidationPerceptronCreator<STATE, MR, ESTEP, ERESULT, DI extends ILabeledDataItem<SituatedSentence<STATE>, ?>>
 		implements
 		IResourceObjectCreator<SituatedValidationPerceptron<STATE, MR, ESTEP, ERESULT, DI>> {
 	
 		
 		final Builder<STATE, MR, ESTEP, ERESULT, DI> builder = new SituatedValidationPerceptron.Builder<STATE, MR, ESTEP, ERESULT, DI>(
 				trainingData,
-				(IJointParser<Sentence, STATE, MR, ESTEP, ERESULT>) repo
+				(IJointParser<SituatedSentence<STATE>, MR, ESTEP, ERESULT>) repo
 						.getResource(ParameterizedExperiment.PARSER_RESOURCE),
 				(IValidator<DI, Pair<MR, ERESULT>>) repo.getResource(params
 						.get("validator")));
 		
 		if (params.contains("genlex")) {
 			builder.setGenlex(
-					(ILexiconGenerator<DI, MR, IJointModelImmutable<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP>>) repo
+					(ILexiconGenerator<DI, MR, IJointModelImmutable<SituatedSentence<STATE>, MR, ESTEP>>) repo
 							.getResource(params.get("genlex")),
 					(ICategoryServices<MR>) repo
 							.getResource(ParameterizedExperiment.CATEGORY_SERVICES_RESOURCE));

learn.situated/src/edu/uw/cs/lil/tiny/learn/situated/resources/SituatedValidationStocGradCreator.java

 package edu.uw.cs.lil.tiny.learn.situated.resources;
 
 import edu.uw.cs.lil.tiny.ccg.categories.ICategoryServices;
-import edu.uw.cs.lil.tiny.data.IDataItem;
+import edu.uw.cs.lil.tiny.data.ILabeledDataItem;
 import edu.uw.cs.lil.tiny.data.collection.IDataCollection;
-import edu.uw.cs.lil.tiny.data.sentence.Sentence;
+import edu.uw.cs.lil.tiny.data.situated.sentence.SituatedSentence;
 import edu.uw.cs.lil.tiny.data.utils.IValidator;
 import edu.uw.cs.lil.tiny.explat.IResourceRepository;
 import edu.uw.cs.lil.tiny.explat.ParameterizedExperiment;
 import edu.uw.cs.lil.tiny.parser.joint.IJointOutputLogger;
 import edu.uw.cs.lil.tiny.parser.joint.graph.IJointGraphParser;
 import edu.uw.cs.lil.tiny.parser.joint.model.IJointModelImmutable;
-import edu.uw.cs.utils.composites.Pair;
 
-public class SituatedValidationStocGradCreator<STATE, MR, ESTEP, ERESULT, DI extends IDataItem<Pair<Sentence, STATE>>>
+public class SituatedValidationStocGradCreator<STATE, MR, ESTEP, ERESULT, DI extends ILabeledDataItem<SituatedSentence<STATE>, ?>>
 		implements
 		IResourceObjectCreator<SituatedValidationStocGrad<STATE, MR, ESTEP, ERESULT, DI>> {
 	
 		
 		final Builder<STATE, MR, ESTEP, ERESULT, DI> builder = new SituatedValidationStocGrad.Builder<STATE, MR, ESTEP, ERESULT, DI>(
 				trainingData,
-				(IJointGraphParser<Sentence, STATE, MR, ESTEP, ERESULT>) repo
+				(IJointGraphParser<SituatedSentence<STATE>, MR, ESTEP, ERESULT>) repo
 						.getResource(ParameterizedExperiment.PARSER_RESOURCE),
 				(IValidator<DI, ERESULT>) repo.getResource(params
 						.get("validator")));
 		
 		if (params.contains("genlex")) {
 			builder.setGenlex(
-					(ILexiconGenerator<DI, MR, IJointModelImmutable<IDataItem<Pair<Sentence, STATE>>, STATE, MR, ESTEP>>) repo
+					(ILexiconGenerator<DI, MR, IJointModelImmutable<SituatedSentence<STATE>, MR, ESTEP>>) repo
 							.getResource(params.get("genlex")),
 					(ICategoryServices<MR>) repo
 							.getResource(ParameterizedExperiment.CATEGORY_SERVICES_RESOURCE));

learn.situated/src/edu/uw/cs/lil/tiny/learn/situated/stocgrad/SituatedValidationStocGrad.java

 import java.util.Map;
 
 import edu.uw.cs.lil.tiny.ccg.categories.ICategoryServices;