# SemiNER - Named Entity Labeler

SemiNER is a tool for Named Entity labeling. This release includes:

• two models trained on the German CoNLL data, with features extracted from a large unlabeled German corpus
• a model trained on the BBN corpus

## Usage

The easiest way to start using SemiNER is to build and install it by running the following command from the top-level sequor directory:

    cabal install --prefix=`pwd`


There are two pre-trained German models: full, which uses all the features from the training data, including lemmas, POS tags and chunk tags, and raw, which uses only word features and cluster ID features. The raw model needs no additional preprocessing.

There is also a single English model, which likewise needs no additional preprocessing.

Run these commands from the top-level sequor directory. To label German text using the raw pre-trained model:

    bin/seminer de-raw < INPUT-FILE > OUTPUT-FILE


To label German text using the full pre-trained model:

    bin/seminer de-full < INPUT-FILE > OUTPUT-FILE


To label English text:

    bin/seminer en < INPUT-FILE > OUTPUT-FILE
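To label many files in one go, the commands above can be wrapped in a small shell loop. The following is only a sketch: the function name `label_dir`, the `SEMINER` override variable, the `.txt` input extension and the `.out` output extension are all assumptions made for illustration and are not part of SemiNER itself.

```shell
# Hypothetical helper: label every *.txt file in IN_DIR with the given model
# and write the results to OUT_DIR. Only the invocation
# "bin/seminer MODEL < INPUT > OUTPUT" comes from this README; everything
# else (names, extensions, the SEMINER variable) is an assumption.
label_dir() {
    ld_model=$1
    ld_in=$2
    ld_out=$3
    mkdir -p "$ld_out" || return 1
    for ld_file in "$ld_in"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$ld_file" ] || continue
        ld_name=$(basename "$ld_file" .txt)
        "${SEMINER:-bin/seminer}" "$ld_model" < "$ld_file" > "$ld_out/$ld_name.out" || return 1
    done
}
```

With this definition, something like `label_dir de-raw input-dir output-dir`, run from the top-level sequor directory, would label each file in turn and stop at the first failure.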


## Format

The input uses the CoNLL format: one token per line, with sentences separated by a blank line.

For prediction with the German raw model you just need the word forms:

    Seit
    1740
    wurde
    im
    Steinheimer
    Apfelwein
    ausgeschenkt
    .
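If your text is already tokenized but stored with one sentence per line, a short awk filter can produce the one-token-per-line layout shown above. This is a minimal sketch, assuming tokens (including punctuation) are already separated by whitespace; the function name `to_token_per_line` is hypothetical and not part of SemiNER, and the filter does no tokenization of its own.

```shell
# Hypothetical helper: read pre-tokenized text (one sentence per line,
# tokens separated by whitespace) on stdin and print one token per line,
# followed by a blank line after each sentence. Empty input lines are skipped.
to_token_per_line() {
    awk 'NF { for (i = 1; i <= NF; i++) print $i; print "" }'
}
```

The result can be piped straight into the raw model, for example `to_token_per_line < INPUT-FILE | bin/seminer de-raw > OUTPUT-FILE`.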


For the German full model you need to provide the word form, lemma, POS tag and chunk label for each token, separated by spaces:

Seit seit APPR B-PC
1740 @card@ CARD B-NC
wurde werden VAFIN B-VC
im im APPRART B-PC
Steinheimer <unknown> NN B-NC