This parser is an adaptation of the Berkeley parser 1.7 (http://nlp.cs.berkeley.edu/software.shtml). If you use this code, please cite:

@InProceedings{vandergoot-vannoord:2017:Short,
  author    = {van der Goot, Rob  and  van Noord, Gertjan},
  title     = {Parser Adaptation for Social Media by Integrating Normalization},
  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  month     = {July},
  year      = {2017},
  address   = {Vancouver, Canada},
  publisher = {Association for Computational Linguistics},
  pages     = {491--497},
}
@InProceedings{petrov-klein:2007:main,
  author    = {Petrov, Slav  and  Klein, Dan},
  title     = {Improved Inference for Unlexicalized Parsing},
  booktitle = {Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference},
  month     = {April},
  year      = {2007},
  address   = {Rochester, New York},
  publisher = {Association for Computational Linguistics},
  pages     = {404--411},
  url       = {http://www.aclweb.org/anthology/N/N07/N07-1051}
}

This parser can parse weighted word-graphs. They should look like this:

0 new 1 0.931861
0 new 1 0.041028
0 new 1 0.027111
1 pix 2 0.994940
1 pic 2 0.002540
1 photos 2 0.002520
2 comming 3 0.782531
2 coming 3 0.210903
2 coming 3 0.006566
3 tomorroe 4 0.904690
3 tomorrow 4 0.079266
3 tomorrow 4 0.016044
.
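
Input in this format can be generated from per-position candidate lists. The following is a minimal Python sketch, not part of the parser itself. It assumes, based only on the example above, that the columns are start position, word, end position and weight. The function name `format_lattice` and the input structure are hypothetical.

```python
def format_lattice(candidates):
    """Render one sentence as a weighted word-graph.

    candidates: one entry per token position, each a list of
    (word, weight) pairs. The column order (start, word, end, weight)
    is an assumption read off the example in this README.
    """
    lines = []
    for pos, options in enumerate(candidates):
        for word, weight in options:
            # Each candidate spans exactly one position: pos -> pos + 1.
            lines.append(f"{pos} {word} {pos + 1} {weight:.6f}")
    lines.append(".")  # a line containing only a dot ends the sentence
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sentence = [
        [("pix", 0.994940), ("pic", 0.002540)],
        [("comming", 0.782531), ("coming", 0.210903)],
    ]
    print(format_lattice(sentence), end="")
```

This sketch only covers candidates that replace a single token with a single word, as in the example; edges spanning more than one position are not handled.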

This input format makes the parser work together with MoNoise (https://bitbucket.org/robvanderg/monoise); download MoNoise if your goal is to parse tweets.

The weights of the normalization can be tuned with the "-latticeWeight" option.

Note that:

  • This parser does not read normal input (see the original Berkeley parser for that).
  • Sentences are split by a line containing only a dot.
  • There should always be something starting at position 0.
  • If you do not have probabilities, 1.0 can be used.
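
The input rules above can be checked before a file is sent to the parser. The following Python sketch is an illustration only, not the parser's own reader. The column order (start, word, end, weight) is assumed from the example word-graph, and the names `read_lattices` and `check_lattice` are hypothetical.

```python
def read_lattices(text):
    """Split input into sentences; each is a list of (start, word, end, weight)."""
    sentences, edges = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == ".":
            # A line containing only a dot closes the current sentence.
            if edges:
                sentences.append(edges)
            edges = []
            continue
        # Assumed column order: start position, word, end position, weight.
        start, word, end, weight = line.split()
        edges.append((int(start), word, int(end), float(weight)))
    if edges:
        raise ValueError("last sentence is not terminated by a '.' line")
    return sentences


def check_lattice(edges):
    """Raise ValueError if no edge starts at position 0."""
    if not any(start == 0 for start, _, _, _ in edges):
        raise ValueError("no edge starts at position 0")


if __name__ == "__main__":
    # Without real probabilities, 1.0 is used as the weight.
    text = "0 new 1 1.0\n1 pix 2 1.0\n.\n"
    for sentence in read_lattices(text):
        check_lattice(sentence)
        print(len(sentence), "edges")
```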

Other small additions:

  • outputChartSize: prints the chart size after pruning (at the last parse level).
  • server: runs the jar as a server application that communicates through sockets, as supported by MoNoise. Specify the port as an argument on the command line.