David McClosky committed f1c669c

SWIG wrappers: Add extra includes to fix Python compilation problem
Also added a Python-specific README and updated the MANIFEST to include
more information.

Files changed (5)

MANIFEST.in

+include README-python.txt
+include NOTICE
+include LICENSE-2.0.txt
 include first-stage/PARSE/*.h
 include first-stage/PARSE/swig/wrapper.i
 include first-stage/PARSE/swig/java/include/std_list.i

README-python.txt

+The BLLIP parser (also known as the Charniak-Johnson parser or
+Brown Reranking Parser) is described in the paper `Charniak
+and Johnson (Association for Computational Linguistics, 2005)
+<http://aclweb.org/anthology/P/P05/P05-1022.pdf>`_.  This code
+provides a Python interface to the parser. Note that it does
+not contain any parsing models, which must be downloaded
+separately (for example, `WSJ self-trained parsing model
+<http://cs.brown.edu/~dmcc/selftraining/selftrained.tar.gz>`_).
+The primary maintenance for the parser takes place at `GitHub
+<http://github.com/BLLIP/bllip-parser>`_.
+
+Basic usage
+-----------
+
+The easiest way to construct a parser is with the
+``load_unified_model_dir`` class method. A unified model is a directory
+that contains two subdirectories: ``parser/`` and ``reranker/``, each
+with the respective model files::
+
+    >>> from bllipparser import RerankingParser, tokenize
+    >>> rrp = RerankingParser.load_unified_model_dir('/path/to/model/')
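
As a rough illustration of the layout described above, the sketch below checks that a directory has the two expected subdirectories before it is handed to ``load_unified_model_dir``. The helper name ``looks_like_unified_model_dir`` is hypothetical and not part of bllipparser; it only tests that the subdirectories exist and assumes nothing about the model files inside them.

```python
import os
import tempfile


def looks_like_unified_model_dir(path):
    """Return True if `path` has both parser/ and reranker/ subdirectories.

    Hypothetical helper (not part of bllipparser): it checks only the
    directory layout, not the model files themselves.
    """
    return all(os.path.isdir(os.path.join(path, name))
               for name in ('parser', 'reranker'))


# Build a throwaway directory with the expected layout and check it.
demo_dir = tempfile.mkdtemp()
for name in ('parser', 'reranker'):
    os.mkdir(os.path.join(demo_dir, name))
layout_ok = looks_like_unified_model_dir(demo_dir)

# An empty directory lacks both subdirectories.
empty_ok = looks_like_unified_model_dir(tempfile.mkdtemp())
```

A check like this can give a clearer error message than a failed model load, but it is optional.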
+
+Parse a single sentence with ``parse()``. The parser produces an
+*n-best list* of the *n* most likely parses of the sentence (default:
+*n=50*). Typically you only want the top parse, but the others are
+available as well::
+
+    >>> nbest_list = rrp.parse('This is a sentence.')
+
+Getting information about the top parse::
+
+    >>> print repr(nbest_list[0])
+    ScoredParse('(S1 (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .)))', parser_score=-29.621201629004183, reranker_score=-7.9273829816098731)
+    >>> print nbest_list[0].ptb_parse
+    (S1 (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .)))
+    >>> print nbest_list[0].parser_score
+    -29.621201629
+    >>> print nbest_list[0].reranker_score
+    -7.92738298161
+    >>> print len(nbest_list)
+    50
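
The ``ptb_parse`` string shown above is a Penn Treebank-style bracketing, so the words and POS tags can be recovered with plain string processing. The sketch below does this with a regular expression. ``FakeScoredParse`` and ``tagged_tokens`` are hypothetical names used only for illustration: the namedtuple stands in for the parser's real ``ScoredParse`` objects so the example runs without a parsing model, and the scores are copied from the output above.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the parser's ScoredParse objects, so that this
# sketch runs without a parsing model installed.
FakeScoredParse = namedtuple('FakeScoredParse',
                             'ptb_parse parser_score reranker_score')


def tagged_tokens(ptb_parse):
    """Return (tag, word) pairs for the preterminals of a PTB-style tree.

    A preterminal looks like "(TAG word)": a label followed by a single
    token that contains no whitespace or parentheses.
    """
    return re.findall(r'\(([^\s()]+) ([^\s()]+)\)', ptb_parse)


top = FakeScoredParse(
    '(S1 (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .)))',
    -29.621201629004183,
    -7.9273829816098731)

pairs = tagged_tokens(top.ptb_parse)
words = [word for tag, word in pairs]
```

Here ``pairs`` holds the five tag/word pairs from ``('DT', 'This')`` to ``('.', '.')``, and ``words`` is the token sequence of the sentence.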
+
+If you have an existing tokenizer, tokenization can also be specified
+by passing a list of strings::
+
+    >>> nbest_list = rrp.parse(['This', 'is', 'a', 'pretokenized', 'sentence', '.'])
+
+The reranker can be disabled by setting ``rerank=False``::
+
+    >>> nbest_list = rrp.parse('Parser only!', rerank=False)
+
+To parse text with existing POS tag (soft) constraints, use
+``parse_tagged()``. In this example, token 0 ('Time') should have tag VB
+and token 1 ('flies') should have tag NNS::
+
+    >>> rrp.parse_tagged(['Time', 'flies'], possible_tags={0 : 'VB', 1 : 'NNS'})[0]
+    ScoredParse('(S1 (NP (VB Time) (NNS flies)))', parser_score=-53.94938875760073, reranker_score=-15.841407102717749)
+
+You don't need to specify a tag for all words: token 0 ('Time') should
+have tag VB and token 1 ('flies') is unconstrained::
+
+    >>> rrp.parse_tagged(['Time', 'flies'], possible_tags={0 : 'VB'})[0]
+    ScoredParse('(S1 (S (VP (VB Time) (NP (VBZ flies)))))', parser_score=-54.390430751112156, reranker_score=-17.290145080887005)
+
+You can specify multiple tags for each token: token 0 ('Time') should
+have tag VB, JJ, or NN and token 1 ('flies') is unconstrained::
+
+    >>> rrp.parse_tagged(['Time', 'flies'], possible_tags={0 : ['VB', 'JJ', 'NN']})[0]
+    ScoredParse('(S1 (NP (NN Time) (VBZ flies)))', parser_score=-42.82904107213723, reranker_score=-12.865900776775314)
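
The three ``possible_tags`` examples above use two shapes for the values: a single tag string or a list of tags. When constraints come from another tool, it can help to normalize them into one shape first. The sketch below does that. ``normalize_possible_tags`` is a hypothetical helper, not part of bllipparser, and the index check is an assumption about what is useful to validate rather than a description of what the parser itself does.

```python
def normalize_possible_tags(tokens, possible_tags):
    """Map each constrained token index to a list of allowed tags.

    Hypothetical helper: accepts the two value shapes used in the README
    examples (a single tag string or a list of tags) and returns lists.
    """
    normalized = {}
    for index, tags in possible_tags.items():
        if not 0 <= index < len(tokens):
            raise IndexError('no token at index %d' % index)
        # A bare string means a single allowed tag.
        normalized[index] = [tags] if isinstance(tags, str) else list(tags)
    return normalized


tokens = ['Time', 'flies']
constraints = normalize_possible_tags(
    tokens, {0: ['VB', 'JJ', 'NN'], 1: 'NNS'})
```

Tokens that do not appear in the dictionary (none in this call) stay unconstrained, matching the behavior described above.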
+
+Use ``tokenize()`` if all you want is a tokenizer::
+
+    >>> tokenize("Tokenize this sentence, please.")
+    ['Tokenize', 'this', 'sentence', ',', 'please', '.']

first-stage/PARSE/swig/wrapper.i

 typedef std::string ECString;
 
 %{
+    #include <cstddef>
     #include <fstream>
     #include <math.h>
     #include <unistd.h>

second-stage/programs/features/swig/wrapper.i

 %newobject scoreNBestList;
 
 %inline {
+    #include <cstddef>
+
     class RerankerError {
         public:
             const std::string description;

setup.py

     extra_compile_args=['-iquote', reranker_base, '-O0'])
 
 setup(name='bllipparser',
-    version='2013.10.16',
+    version='2013.10.16-1',
     description='Python bindings for the BLLIP natural language parser',
+    long_description='See http://pypi.python.org/pypi/bllipparser/',
     author='David McClosky',
     author_email='notsoweird+pybllipparser@gmail.com',
     classifiers=[