Mikhail Korobov committed e6605f1

Less detailed README + dedicated docs. This is to fix installation issues with LC_ALL=C (pip bugs make it not possible to have non-ascii letters in project's long_description). Fix GH-7.

Files changed (10)

 ^.tox
 \.orig$
 \.prof$
-\.coverage$
+\.coverage$
+^docs/_
+Authors & Contributors
+----------------------
+
+* Mikhail Korobov <kmike84@gmail.com>;
+* Dan Blanchard;
+* Jakub Wilk.
+
+This module uses `dawgdic`_ C++ library by
+Susumu Yata & contributors.
+
+base64 decoder is a modified version of libb64_ (original author
+is Chris Venter).
+
+.. _libb64: http://libb64.sourceforge.net/
+.. _dawgdic: https://code.google.com/p/dawgdic/
+
+Changes
+=======
 
 0.6 (2013-03-22)
 ----------------
 include README.rst
+include AUTHORS.rst
 include CHANGES.rst
 include LICENSE
 include tox.ini
 include update_cpp.sh
 include lib/COPYING
 
+recursive-include docs *.rst *.py Makefile make.bat
+
 recursive-include src *.cpp *.pxd *.pyx
 recursive-include lib *.c *.h
 recursive-include tests *.py
 a standard Python dict and the raw lookup speed is comparable;
 it also provides fast advanced methods like prefix search.
 
-Based on `dawgdic`_ C++ library.
-
-.. _dawgdic: https://code.google.com/p/dawgdic/
 .. _DAFSA: https://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton
 
-Installation
-============
+Docs: http://dawg.readthedocs.org
 
-pip install DAWG
-
-Usage
-=====
-
-There are several DAWG classes in this package:
-
-* ``dawg.DAWG`` - basic DAWG wrapper; it can store unicode keys
-  and do exact lookups;
-
-* ``dawg.CompletionDAWG`` - ``dawg.DAWG`` subclass that supports
-  key completion and prefix lookups (but requires more memory);
-
-* ``dawg.BytesDAWG`` - ``dawg.CompletionDAWG`` subclass that
-  maps unicode keys to lists of ``bytes`` objects.
-
-* ``dawg.RecordDAWG`` - ``dawg.BytesDAWG`` subclass that
-  maps unicode keys to lists of data tuples.
-  All tuples must be of the same format (the data is packed
-  using python ``struct`` module).
-
-* ``dawg.IntDAWG`` - ``dawg.DAWG`` subclass that maps unicode keys
-  to integer values.
-
-DAWG and CompletionDAWG
------------------------
-
-``DAWG`` and ``CompletionDAWG`` are useful when you need
-fast & memory efficient simple string storage. These classes
-do not support assigning values to keys.
-
-``DAWG`` and ``CompletionDAWG`` constructors accept an iterable with keys::
-
-    >>> import dawg
-    >>> words = [u'foo', u'bar', u'foobar', u'foö', u'bör']
-    >>> base_dawg = dawg.DAWG(words)
-    >>> completion_dawg = dawg.CompletionDAWG(words)
-
-It is then possible to check if the key is in a DAWG::
-
-    >>> u'foo' in base_dawg
-    True
-    >>> u'baz' in completion_dawg
-    False
-
-It is possible to find all keys that start with a given
-prefix in a ``CompletionDAWG``::
-
-    >>> completion_dawg.keys(u'foo')
-    [u'foo', u'foobar']
-
-and to find all prefixes of a given key::
-
-    >>> base_dawg.prefixes(u'foobarz')
-    [u'foo', u'foobar']
-
-Iterator versions are also available::
-
-    >>> for key in completion_dawg.iterkeys(u'foo'):
-    ...     print(key)
-    foo
-    foobar
-    >>> for prefix in base_dawg.iterprefixes(u'foobarz'):
-    ...     print(prefix)
-    foo
-    foobar
-
-It is possible to find all keys similar to a given key (using a one-way
-char translation table)::
-
-    >>> replaces = dawg.DAWG.compile_replaces({u'o': u'ö'})
-    >>> base_dawg.similar_keys(u'foo', replaces)
-    [u'foo', u'foö']
-    >>> base_dawg.similar_keys(u'foö', replaces)
-    [u'foö']
-    >>> base_dawg.similar_keys(u'bor', replaces)
-    [u'bör']
-
-BytesDAWG
----------
-
-``BytesDAWG`` is a ``CompletionDAWG`` subclass that can store
-binary data for each key.
-
-``BytesDAWG`` constructor accepts an iterable with
-``(unicode_key, bytes_value)`` tuples::
-
-    >>> data = [(u'key1', b'value1'), (u'key2', b'value2'), (u'key1', b'value3')]
-    >>> bytes_dawg = dawg.BytesDAWG(data)
-
-There can be duplicate keys; all unique values are stored in this case::
-
-    >>> bytes_dawg[u'key1']
-    [b'value1', b'value3']
-
-For unique keys a list with a single value is returned for consistency::
-
-    >>> bytes_dawg[u'key2']
-    [b'value2']
-
-``KeyError`` is raised for missing keys; use ``get`` method if you need
-a default value instead::
-
-    >>> print(bytes_dawg.get(u'foo', None))
-    None
-
-``BytesDAWG`` supports ``keys``, ``items``, ``iterkeys`` and ``iteritems``
-methods (they all accept optional key prefix). There is also support for
-``similar_keys``, ``similar_items`` and ``similar_item_values`` methods.
-
-RecordDAWG
-----------
-
-``RecordDAWG`` is a ``BytesDAWG`` subclass that automatically
-packs & unpacks the binary data from/to Python objects
-using ``struct`` module from the standard library.
-
-First, you have to define a format of the data. Consult Python docs
-(http://docs.python.org/library/struct.html#format-strings) for the format
-string specification.
-
-For example, let's store 3 short unsigned numbers (in a Big-Endian byte order)
-as values::
-
-    >>> format = ">HHH"
-
-``RecordDAWG`` constructor accepts an iterable with
-``(unicode_key, value_tuple)``. Let's create such iterable
-using ``zip`` function::
-
-    >>> keys = [u'foo', u'bar', u'foobar', u'foo']
-    >>> values = [(1, 2, 3), (2, 1, 0), (3, 3, 3), (2, 1, 5)]
-    >>> data = zip(keys, values)
-    >>> record_dawg = RecordDAWG(format, data)
-
-As with ``BytesDAWG``, there can be several values for the same key::
-
-    >>> record_dawg['foo']
-    [(1, 2, 3), (2, 1, 5)]
-    >>> record_dawg['foobar']
-    [(3, 3, 3)]
-
-
-BytesDAWG and RecordDAWG implementation details
------------------------------------------------
-
-``BytesDAWG`` and ``RecordDAWG`` store data at the end of the keys::
-
-    <utf8-encoded key><separator><base64-encoded data>
-
-Data is encoded to base64 because dawgdic_ C++ library doesn't allow
-zero bytes in keys (it uses null-terminated strings) and such keys are
-very likely in binary data.
-
-In DAWG versions prior to 0.5 ``<separator>`` was ``chr(255)`` byte.
-It was chosen because keys are stored as UTF8-encoded strings and
-``chr(255)`` is guaranteed not to appear in valid UTF8, so the end of
-text part of the key is not ambiguous.
-
-But ``chr(255)`` was proven to be problematic: it changes the order
-of the keys. Keys are naturally returned in lexicographical order by DAWG.
-But if ``chr(255)`` appears at the end of each text part of a key then the
-visible order would change. Imagine ``'foo'`` key with some payload
-and ``'foobar'`` key with some payload. ``'foo'`` key would be greater
-than ``'foobar'`` key: values compared would be ``'foo<sep>'`` and ``'foobar<sep>'``
-and ``ord(<sep>)==255`` is greater than ``ord(<any other character>)``.
-
-So now the default ``<separator>`` is chr(1). This is the lowest allowed
-character and so it preserves the alphabetical order.
-
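A minimal sketch of the ordering effect described above, using only plain byte strings. It is an illustration rather than anything from the README: the helper name ``storage_order`` and the fixed ``b"payload"`` value are hypothetical, and neither the base64 step nor the DAWG itself is modeled.

```python
def storage_order(keys, separator):
    """Return keys in the order of their stored form:
    <utf8-encoded key><separator><payload>."""
    stored = {key.encode("utf8") + separator + b"payload": key for key in keys}
    return [stored[raw] for raw in sorted(stored)]


keys = ["foo", "foobar"]

# With the old separator, 0xFF sorts after every letter,
# so 'foobar' comes out before 'foo'.
order_with_ff = storage_order(keys, b"\xff")

# With chr(1), the separator sorts before every letter,
# so the plain alphabetical order is kept.
order_with_01 = storage_order(keys, b"\x01")
```

Comparing the two results shows why the lowest allowed byte was chosen: the separator is the first byte that differs between ``'foo<sep>'`` and ``'foobar<sep>'``, so its value alone decides which key sorts first.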
-It is not strictly correct to use chr(1) as a separator because chr(1)
-is a valid UTF8 character. But I think in practice this won't be an issue:
-such control character is very unlikely in text keys, and binary keys
-are not supported anyway because dawgdic_ doesn't support keys containing
-chr(0).
-
-If you can't guarantee chr(1) is not a part of keys, lexicographical order
-is not important to you or there is a need to read
-a ``BytesDAWG``/``RecordDAWG`` created by DAWG < 0.5 then pass
-``payload_separator`` argument to the constructor::
-
-    >>> BytesDAWG(payload_separator=b'\xff').load('old.dawg')
-
-The storage scheme has one more implication: values of ``BytesDAWG``
-and ``RecordDAWG`` are also sorted lexicographically.
-
-For ``RecordDAWG`` there is a gotcha: in order to have meaningful
-ordering of numeric values store them in big-endian format::
-
-    >>> data = [('foo', (3, 2, 256)), ('foo', (3, 2, 1)), ('foo', (3, 2, 3))]
-    >>> d = RecordDAWG("3H", data)
-    >>> d.items()
-    [(u'foo', (3, 2, 256)), (u'foo', (3, 2, 1)), (u'foo', (3, 2, 3))]
-
-    >>> d2 = RecordDAWG(">3H", data)
-    >>> d2.items()
-    [(u'foo', (3, 2, 1)), (u'foo', (3, 2, 3)), (u'foo', (3, 2, 256))]
-
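The byte-order gotcha can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions: it sorts records by the raw bytes that ``struct.pack`` produces, uses an explicit ``<`` prefix to stand in for a little-endian native order, and does not model the base64 layer or the DAWG. The helper name ``sorted_by_packed_bytes`` is hypothetical.

```python
import struct


def sorted_by_packed_bytes(fmt, records):
    """Sort records by the bytes struct.pack() produces for them."""
    return sorted(records, key=lambda rec: struct.pack(fmt, *rec))


records = [(3, 2, 256), (3, 2, 1), (3, 2, 3)]

# Little-endian: 256 packs as 00 01, which sorts before 01 00 (the value 1).
little = sorted_by_packed_bytes("<3H", records)

# Big-endian: the most significant byte comes first, so byte order
# matches numeric order.
big = sorted_by_packed_bytes(">3H", records)
```

Here ``little`` puts ``(3, 2, 256)`` first, while ``big`` follows numeric order, which is the same pattern as the two ``items()`` results above.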
-IntDAWG
--------
-
-``IntDAWG`` is a ``{unicode -> int}`` mapping. It is possible to
-use ``RecordDAWG`` for this, but ``IntDAWG`` is natively
-supported by dawgdic_ C++ library and so ``__getitem__`` is much faster.
-
-Unlike ``BytesDAWG`` and ``RecordDAWG``, ``IntDAWG`` doesn't support
-having several values for the same key.
-
-``IntDAWG`` constructor accepts an iterable with (unicode_key, integer_value)
-tuples::
-
-    >>> data = [ (u'foo', 1), (u'bar', 2) ]
-    >>> int_dawg = dawg.IntDAWG(data)
-
-It is then possible to get a value from the IntDAWG::
-
-    >>> int_dawg[u'foo']
-    1
-
-
-Persistence
------------
-
-All DAWGs support saving/loading and pickling/unpickling.
-
-Write DAWG to a stream::
-
-    >>> with open('words.dawg', 'wb') as f:
-    ...     d.write(f)
-
-Save DAWG to a file::
-
-    >>> d.save('words.dawg')
-
-Load DAWG from a file::
-
-    >>> d = dawg.DAWG()
-    >>> d.load('words.dawg')
-
-.. warning::
-
-    Reading DAWGs from streams and unpickling are currently using 3x memory
-    compared to loading DAWGs using ``load`` method; please avoid them until
-    the issue is fixed.
-
-Read DAWG from a stream::
-
-    >>> d = dawg.RecordDAWG(format_string)
-    >>> with open('words.record-dawg', 'rb') as f:
-    ...     d.read(f)
-
-DAWG objects are picklable::
-
-    >>> import pickle
-    >>> data = pickle.dumps(d)
-    >>> d2 = pickle.loads(data)
-
-Benchmarks
-==========
-
-For a list of 3000000 (3 million) Russian words memory consumption
-with different data structures (under Python 2.7):
-
-* dict(unicode words -> word lengths): about 600M
-* list(unicode words) : about 300M
-* ``marisa_trie.RecordTrie`` : 11M
-* ``marisa_trie.Trie``: 7M
-* ``dawg.DAWG``: 2M
-* ``dawg.CompletionDAWG``: 3M
-* ``dawg.IntDAWG``: 2.7M
-* ``dawg.RecordDAWG``: 4M
-
-
-.. note::
-
-    Lengths of words were not stored as values in ``dawg.DAWG``,
-    ``dawg.CompletionDAWG`` and ``marisa_trie.Trie`` because they don't
-    support this.
-
-.. note::
-
-    `marisa-trie`_ is often more memory efficient than
-    DAWG (depending on data); it can also handle larger datasets
-    and provides memory-mapped IO, so don't dismiss `marisa-trie`_
-    based on this README file. It is still several times slower than
-    DAWG though.
-
-.. _marisa-trie: https://github.com/kmike/marisa-trie
-
-Benchmark results (100k unicode words, integer values (lengths of the words),
-Python 3.3, MacBook Air i5 1.8 GHz)::
-
-    dict __getitem__ (hits)           7.300M ops/sec
-    DAWG __getitem__ (hits)           not supported
-    BytesDAWG __getitem__ (hits)      1.230M ops/sec
-    RecordDAWG __getitem__ (hits)     0.792M ops/sec
-    IntDAWG __getitem__ (hits)        4.217M ops/sec
-    dict get() (hits)                 3.775M ops/sec
-    DAWG get() (hits)                 not supported
-    BytesDAWG get() (hits)            1.027M ops/sec
-    RecordDAWG get() (hits)           0.733M ops/sec
-    IntDAWG get() (hits)              3.162M ops/sec
-    dict get() (misses)               4.533M ops/sec
-    DAWG get() (misses)               not supported
-    BytesDAWG get() (misses)          3.545M ops/sec
-    RecordDAWG get() (misses)         3.485M ops/sec
-    IntDAWG get() (misses)            3.928M ops/sec
-
-    dict __contains__ (hits)          7.090M ops/sec
-    DAWG __contains__ (hits)          4.685M ops/sec
-    BytesDAWG __contains__ (hits)     3.885M ops/sec
-    RecordDAWG __contains__ (hits)    3.898M ops/sec
-    IntDAWG __contains__ (hits)       4.612M ops/sec
-
-    dict __contains__ (misses)        5.617M ops/sec
-    DAWG __contains__ (misses)        6.204M ops/sec
-    BytesDAWG __contains__ (misses)   6.026M ops/sec
-    RecordDAWG __contains__ (misses)  6.007M ops/sec
-    IntDAWG __contains__ (misses)     6.180M ops/sec
-
-    DAWG.similar_keys  (no replaces)  0.492M ops/sec
-    DAWG.similar_keys  (l33t)         0.413M ops/sec
-
-    dict items()                      55.032 ops/sec
-    DAWG items()                      not supported
-    BytesDAWG items()                 14.826 ops/sec
-    RecordDAWG items()                9.436 ops/sec
-    IntDAWG items()                   not supported
-
-    dict keys()                       200.788 ops/sec
-    DAWG keys()                       not supported
-    BytesDAWG keys()                  20.657 ops/sec
-    RecordDAWG keys()                 20.873 ops/sec
-    IntDAWG keys()                    not supported
-
-    DAWG.prefixes (hits)              1.552M ops/sec
-    DAWG.prefixes (mixed)             4.342M ops/sec
-    DAWG.prefixes (misses)            4.094M ops/sec
-    DAWG.iterprefixes (hits)          0.391M ops/sec
-    DAWG.iterprefixes (mixed)         0.476M ops/sec
-    DAWG.iterprefixes (misses)        0.498M ops/sec
-
-    RecordDAWG.keys(prefix="xxx"), avg_len(res)==415             5.562K ops/sec
-    RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17            104.011K ops/sec
-    RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3          318.129K ops/sec
-    RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4       462.238K ops/sec
-    RecordDAWG.keys(prefix="xxx"), NON_EXISTING                  4292.625K ops/sec
-
-
-Please take these benchmark results with a grain of salt; this
-is a very simple benchmark on a single data set.
-
-
-Current limitations
-===================
-
-* ``IntDAWG`` is currently a subclass of ``DAWG`` and so it doesn't
-  support ``keys()`` and ``items()`` methods;
-* ``read()`` method reads the whole stream (DAWG must be the last or the
-  only item in a stream if it is read with ``read()`` method) - pickling
-  doesn't have this limitation;
-* DAWGs loaded with ``read()`` and unpickled DAWGs use 3x-4x memory
-  compared to DAWGs loaded with ``load()`` method;
-* there are ``keys()`` and ``items()`` methods but no ``values()`` method;
-* iterator versions of methods are not always implemented;
-* ``BytesDAWG`` and ``RecordDAWG`` have a limitation: values
-  larger than 8KB are unsupported;
-* the maximum number of DAWG units is limited: number of DAWG units
-  (and thus transitions - but not elements) should be less than 2^29;
-  this means that it may be impossible to build an especially huge DAWG
-  (you may split your data into several DAWGs or try `marisa-trie`_ in
-  this case).
-
-Contributions are welcome!
-
-
-Contributing
-============
-
-Development happens at github and bitbucket:
+Source code:
 
 * https://github.com/kmike/DAWG
 * https://bitbucket.org/kmike/DAWG
 
-The main issue tracker is at github: https://github.com/kmike/DAWG/issues
-
-Feel free to submit ideas, bugs, pull requests (git or hg) or
-regular patches.
-
-If you found a bug in a C++ part please report it to the original
-`bug tracker <https://code.google.com/p/dawgdic/issues/list>`_.
-
-How is source code organized
-----------------------------
-
-There are 4 folders in repository:
-
-* ``bench`` - benchmarks & benchmark data;
-* ``lib`` - original unmodified `dawgdic`_ C++ library and
-  a customized version of `libb64`_ library. They are bundled
-  for easier distribution; if something has to be fixed in these
-  libraries consider fixing it in the original repositories;
-* ``src`` - wrapper code; ``src/dawg.pyx`` is a wrapper implementation;
-  ``src/*.pxd`` files are Cython headers for corresponding C++ headers;
-  ``src/*.cpp`` files are the pre-built extension code and shouldn't be
-  modified directly (they should be updated via ``update_cpp.sh`` script).
-* ``tests`` - the test suite.
-
-
-Running tests and benchmarks
-----------------------------
-
-Make sure `tox`_ is installed and run
-
-::
-
-    $ tox
-
-from the source checkout. Tests should pass under python 2.6, 2.7, 3.2 and 3.3.
-
-In order to run benchmarks, type
-
-::
-
-    $ tox -c bench.ini
-
-.. _cython: http://cython.org
-.. _tox: http://tox.testrun.org
-
-Authors & Contributors
-----------------------
-
-* Mikhail Korobov <kmike84@gmail.com>;
-* Dan Blanchard;
-* Jakub Wilk.
-
-This module uses `dawgdic`_ C++ library by
-Susumu Yata & contributors.
-
-base64 decoder is a modified version of libb64_ (original author
-is Chris Venter).
-
-.. _libb64: http://libb64.sourceforge.net/
+Issue tracker: https://github.com/kmike/DAWG/issues
 
 License
 =======
 
 Wrapper code is licensed under MIT License.
 Bundled `dawgdic`_ C++ library is licensed under BSD license.
-Bundled libb64_ is Public Domain.
+Bundled libb64_ is Public Domain.
+# Makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS    =
+SPHINXBUILD   = sphinx-build
+PAPER         =
+BUILDDIR      = _build
+
+# Internal variables.
+PAPEROPT_a4     = -D latex_paper_size=a4
+PAPEROPT_letter = -D latex_paper_size=letter
+ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
+# the i18n builder cannot share the environment and doctrees with the others
+I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
+
+.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
+
+help:
+	@echo "Please use \`make <target>' where <target> is one of"
+	@echo "  html       to make standalone HTML files"
+	@echo "  dirhtml    to make HTML files named index.html in directories"
+	@echo "  singlehtml to make a single large HTML file"
+	@echo "  pickle     to make pickle files"
+	@echo "  json       to make JSON files"
+	@echo "  htmlhelp   to make HTML files and a HTML help project"
+	@echo "  qthelp     to make HTML files and a qthelp project"
+	@echo "  devhelp    to make HTML files and a Devhelp project"
+	@echo "  epub       to make an epub"
+	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
+	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
+	@echo "  text       to make text files"
+	@echo "  man        to make manual pages"
+	@echo "  texinfo    to make Texinfo files"
+	@echo "  info       to make Texinfo files and run them through makeinfo"
+	@echo "  gettext    to make PO message catalogs"
+	@echo "  changes    to make an overview of all changed/added/deprecated items"
+	@echo "  linkcheck  to check all external links for integrity"
+	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
+
+clean:
+	-rm -rf $(BUILDDIR)/*
+
+html:
+	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
+	@echo
+	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
+
+dirhtml:
+	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
+	@echo
+	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
+
+singlehtml:
+	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
+	@echo
+	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
+
+pickle:
+	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
+	@echo
+	@echo "Build finished; now you can process the pickle files."
+
+json:
+	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
+	@echo
+	@echo "Build finished; now you can process the JSON files."
+
+htmlhelp:
+	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
+	@echo
+	@echo "Build finished; now you can run HTML Help Workshop with the" \
+	      ".hhp project file in $(BUILDDIR)/htmlhelp."
+
+qthelp:
+	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
+	@echo
+	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
+	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
+	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/DAWG.qhcp"
+	@echo "To view the help file:"
+	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/DAWG.qhc"
+
+devhelp:
+	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
+	@echo
+	@echo "Build finished."
+	@echo "To view the help file:"
+	@echo "# mkdir -p $$HOME/.local/share/devhelp/DAWG"
+	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/DAWG"
+	@echo "# devhelp"
+
+epub:
+	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
+	@echo
+	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
+
+latex:
+	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+	@echo
+	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
+	@echo "Run \`make' in that directory to run these through (pdf)latex" \
+	      "(use \`make latexpdf' here to do that automatically)."
+
+latexpdf:
+	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+	@echo "Running LaTeX files through pdflatex..."
+	$(MAKE) -C $(BUILDDIR)/latex all-pdf
+	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
+
+text:
+	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
+	@echo
+	@echo "Build finished. The text files are in $(BUILDDIR)/text."
+
+man:
+	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
+	@echo
+	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
+
+texinfo:
+	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
+	@echo
+	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
+	@echo "Run \`make' in that directory to run these through makeinfo" \
+	      "(use \`make info' here to do that automatically)."
+
+info:
+	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
+	@echo "Running Texinfo files through makeinfo..."
+	make -C $(BUILDDIR)/texinfo info
+	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
+
+gettext:
+	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
+	@echo
+	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
+
+changes:
+	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
+	@echo
+	@echo "The overview file is in $(BUILDDIR)/changes."
+
+linkcheck:
+	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
+	@echo
+	@echo "Link check complete; look for any errors in the above output " \
+	      "or in $(BUILDDIR)/linkcheck/output.txt."
+
+doctest:
+	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
+	@echo "Testing of doctests in the sources finished, look at the " \
+	      "results in $(BUILDDIR)/doctest/output.txt."
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+#
+# DAWG documentation build configuration file, created by
+# sphinx-quickstart on Sat Mar 23 00:33:42 2013.
+#
+# This file is execfile()d with the current directory set to its containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+import sys, os
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#sys.path.insert(0, os.path.abspath('.'))
+
+# -- General configuration -----------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be extensions
+# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
+extensions = []
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix of source filenames.
+source_suffix = '.rst'
+
+# The encoding of source files.
+#source_encoding = 'utf-8-sig'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = 'DAWG'
+copyright = '2013, Mikhail Korobov'
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The short X.Y version.
+version = '0.6'
+# The full version, including alpha/beta/rc tags.
+release = '0.6'
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#language = None
+
+# There are two options for replacing |today|: either, you set today to some
+# non-false value, then it is used:
+#today = ''
+# Else, today_fmt is used as the format for a strftime call.
+#today_fmt = '%B %d, %Y'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+exclude_patterns = ['_build']
+
+# The reST default role (used for this markup: `text`) to use for all documents.
+#default_role = None
+
+# If true, '()' will be appended to :func: etc. cross-reference text.
+#add_function_parentheses = True
+
+# If true, the current module name will be prepended to all description
+# unit titles (such as .. function::).
+#add_module_names = True
+
+# If true, sectionauthor and moduleauthor directives will be shown in the
+# output. They are ignored by default.
+#show_authors = False
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# A list of ignored prefixes for module index sorting.
+#modindex_common_prefix = []
+
+
+# -- Options for HTML output ---------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+html_theme = 'default'
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further.  For a list of options available for each theme, see the
+# documentation.
+#html_theme_options = {}
+
+# Add any paths that contain custom themes here, relative to this directory.
+#html_theme_path = []
+
+# The name for this set of Sphinx documents.  If None, it defaults to
+# "<project> v<release> documentation".
+#html_title = None
+
+# A shorter title for the navigation bar.  Default is the same as html_title.
+#html_short_title = None
+
+# The name of an image file (relative to this directory) to place at the top
+# of the sidebar.
+#html_logo = None
+
+# The name of an image file (within the static path) to use as favicon of the
+# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
+# pixels large.
+#html_favicon = None
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
+# using the given strftime format.
+#html_last_updated_fmt = '%b %d, %Y'
+
+# If true, SmartyPants will be used to convert quotes and dashes to
+# typographically correct entities.
+#html_use_smartypants = True
+
+# Custom sidebar templates, maps document names to template names.
+#html_sidebars = {}
+
+# Additional templates that should be rendered to pages, maps page names to
+# template names.
+#html_additional_pages = {}
+
+# If false, no module index is generated.
+#html_domain_indices = True
+
+# If false, no index is generated.
+#html_use_index = True
+
+# If true, the index is split into individual pages for each letter.
+#html_split_index = False
+
+# If true, links to the reST sources are added to the pages.
+#html_show_sourcelink = True
+
+# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
+#html_show_sphinx = True
+
+# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
+#html_show_copyright = True
+
+# If true, an OpenSearch description file will be output, and all pages will
+# contain a <link> tag referring to it.  The value of this option must be the
+# base URL from which the finished HTML is served.
+#html_use_opensearch = ''
+
+# This is the file name suffix for HTML files (e.g. ".xhtml").
+#html_file_suffix = None
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'DAWGdoc'
+
+
+# -- Options for LaTeX output --------------------------------------------------
+
+latex_elements = {
+# The paper size ('letterpaper' or 'a4paper').
+#'papersize': 'letterpaper',
+
+# The font size ('10pt', '11pt' or '12pt').
+#'pointsize': '10pt',
+
+# Additional stuff for the LaTeX preamble.
+#'preamble': '',
+}
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title, author, documentclass [howto/manual]).
+latex_documents = [
+  ('index', 'DAWG.tex', 'DAWG Documentation',
+   'Mikhail Korobov', 'manual'),
+]
+
+# The name of an image file (relative to this directory) to place at the top of
+# the title page.
+#latex_logo = None
+
+# For "manual" documents, if this is true, then toplevel headings are parts,
+# not chapters.
+#latex_use_parts = False
+
+# If true, show page references after internal links.
+#latex_show_pagerefs = False
+
+# If true, show URL addresses after external links.
+#latex_show_urls = False
+
+# Documents to append as an appendix to all manuals.
+#latex_appendices = []
+
+# If false, no module index is generated.
+#latex_domain_indices = True
+
+
+# -- Options for manual page output --------------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+    ('index', 'dawg', 'DAWG Documentation',
+     ['Mikhail Korobov'], 1)
+]
+
+# If true, show URL addresses after external links.
+#man_show_urls = False
+
+
+# -- Options for Texinfo output ------------------------------------------------
+
+# Grouping the document tree into Texinfo files. List of tuples
+# (source start file, target name, title, author,
+#  dir menu entry, description, category)
+texinfo_documents = [
+  ('index', 'DAWG', 'DAWG Documentation',
+   'Mikhail Korobov', 'DAWG', 'One line description of project.',
+   'Miscellaneous'),
+]
+
+# Documents to append as an appendix to all manuals.
+#texinfo_appendices = []
+
+# If false, no module index is generated.
+#texinfo_domain_indices = True
+
+# How to display URL addresses: 'footnote', 'no', or 'inline'.
+#texinfo_show_urls = 'footnote'
+==================
+DAWG documentation
+==================
+
+This package provides DAWG(DAFSA_)-based dictionary-like
+read-only objects for Python (2.x and 3.x).
+
+String data in a DAWG may take 200x less memory than in
+a standard Python dict and the raw lookup speed is comparable;
+it also provides fast advanced methods like prefix search.
+
+Based on `dawgdic`_ C++ library.
+
+.. _dawgdic: https://code.google.com/p/dawgdic/
+.. _DAFSA: https://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton
+
+License
+=======
+
+Wrapper code is licensed under MIT License.
+Bundled `dawgdic`_ C++ library is licensed under BSD license.
+Bundled libb64_ is Public Domain.
+
+Installation
+============
+
+pip install DAWG
+
+Usage
+=====
+
+There are several DAWG classes in this package:
+
+* ``dawg.DAWG`` - basic DAWG wrapper; it can store unicode keys
+  and do exact lookups;
+
+* ``dawg.CompletionDAWG`` - ``dawg.DAWG`` subclass that supports
+  key completion and prefix lookups (but requires more memory);
+
+* ``dawg.BytesDAWG`` - ``dawg.CompletionDAWG`` subclass that
+  maps unicode keys to lists of ``bytes`` objects.
+
+* ``dawg.RecordDAWG`` - ``dawg.BytesDAWG`` subclass that
+  maps unicode keys to lists of data tuples.
+  All tuples must be of the same format (the data is packed
+  using python ``struct`` module).
+
+* ``dawg.IntDAWG`` - ``dawg.DAWG`` subclass that maps unicode keys
+  to integer values.
+
+DAWG and CompletionDAWG
+-----------------------
+
+``DAWG`` and ``CompletionDAWG`` are useful when you need
+fast & memory efficient simple string storage. These classes
+do not support assigning values to keys.
+
+``DAWG`` and ``CompletionDAWG`` constructors accept an iterable with keys::
+
+    >>> import dawg
+    >>> words = [u'foo', u'bar', u'foobar', u'foö', u'bör']
+    >>> base_dawg = dawg.DAWG(words)
+    >>> completion_dawg = dawg.CompletionDAWG(words)
+
+It is then possible to check if the key is in a DAWG::
+
+    >>> u'foo' in base_dawg
+    True
+    >>> u'baz' in completion_dawg
+    False
+
+It is possible to find all keys that start with a given
+prefix in a ``CompletionDAWG``::
+
+    >>> completion_dawg.keys(u'foo')
+    [u'foo', u'foobar']
+
+and to find all prefixes of a given key::
+
+    >>> base_dawg.prefixes(u'foobarz')
+    [u'foo', u'foobar']
+
+Iterator versions are also available::
+
+    >>> for key in completion_dawg.iterkeys(u'foo'):
+    ...     print(key)
+    foo
+    foobar
+    >>> for prefix in base_dawg.iterprefixes(u'foobarz'):
+    ...     print(prefix)
+    foo
+    foobar
+
+It is possible to find all keys similar to a given key (using a one-way
+char translation table)::
+
+    >>> replaces = dawg.DAWG.compile_replaces({u'o': u'ö'})
+    >>> base_dawg.similar_keys(u'foo', replaces)
+    [u'foo', u'foö']
+    >>> base_dawg.similar_keys(u'foö', replaces)
+    [u'foö']
+    >>> base_dawg.similar_keys(u'bor', replaces)
+    [u'bör']
+
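+The matching rule can be modelled in plain Python. This is only a sketch of
+the semantics shown above, not how the library performs the search;
+``similar_keys_model`` and the ``words`` set are names made up for this
+illustration, and the mapping is a plain dict rather than the object
+returned by ``compile_replaces``:
+
```python
from itertools import product


def similar_keys_model(key, words, replaces):
    """Return stored words reachable from `key` by optionally applying
    the one-way character replacements (a brute-force model)."""
    # Each character may stay as it is or, if listed, become its replacement.
    options = [(ch, replaces[ch]) if ch in replaces else (ch,) for ch in key]
    candidates = {"".join(chars) for chars in product(*options)}
    return sorted(candidates & set(words))


words = {"foo", "bar", "foobar", "foö", "bör"}
replaces = {"o": "ö"}

print(similar_keys_model("foo", words, replaces))  # ['foo', 'foö']
print(similar_keys_model("foö", words, replaces))  # ['foö']
print(similar_keys_model("bor", words, replaces))  # ['bör']
```
+
+The table is one-way: ``o`` in the query may match ``ö`` in a stored key,
+but ``ö`` in the query is never turned back into ``o``.
+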
+BytesDAWG
+---------
+
+``BytesDAWG`` is a ``CompletionDAWG`` subclass that can store
+binary data for each key.
+
+``BytesDAWG`` constructor accepts an iterable with
+``(unicode_key, bytes_value)`` tuples::
+
+    >>> data = [(u'key1', b'value1'), (u'key2', b'value2'), (u'key1', b'value3')]
+    >>> bytes_dawg = dawg.BytesDAWG(data)
+
+There can be duplicate keys; all unique values are stored in this case::
+
+    >>> bytes_dawg[u'key1']
+    [b'value1', b'value3']
+
+For unique keys a list with a single value is returned for consistency::
+
+    >>> bytes_dawg[u'key2']
+    [b'value2']
+
+``KeyError`` is raised for missing keys; use ``get`` method if you need
+a default value instead::
+
+    >>> print(bytes_dawg.get(u'foo', None))
+    None
+
+``BytesDAWG`` supports ``keys``, ``items``, ``iterkeys`` and ``iteritems``
+methods (they all accept optional key prefix). There is also support for
+``similar_keys``, ``similar_items`` and ``similar_item_values`` methods.
+
+RecordDAWG
+----------
+
+``RecordDAWG`` is a ``BytesDAWG`` subclass that automatically
+packs & unpacks the binary data from/to Python objects
+using ``struct`` module from the standard library.
+
+First, you have to define a format of the data. Consult Python docs
+(http://docs.python.org/library/struct.html#format-strings) for the format
+string specification.
+
+For example, let's store 3 short unsigned numbers (in a Big-Endian byte order)
+as values::
+
+    >>> format = ">HHH"
+
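+To see what this format string means, here is a small sketch that uses only
+the standard ``struct`` module (it does not touch ``RecordDAWG`` itself;
+``record_format`` and the sample values are chosen for illustration):
+
```python
import struct

record_format = ">HHH"  # three unsigned 16-bit integers, big-endian

# Each record occupies a fixed number of bytes.
size = struct.calcsize(record_format)
print(size)  # 6

# Packing turns a tuple into bytes; unpacking restores the tuple.
packed = struct.pack(record_format, 1, 2, 3)
print(packed)  # b'\x00\x01\x00\x02\x00\x03'
print(struct.unpack(record_format, packed))  # (1, 2, 3)

# Values outside the range of the chosen type are rejected.
try:
    struct.pack(record_format, 1, 2, 70000)
    out_of_range_rejected = False
except struct.error:
    out_of_range_rejected = True
print(out_of_range_rejected)  # True
```
+
+Every value tuple must fit the same format, so pick types wide enough for
+the largest number you expect to store.
+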
+``RecordDAWG`` constructor accepts an iterable with
+``(unicode_key, value_tuple)``. Let's create such iterable
+using ``zip`` function::
+
+    >>> keys = [u'foo', u'bar', u'foobar', u'foo']
+    >>> values = [(1, 2, 3), (2, 1, 0), (3, 3, 3), (2, 1, 5)]
+    >>> data = zip(keys, values)
+    >>> record_dawg = dawg.RecordDAWG(format, data)
+
+As with ``BytesDAWG``, there can be several values for the same key::
+
+    >>> record_dawg['foo']
+    [(1, 2, 3), (2, 1, 5)]
+    >>> record_dawg['foobar']
+    [(3, 3, 3)]
+
+
+BytesDAWG and RecordDAWG implementation details
+-----------------------------------------------
+
+``BytesDAWG`` and ``RecordDAWG`` store data at the end of the keys::
+
+    <utf8-encoded key><separator><base64-encoded data>
+
+Data is encoded to base64 because dawgdic_ C++ library doesn't allow
+zero bytes in keys (it uses null-terminated strings) and such keys are
+very likely in binary data.
+
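+The zero-byte problem is easy to reproduce with the standard library. The
+sketch below uses Python's ``base64`` module as a stand-in; the package
+itself bundles a modified libb64, so treat the exact encoded bytes as an
+assumption:
+
```python
import base64
import struct

# A packed record with small numbers is full of zero bytes.
packed = struct.pack(">HHH", 1, 2, 3)
print(packed)             # b'\x00\x01\x00\x02\x00\x03'
print(b"\x00" in packed)  # True

# The base64 form uses only printable ASCII, so it has no zero bytes.
encoded = base64.b64encode(packed)
print(encoded)             # b'AAEAAgAD'
print(b"\x00" in encoded)  # False

# Decoding restores the original record.
decoded = struct.unpack(">HHH", base64.b64decode(encoded))
print(decoded)  # (1, 2, 3)
```
+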
+In DAWG versions prior to 0.5 ``<separator>`` was ``chr(255)`` byte.
+It was chosen because keys are stored as UTF8-encoded strings and
+``chr(255)`` is guaranteed not to appear in valid UTF8, so the end of
+text part of the key is not ambiguous.
+
+But ``chr(255)`` proved to be problematic: it changes the order
+of the keys. Keys are naturally returned in lexicographical order by DAWG.
+But if ``chr(255)`` appears at the end of each text part of a key then the
+visible order would change. Imagine ``'foo'`` key with some payload
+and ``'foobar'`` key with some payload. ``'foo'`` key would be greater
+than ``'foobar'`` key: values compared would be ``'foo<sep>'`` and ``'foobar<sep>'``
+and ``ord(<sep>)==255`` is greater than ``ord(<any other character>)``.
+
+So now the default ``<separator>`` is chr(1). This is the lowest allowed
+character and so it preserves the alphabetical order.
+
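+The effect of the separator on ordering can be checked without the library.
+In this sketch ``stored_order`` is a hypothetical helper that sorts the byte
+strings ``<key><separator><payload>`` the way a DAWG would return them; the
+payload ``b"AAAA"`` is a dummy value:
+
```python
def stored_order(keys, separator, payload=b"AAAA"):
    """Sort keys by the bytes stored for them: key + separator + payload."""
    stored = sorted(k.encode("utf8") + separator + payload for k in keys)
    # Strip the separator and payload again to show the visible key order.
    return [s.split(separator, 1)[0].decode("utf8") for s in stored]


keys = ["foobar", "foo", "fo"]

old_order = stored_order(keys, b"\xff")  # separator used before 0.5
new_order = stored_order(keys, b"\x01")  # current default separator

print(old_order)  # ['foobar', 'foo', 'fo']
print(new_order)  # ['fo', 'foo', 'foobar']
print(new_order == sorted(keys))  # True
```
+
+With ``b"\xff"`` a shorter key sorts after every longer key that extends it;
+with ``b"\x01"`` the stored order matches the plain order of the keys.
+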
+It is not strictly correct to use chr(1) as a separator because chr(1)
+is a valid UTF8 character. But I think in practice this won't be an issue:
+such control character is very unlikely in text keys, and binary keys
+are not supported anyway because dawgdic_ doesn't support keys containing
+chr(0).
+
+If you can't guarantee chr(1) is not a part of keys, lexicographical order
+is not important to you or there is a need to read
+a ``BytesDAWG``/``RecordDAWG`` created by DAWG < 0.5 then pass
+``payload_separator`` argument to the constructor::
+
+    >>> dawg.BytesDAWG(payload_separator=b'\xff').load('old.dawg')
+
+The storage scheme has one more implication: values of ``BytesDAWG``
+and ``RecordDAWG`` are also sorted lexicographically.
+
+For ``RecordDAWG`` there is a gotcha: in order to have meaningful
+ordering of numeric values store them in big-endian format::
+
+    >>> data = [('foo', (3, 2, 256)), ('foo', (3, 2, 1)), ('foo', (3, 2, 3))]
+    >>> d = dawg.RecordDAWG("3H", data)
+    >>> d.items()
+    [(u'foo', (3, 2, 256)), (u'foo', (3, 2, 1)), (u'foo', (3, 2, 3))]
+
+    >>> d2 = dawg.RecordDAWG(">3H", data)
+    >>> d2.items()
+    [(u'foo', (3, 2, 1)), (u'foo', (3, 2, 3)), (u'foo', (3, 2, 256))]
+
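+The reason is visible in the packed bytes. The sketch below sorts records by
+their raw ``struct`` output; the library compares the base64-encoded form
+instead, so this is an approximation that happens to give the same order for
+this data. ``"<3H"`` is used as an explicit little-endian stand-in for the
+native ``"3H"`` on common platforms, and ``order_by_packed_bytes`` is a
+hypothetical helper:
+
```python
import struct

records = [(3, 2, 256), (3, 2, 1), (3, 2, 3)]


def order_by_packed_bytes(fmt, rows):
    """Sort rows by the byte string that struct produces for each of them."""
    return sorted(rows, key=lambda row: struct.pack(fmt, *row))


little = order_by_packed_bytes("<3H", records)
big = order_by_packed_bytes(">3H", records)

print(little)  # [(3, 2, 256), (3, 2, 1), (3, 2, 3)]
print(big)     # [(3, 2, 1), (3, 2, 3), (3, 2, 256)]
```
+
+In little-endian form the least significant byte comes first, so 256
+(``00 01``) sorts before 1 (``01 00``); big-endian puts the most significant
+byte first, so byte order and numeric order agree.
+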
+IntDAWG
+-------
+
+``IntDAWG`` is a ``{unicode -> int}`` mapping. It is possible to
+use ``RecordDAWG`` for this, but ``IntDAWG`` is natively
+supported by dawgdic_ C++ library and so ``__getitem__`` is much faster.
+
+Unlike ``BytesDAWG`` and ``RecordDAWG``, ``IntDAWG`` doesn't support
+having several values for the same key.
+
+``IntDAWG`` constructor accepts an iterable with (unicode_key, integer_value)
+tuples::
+
+    >>> data = [ (u'foo', 1), (u'bar', 2) ]
+    >>> int_dawg = dawg.IntDAWG(data)
+
+It is then possible to get a value from the IntDAWG::
+
+    >>> int_dawg[u'foo']
+    1
+
+
+Persistence
+-----------
+
+All DAWGs support saving/loading and pickling/unpickling.
+
+Write DAWG to a stream::
+
+    >>> with open('words.dawg', 'wb') as f:
+    ...     d.write(f)
+
+Save DAWG to a file::
+
+    >>> d.save('words.dawg')
+
+Load DAWG from a file::
+
+    >>> d = dawg.DAWG()
+    >>> d.load('words.dawg')
+
+.. warning::
+
+    Reading DAWGs from streams and unpickling are currently using 3x memory
+    compared to loading DAWGs using ``load`` method; please avoid them until
+    the issue is fixed.
+
+Read DAWG from a stream::
+
+    >>> d = dawg.RecordDAWG(format_string)
+    >>> with open('words.record-dawg', 'rb') as f:
+    ...     d.read(f)
+
+DAWG objects are picklable::
+
+    >>> import pickle
+    >>> data = pickle.dumps(d)
+    >>> d2 = pickle.loads(data)
+
+Benchmarks
+==========
+
+For a list of 3000000 (3 million) Russian words memory consumption
+with different data structures (under Python 2.7):
+
+* dict(unicode words -> word lengths): about 600M
+* list(unicode words) : about 300M
+* ``marisa_trie.RecordTrie`` : 11M
+* ``marisa_trie.Trie``: 7M
+* ``dawg.DAWG``: 2M
+* ``dawg.CompletionDAWG``: 3M
+* ``dawg.IntDAWG``: 2.7M
+* ``dawg.RecordDAWG``: 4M
+
+
+.. note::
+
+    Lengths of words were not stored as values in ``dawg.DAWG``,
+    ``dawg.CompletionDAWG`` and ``marisa_trie.Trie`` because they don't
+    support this.
+
+.. note::
+
+    `marisa-trie`_ is often more memory efficient than
+    DAWG (depending on data); it can also handle larger datasets
+    and provides memory-mapped IO, so don't dismiss `marisa-trie`_
+    based on this README file. It is still several times slower than
+    DAWG though.
+
+.. _marisa-trie: https://github.com/kmike/marisa-trie
+
+Benchmark results (100k unicode words, integer values (lengths of the words),
+Python 3.3, MacBook Air, i5 1.8 GHz)::
+
+    dict __getitem__ (hits)           7.300M ops/sec
+    DAWG __getitem__ (hits)           not supported
+    BytesDAWG __getitem__ (hits)      1.230M ops/sec
+    RecordDAWG __getitem__ (hits)     0.792M ops/sec
+    IntDAWG __getitem__ (hits)        4.217M ops/sec
+    dict get() (hits)                 3.775M ops/sec
+    DAWG get() (hits)                 not supported
+    BytesDAWG get() (hits)            1.027M ops/sec
+    RecordDAWG get() (hits)           0.733M ops/sec
+    IntDAWG get() (hits)              3.162M ops/sec
+    dict get() (misses)               4.533M ops/sec
+    DAWG get() (misses)               not supported
+    BytesDAWG get() (misses)          3.545M ops/sec
+    RecordDAWG get() (misses)         3.485M ops/sec
+    IntDAWG get() (misses)            3.928M ops/sec
+
+    dict __contains__ (hits)          7.090M ops/sec
+    DAWG __contains__ (hits)          4.685M ops/sec
+    BytesDAWG __contains__ (hits)     3.885M ops/sec
+    RecordDAWG __contains__ (hits)    3.898M ops/sec
+    IntDAWG __contains__ (hits)       4.612M ops/sec
+
+    dict __contains__ (misses)        5.617M ops/sec
+    DAWG __contains__ (misses)        6.204M ops/sec
+    BytesDAWG __contains__ (misses)   6.026M ops/sec
+    RecordDAWG __contains__ (misses)  6.007M ops/sec
+    IntDAWG __contains__ (misses)     6.180M ops/sec
+
+    DAWG.similar_keys  (no replaces)  0.492M ops/sec
+    DAWG.similar_keys  (l33t)         0.413M ops/sec
+
+    dict items()                      55.032 ops/sec
+    DAWG items()                      not supported
+    BytesDAWG items()                 14.826 ops/sec
+    RecordDAWG items()                9.436 ops/sec
+    IntDAWG items()                   not supported
+
+    dict keys()                       200.788 ops/sec
+    DAWG keys()                       not supported
+    BytesDAWG keys()                  20.657 ops/sec
+    RecordDAWG keys()                 20.873 ops/sec
+    IntDAWG keys()                    not supported
+
+    DAWG.prefixes (hits)              1.552M ops/sec
+    DAWG.prefixes (mixed)             4.342M ops/sec
+    DAWG.prefixes (misses)            4.094M ops/sec
+    DAWG.iterprefixes (hits)          0.391M ops/sec
+    DAWG.iterprefixes (mixed)         0.476M ops/sec
+    DAWG.iterprefixes (misses)        0.498M ops/sec
+
+    RecordDAWG.keys(prefix="xxx"), avg_len(res)==415             5.562K ops/sec
+    RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17            104.011K ops/sec
+    RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3          318.129K ops/sec
+    RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4       462.238K ops/sec
+    RecordDAWG.keys(prefix="xxx"), NON_EXISTING                  4292.625K ops/sec
+
+
+Please take these benchmark results with a grain of salt; this
+is a very simple benchmark on a single data set.
+
+
+Current limitations
+===================
+
+* ``IntDAWG`` is currently a subclass of ``DAWG`` and so it doesn't
+  support ``keys()`` and ``items()`` methods;
+* ``read()`` method reads the whole stream (DAWG must be the last or the
+  only item in a stream if it is read with ``read()`` method) - pickling
+  doesn't have this limitation;
+* DAWGs loaded with ``read()`` and unpickled DAWGs use 3x-4x memory
+  compared to DAWGs loaded with ``load()`` method;
+* there are ``keys()`` and ``items()`` methods but no ``values()`` method;
+* iterator versions of methods are not always implemented;
+* ``BytesDAWG`` and ``RecordDAWG`` have a limitation: values
+  larger than 8KB are unsupported;
+* the maximum number of DAWG units is limited: number of DAWG units
+  (and thus transitions - but not elements) should be less than 2^29;
+  this means that it may be impossible to build an especially huge DAWG
+  (you may split your data into several DAWGs or try `marisa-trie`_ in
+  this case).
+
+Contributions are welcome!
+
+
+Contributing
+============
+
+Development happens at GitHub and Bitbucket:
+
+* https://github.com/kmike/DAWG
+* https://bitbucket.org/kmike/DAWG
+
+The main issue tracker is at GitHub: https://github.com/kmike/DAWG/issues
+
+Feel free to submit ideas, bugs, pull requests (git or hg) or
+regular patches.
+
+If you find a bug in the C++ part, please report it to the original
+`bug tracker <https://code.google.com/p/dawgdic/issues/list>`_.
+
+How the source code is organized
+--------------------------------
+
+There are 4 folders in the repository:
+
+* ``bench`` - benchmarks & benchmark data;
+* ``lib`` - original unmodified `dawgdic`_ C++ library and
+  a customized version of `libb64`_ library. They are bundled
+  for easier distribution; if something has to be fixed in these
+  libraries consider fixing it in the original repositories;
+* ``src`` - wrapper code; ``src/dawg.pyx`` is a wrapper implementation;
+  ``src/*.pxd`` files are Cython headers for corresponding C++ headers;
+  ``src/*.cpp`` files are the pre-built extension code and shouldn't be
+  modified directly (they should be updated via ``update_cpp.sh`` script).
+* ``tests`` - the test suite.
+
+
+Running tests and benchmarks
+----------------------------
+
+Make sure `tox`_ is installed and run
+
+::
+
+    $ tox
+
+from the source checkout. Tests should pass under Python 2.6, 2.7, 3.2 and 3.3.
+
+In order to run benchmarks, type
+
+::
+
+    $ tox -c bench.ini
+
+.. _cython: http://cython.org
+.. _tox: http://tox.testrun.org
+
+.. include:: ../AUTHORS.rst
+
+.. include:: ../CHANGES.rst
+@ECHO OFF
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set BUILDDIR=_build
+set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
+set I18NSPHINXOPTS=%SPHINXOPTS% .
+if NOT "%PAPER%" == "" (
+	set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
+	set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
+)
+
+if "%1" == "" goto help
+
+if "%1" == "help" (
+	:help
+	echo.Please use `make ^<target^>` where ^<target^> is one of
+	echo.  html       to make standalone HTML files
+	echo.  dirhtml    to make HTML files named index.html in directories
+	echo.  singlehtml to make a single large HTML file
+	echo.  pickle     to make pickle files
+	echo.  json       to make JSON files
+	echo.  htmlhelp   to make HTML files and a HTML help project
+	echo.  qthelp     to make HTML files and a qthelp project
+	echo.  devhelp    to make HTML files and a Devhelp project
+	echo.  epub       to make an epub
+	echo.  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter
+	echo.  text       to make text files
+	echo.  man        to make manual pages
+	echo.  texinfo    to make Texinfo files
+	echo.  gettext    to make PO message catalogs
+	echo.  changes    to make an overview over all changed/added/deprecated items
+	echo.  linkcheck  to check all external links for integrity
+	echo.  doctest    to run all doctests embedded in the documentation if enabled
+	goto end
+)
+
+if "%1" == "clean" (
+	for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
+	del /q /s %BUILDDIR%\*
+	goto end
+)
+
+if "%1" == "html" (
+	%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished. The HTML pages are in %BUILDDIR%/html.
+	goto end
+)
+
+if "%1" == "dirhtml" (
+	%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
+	goto end
+)
+
+if "%1" == "singlehtml" (
+	%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
+	goto end
+)
+
+if "%1" == "pickle" (
+	%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished; now you can process the pickle files.
+	goto end
+)
+
+if "%1" == "json" (
+	%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished; now you can process the JSON files.
+	goto end
+)
+
+if "%1" == "htmlhelp" (
+	%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished; now you can run HTML Help Workshop with the ^
+.hhp project file in %BUILDDIR%/htmlhelp.
+	goto end
+)
+
+if "%1" == "qthelp" (
+	%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished; now you can run "qcollectiongenerator" with the ^
+.qhcp project file in %BUILDDIR%/qthelp, like this:
+	echo.^> qcollectiongenerator %BUILDDIR%\qthelp\DAWG.qhcp
+	echo.To view the help file:
+	echo.^> assistant -collectionFile %BUILDDIR%\qthelp\DAWG.ghc
+	goto end
+)
+
+if "%1" == "devhelp" (
+	%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished.
+	goto end
+)
+
+if "%1" == "epub" (
+	%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished. The epub file is in %BUILDDIR%/epub.
+	goto end
+)
+
+if "%1" == "latex" (
+	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
+	goto end
+)
+
+if "%1" == "text" (
+	%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished. The text files are in %BUILDDIR%/text.
+	goto end
+)
+
+if "%1" == "man" (
+	%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished. The manual pages are in %BUILDDIR%/man.
+	goto end
+)
+
+if "%1" == "texinfo" (
+	%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
+	goto end
+)
+
+if "%1" == "gettext" (
+	%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
+	goto end
+)
+
+if "%1" == "changes" (
+	%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.The overview file is in %BUILDDIR%/changes.
+	goto end
+)
+
+if "%1" == "linkcheck" (
+	%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Link check complete; look for any errors in the above output ^
+or in %BUILDDIR%/linkcheck/output.txt.
+	goto end
+)
+
+if "%1" == "doctest" (
+	%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Testing of doctests in the sources finished, look at the ^
+results in %BUILDDIR%/doctest/output.txt.
+	goto end
+)
+
+:end
     name="DAWG",
     version="0.6",
     description="Fast and memory efficient DAWG for Python",
-    long_description = read_utf8_file('README.rst') +'\n' + read_utf8_file('CHANGES.rst'),
+    long_description = read_utf8_file('README.rst') +'\n\n' + read_utf8_file('CHANGES.rst'),
     author='Mikhail Korobov',
     author_email='kmike84@gmail.com',
     url='https://github.com/kmike/DAWG/',