Mikhail Korobov / DAWG

Commits

Mikhail Korobov  committed d75c8fc

extra tests & readme improvements

  • Parent commits 7fca224
  • Branches default


Files changed (2)

File README.rst

     >>> base_dawg.prefixes(u'foobarz')
     [u'foo', u'foobar']
 
+Iterator versions are also available::
+
+    >>> for key in completion_dawg.iterkeys(u'foo'):
+    ...     print(key)
+    foo
+    foobar
+    >>> for prefix in base_dawg.iterprefixes(u'foobarz'):
+    ...     print(prefix)
+    foo
+    foobar
+
 It is possible to find all keys similar to a given key (using a one-way
 char translation table)::
 
     >>> bytes_dawg.get(u'foo', None)
     None
 
-``BytesDAWG`` support ``keys`` and ``items`` methods (they both
-accept optional key prefix). There is also support for
+``BytesDAWG`` support ``keys``, ``items``, ``iterkeys`` and ``iteritems``
+methods (they all accept optional key prefix). There is also support for
 ``similar_keys``, ``similar_items`` and ``similar_item_values`` methods.
 
+.. note::
+
+    Currently the order of keys returned by ``BytesDAWG`` is not the same
+    as the order of keys returned by ``CompletionDAWG`` because
+    of the way ``BytesDAWG`` is implemented: values are internally stored inside
+    DAWG keys after a separator; separator is a chr(255) byte and thus
+    ``'foo'`` key is greater than ``'foobar'`` key (values compared
+    are ``'foo<sep>'`` and ``'foobar<sep>'``).
 
 RecordDAWG
 ----------
   doesn't have this limitation;
 * DAWGs loaded with ``read()`` and unpickled DAWGs uses 3x-4x memory
   compared to DAWGs loaded with ``load()`` method;
+* there are ``keys()`` and ``items()`` methods but no ``values()`` method;
 * iterator versions of methods are not always implemented;
-* there are ``keys()`` and ``items()`` methods but no ``values()`` method.
-* ``prefixes()`` method for getting all prefixes of a given work is
-  not implemented yet;
+* ``BytesDAWG`` and ``RecordDAWG`` key order is different from
+  ``CompletionDAWG`` key order;
 * ``BytesDAWG`` and ``RecordDAWG`` has a limitation: values
   larger than 8KB are unsupported.
 
 Authors & Contributors
 ----------------------
 
-* Mikhail Korobov <kmike84@gmail.com>
+* Mikhail Korobov <kmike84@gmail.com>;
+* Dan Blanchard.
 
 This module is based on `dawgdic`_ C++ library by
 Susumu Yata & contributors.
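
Note on the key-order remark added to the README in this commit: the effect can be reproduced with plain byte strings, without the dawg package. The sketch below assumes the separator is the single byte 0xFF, as the note states, and that keys are compared bytewise. The names SEP, plain_order and stored_order are illustrative only and are not part of the library.

```python
# Assumption: the separator is the single byte 0xFF (chr(255)),
# as described in the README note.
SEP = b"\xff"

keys = [b"foo", b"foobar", b"bar"]

# Bytewise order of the bare keys.
plain_order = sorted(keys)

# Bytewise order when every key is stored as key + separator;
# the separator is stripped again after sorting.
stored_order = [stored[:-1] for stored in sorted(key + SEP for key in keys)]

print(plain_order)   # [b'bar', b'foo', b'foobar']
print(stored_order)  # [b'bar', b'foobar', b'foo']
```

In the stored form, b'foo\xff' and b'foobar\xff' first differ at the fourth byte, where 0xFF is greater than b'b', so 'foobar' sorts before 'foo'. This is also why the new tests compare sorted() results rather than raw key order.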

File tests/test_payload_dawg.py

         ('foobar', b'data4')
     )
 
+    DATA_KEYS = list(zip(*DATA))[0]
+
     def dawg(self):
         return dawg.BytesDAWG(self.DATA)
 
         assert d.prefixes("x") == []
         assert d.prefixes("bar") == ["bar"]
 
+    def test_keys(self):
+        d = self.dawg()
+        assert sorted(d.keys()) == sorted(self.DATA_KEYS)
+
+    def test_iterkeys(self):
+        d = self.dawg()
+        assert list(d.iterkeys()) == d.keys()
+        assert sorted(d.iterkeys()) == sorted(self.DATA_KEYS)
+
+    def test_items(self):
+        d = self.dawg()
+        assert sorted(d.items()) == sorted(self.DATA)
+
+    def test_iteritems(self):
+        d = self.dawg()
+        assert list(d.iteritems()) == d.items()
 
 
 class TestRecordDAWG(object):
         d = self.dawg()
         assert sorted(d.keys()) == ['bar', 'foo', 'foo', 'foobar',]
 
+    def test_record_iterkeys(self):
+        d = self.dawg()
+        assert list(d.iterkeys()) == d.keys()
+
+    def test_record_iteritems(self):
+        d = self.dawg()
+        assert list(d.iteritems()) == d.items()
+
     def test_record_keys_prefix(self):
         d = self.dawg()
         assert sorted(d.keys('fo')) == ['foo', 'foo', 'foobar']
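
Note on the DATA_KEYS helper added in this diff: zip(*DATA) transposes the sequence of (key, value) pairs, so element 0 is a tuple of all keys. The sketch below shows the idiom in isolation. Only the ('foobar', b'data4') pair is visible in the diff; the other pairs are hypothetical stand-ins for the test fixture.

```python
# Illustrative fixture: only the last pair appears in the diff;
# the first three pairs are assumed for the sake of the example.
DATA = (
    ("foo", b"data1"),
    ("bar", b"data2"),
    ("foo", b"data3"),
    ("foobar", b"data4"),
)

# zip(*DATA) yields one tuple of keys and one tuple of values;
# list(...) is needed on Python 3, where zip returns an iterator.
DATA_KEYS = list(zip(*DATA))[0]

print(DATA_KEYS)          # ('foo', 'bar', 'foo', 'foobar')
print(sorted(DATA_KEYS))  # ['bar', 'foo', 'foo', 'foobar']
```

Duplicate keys are preserved by the transpose, which matters here because the DAWG may hold several values for one key.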