HTTPS SSH

PySkipList - skip lists for CPython, implemented as C Extension.

(c) 2012-2013 - Francesco Romani <fromani . gmail + com>

Description

SkipLists are a probabilistic data structure which provides lookup time in O(logn) time, much like the balanced trees. However, they are much simpler to implement, less memory hungry and more concurrency friendly. The SkipLists are described in full detail here in the original paper freely available. Wikipedia has a quite complete entry too.

This extension provide some new data structures for (C)Python which internally use a SkipLists, by leveraging some C code written by the same author

SkipList

A skiplist.SkipList is an always-ordered data structure which mimics as close as possible the interface of the builtin list and the array.array class. In a nutshell, a skiplist.SkipList is a list-like, iterable, object which is (cheaply) sort()ed after every modification, either an insert() or a remove().

Usage example:

Python 2.7.3rc2 (default, Apr 22 2012, 22:30:17) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> import skiplist
>>> s = skiplist.SkipList(cmp, random.random)
>>> for i in range(10):
...     s.insert(chr(ord('a') + i))
... 
>>> len(s)
10
>>> s.tolist()
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
>>> for i in s.rlookup('e'):
...     print "%s " %i,
... 
e  d  c  b  a 
>>> t = skiplist.SkipList(cmp, random.random)
>>> for i in reversed(range(10)):
...     t.insert(i)
... 
>>> print t
skiplist(0, 1, 2... )
>>> t.tolist()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>

SkipDict

A skiplist.SkipDict is a dict-like object which always provides the ordering of the inserted keys and the retrieval in O(logn) time. (probabilistic). It is similar to the collections.OrderedDict class, but be very careful: OrderedDict remembers the order of insertion of the keys, while SkipDict always provides the ordering of the keys by leveraging the provided cmp function. Thus, the two classes will behave differently. See the following example.

Usage example:

Python 2.7.3rc2 (default, Apr 22 2012, 22:30:17) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> import skiplist
>>> d = skiplist.SkipDict(cmp, random.random)
>>> for i in range(10):
...     d[chr(ord('a') + i)] = i
... 
>>> len(d)
10
>>> d.todict()
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'j': 9}
>>> for k,v in d.rlookup('e'):
...     print "%s:%i" %(k,v),
... 
e:4 d:3 c:2 b:1 a:0
>>> e = skiplist.SkipDict(cmp, random.random)
>>> for i in reversed(range(10)):
...     e[chr(ord('a') + i)] = i
... 
>>> print e
SkipDict('a': 0, 'b': 1, 'c': 2... )
>>> e.todict()
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'j': 9}
>>> import collections
>>> f = collections.OrderedDict()
>>> for i in reversed(range(10)):
...     f[chr(ord('a') + i)] = i
... 
>>> f
OrderedDict([('j', 9), ('i', 8), ('h', 7), ('g', 6), ('f', 5), ('e', 4), ('d', 3), ('c', 2), ('b', 1), ('a', 0)])
>>>

TODO

  • verify the refcounting
  • Python 3 compatibility
  • proper error handling
  • full test coverage
  • better GC integration?

License

Copyright (c) 2012-2013 Francesco Romani <fromani . gmail - com>

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.

  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.

  3. This notice may not be removed or altered from any source
     distribution.

The authors of the code are listed as copyright holders at the beginning of every source module.

EOF