Commits

Gregory Petukhov committed eb33b0f

Refactor repository structure

  • Participants
  • Parent commits ae2e14c

Comments (0)

Files changed (8)

 TrustlinkClient.pm
 trustlink.php
 research.txt
+test_var
 
 syntax:regexp

File phpserialize.py

-# -*- coding: utf-8 -*-
-r"""
-    phpserialize
-    ~~~~~~~~~~~~
-
-    a port of the ``serialize`` and ``unserialize`` functions of
-    php to python.  This module implements the python serialization
-    interface (eg: provides `dumps`, `loads` and similar functions).
-
-    Usage
-    =====
-
-    >>> from phpserialize import *
-    >>> obj = dumps("Hello World")
-    >>> loads(obj)
-    'Hello World'
-
-    Due to the fact that PHP doesn't know the concept of lists, lists
-    are serialized like hash-maps in PHP.  As a matter of fact the
-    reverse value of a serialized list is a dict:
-
-    >>> loads(dumps(range(2)))
-    {0: 0, 1: 1}
-
-    If you want to have a list again, you can use the `dict_to_list`
-    helper function:
-
-    >>> dict_to_list(loads(dumps(range(2))))
-    [0, 1]
-
-    It's also possible to convert into a tuple by using the `dict_to_tuple`
-    function:
-
-    >>> dict_to_tuple(loads(dumps((1, 2, 3))))
-    (1, 2, 3)
-
-    Another problem are unicode strings.  By default unicode strings are
-    encoded to 'utf-8' but not decoded on `unserialize`.  The reason for
-    this is that phpserialize can't guess if you have binary or text data
-    in the strings:
-
-    >>> loads(dumps(u'Hello W\xf6rld'))
-    'Hello W\xc3\xb6rld'
-
-    If you know that you have only text data of a known charset in the result
-    you can decode strings by setting `decode_strings` to True when calling
-    loads:
-
-    >>> loads(dumps(u'Hello W\xf6rld'), decode_strings=True)
-    u'Hello W\xf6rld'
-
-    Dictionary keys are limited to strings and integers.  `None` is converted
-    into an empty string and floats and booleans into integers for PHP
-    compatibility:
-
-    >>> loads(dumps({None: 14, 42.23: 'foo', True: [1, 2, 3]}))
-    {'': 14, 1: {0: 1, 1: 2, 2: 3}, 42: 'foo'}
-
-    It also provides functions to read from file-like objects:
-
-    >>> from StringIO import StringIO
-    >>> stream = StringIO('a:2:{i:0;i:1;i:1;i:2;}')
-    >>> dict_to_list(load(stream))
-    [1, 2]
-
-    And to write to those:
-
-    >>> stream = StringIO()
-    >>> dump([1, 2], stream)
-    >>> stream.getvalue()
-    'a:2:{i:0;i:1;i:1;i:2;}'
-
-    Like `pickle` chaining of objects is supported:
-
-    >>> stream = StringIO()
-    >>> dump([1, 2], stream)
-    >>> dump("foo", stream)
-    >>> stream.seek(0)
-    >>> load(stream)
-    {0: 1, 1: 2}
-    >>> load(stream)
-    'foo'
-
-    This feature however is not supported in PHP.  PHP will only unserialize
-    the first object.
-
-    Array Serialization
-    ===================
-
-    Starting with 1.2 you can provide an array hook to the unserialization
-    functions that are invoked with a list of pairs to return a real array
-    object.  By default `dict` is used as array object which however means
-    that the information about the order is lost for associative arrays.
-
-    For example you can pass the ordered dictionary to the unserilization
-    functions:
-
-    >>> from collections import OrderedDict
-    >>> loads('a:2:{s:3:"foo";i:1;s:3:"bar";i:2;}',
-    ...       array_hook=OrderedDict)
-    collections.OrderedDict([('foo', 1), ('bar', 2)])
-
-    Object Serialization
-    ====================
-
-    PHP supports serialization of objects.  Starting with 1.2 of phpserialize
-    it is possible to both serialize and unserialize objects.  Because class
-    names in PHP and Python usually do not map, there is a separate
-    `object_hook` parameter that is responsible for creating these classes.
-
-    For a simple test example the `phpserialize.phpobject` class can be used:
-
-    >>> data = 'O:7:"WP_User":1:{s:8:"username";s:5:"admin";}'
-    >>> user = loads(data, object_hook=phpobject)
-    >>> user.username
-    'admin'
-    >>> user.__name__
-    'WP_User'
-
-    An object hook is a function that takes the name of the class and a dict
-    with the instance data as arguments.  The instance data keys are in PHP
-    format which usually is not what you want.  To convert it into Python
-    identifiers you can use the `convert_member_dict` function.  For more
-    information about that, have a look at the next section.  Here an
-    example for a simple object hook:
-
-    >>> class User(object):
-    ...     def __init__(self, username):
-    ...         self.username = username
-    ...
-    >>> def object_hook(name, d):
-    ...     cls = {'WP_User': User}[name]
-    ...     return cls(**d)
-    ...
-    >>> user = loads(data, object_hook=object_hook)
-    >>> user.username
-    'admin'
-
-    To serialize objects you can use the `object_hook` of the dump functions
-    and return instances of `phpobject`:
-
-    >>> def object_hook(obj):
-    ...     if isinstance(obj, User):
-    ...         return phpobject('WP_User', {'username': obj.username})
-    ...     raise LookupError('unknown object')
-    ...
-    >>> dumps(user, object_hook=object_hook)
-    'O:7:"WP_User":1:{s:8:"username";s:5:"admin";}'
-
-    PHP's Object System
-    ===================
-
-    The PHP object system is derived from compiled languages such as Java
-    and C#.  Attributes can be protected from external access by setting
-    them to `protected` or `private`.  This does not only serve the purpose
-    to encapsulate internals but also to avoid name clashes.
-
-    In PHP each class in the inheritance chain can have a private variable
-    with the same name, without causing clashes.  (This is similar to the
-    Python `__var` name mangling system).
-
-    This PHP class::
-
-        class WP_UserBase {
-            protected $username;
-
-            public function __construct($username) {
-                $this->username = $username;
-            }
-        }
-
-        class WP_User extends WP_UserBase {
-            private $password;
-            public $flag;
-
-            public function __construct($username, $password) {
-                parent::__construct($username);
-                $this->password = $password;
-                $this->flag = 0;
-            }
-        }
-
-    Is serialized with a member data dict that looks like this:
-
-    >>> data = {
-    ...     ' * username':          'the username',
-    ...     ' WP_User password':    'the password',
-    ...     'flag':                 'the flag'
-    ... }
-
-    Because this access system does not exist in Python, the
-    `convert_member_dict` can convert this dict:
-
-    >>> d = convert_member_dict(data)
-    >>> d['username']
-    'the username'
-    >>> d['password']
-    'the password'
-
-    The `phpobject` class does this conversion on the fly.  What is
-    serialized is the special `__php_vars__` dict of the class:
-
-    >>> user = phpobject('WP_User', data)
-    >>> user.username
-    'the username'
-    >>> user.username = 'admin'
-    >>> user.__php_vars__[' * username']
-    'admin'
-
-    As you can see, reassigning attributes on a php object will try
-    to change a private or protected attribute with the same name.
-    Setting an unknown one will create a new public attribute:
-
-    >>> user.is_admin = True
-    >>> user.__php_vars__['is_admin']
-    True
-
-    To convert the phpobject into a dict, you can use the `_asdict`
-    method:
-
-    >>> d = user._asdict()
-    >>> d['username']
-    'admin'
-
-    Python 3 Notes
-    ==============
-
-    Because the unicode support in Python 3 no longer transparently
-    handles bytes and unicode objects we had to change the way the
-    decoding works.  On Python 3 you most likely want to always
-    decode strings.  Because this would totally fail on binary data
-    phpserialize uses the "surrogateescape" method to not fail on
-    invalid data.  See the documentation in Python 3 for more
-    information.
-
-    Changelog
-    =========
-
-    1.3
-        -   added support for Python 3
-
-    1.2
-        -   added support for object serialization
-        -   added support for array hooks
-
-    1.1
-        -   added `dict_to_list` and `dict_to_tuple`
-        -   added support for unicode
-        -   allowed chaining of objects like pickle does
-
-
-    :copyright: 2007-2012 by Armin Ronacher.
-    license: BSD
-"""
-import codecs
-try:
-    codecs.lookup_error('surrogateescape')
-    default_errors = 'surrogateescape'
-except LookupError:
-    default_errors = 'strict'
-
-try:
-    from StringIO import StringIO as BytesIO
-except ImportError:
-    from io import BytesIO as BytesIO
-
-try:
-    unicode
-except NameError:
-    # Python 3
-    unicode = str
-    basestring = (bytes, str)
-
-try:
-    long
-except NameError:
-    # Python 3
-    long = int
-
-try:
-    xrange
-except NameError:
-    xrange = range
-
-__author__ = 'Armin Ronacher <armin.ronacher@active-4.com>'
-__version__ = '1.3'
-__all__ = ('phpobject', 'convert_member_dict', 'dict_to_list', 'dict_to_tuple',
-           'load', 'loads', 'dump', 'dumps', 'serialize', 'unserialize')
-
-
-def _translate_member_name(name):
-    if name[:1] == ' ':
-        name = name.split(None, 2)[-1]
-    return name
-
-
-class phpobject(object):
-    """Simple representation for PHP objects.  This is used """
-    __slots__ = ('__name__', '__php_vars__')
-
-    def __init__(self, name, d=None):
-        if d is None:
-            d = {}
-        object.__setattr__(self, '__name__', name)
-        object.__setattr__(self, '__php_vars__', d)
-
-    def _asdict(self):
-        """Returns a new dictionary from the data with Python identifiers."""
-        return convert_member_dict(self.__php_vars__)
-
-    def _lookup_php_var(self, name):
-        for key, value in self.__php_vars__.items():
-            if _translate_member_name(key) == name:
-                return key, value
-
-    def __getattr__(self, name):
-        rv = self._lookup_php_var(name)
-        if rv is not None:
-            return rv[1]
-        raise AttributeError(name)
-
-    def __setattr__(self, name, value):
-        rv = self._lookup_php_var(name)
-        if rv is not None:
-            name = rv[0]
-        self.__php_vars__[name] = value
-
-    def __repr__(self):
-        return '<phpobject %r>' % (self.__name__,)
-
-
-def convert_member_dict(d):
-    """Converts the names of a member dict to Python syntax.  PHP class data
-    member names are not the plain identifiers but might be prefixed by the
-    class name if private or a star if protected.  This function converts them
-    into standard Python identifiers:
-
-    >>> convert_member_dict({"username": "user1", " User password":
-    ...                      "default", " * is_active": True})
-    {'username': 'user1', 'password': 'default', 'is_active': True}
-    """
-    return dict((_translate_member_name(k), v) for k, v in d.items())
-
-
-def dumps(data, charset='utf-8', errors=default_errors, object_hook=None):
-    """Return the PHP-serialized representation of the object as a string,
-    instead of writing it to a file like `dump` does.  On Python 3
-    this returns bytes objects, on Python 3 this returns bytestrings.
-    """
-    def _serialize(obj, keypos):
-        if keypos:
-            if isinstance(obj, (int, long, float, bool)):
-                return ('i:%i;' % obj).encode('latin1')
-            if isinstance(obj, basestring):
-                encoded_obj = obj
-                if isinstance(obj, unicode):
-                    encoded_obj = obj.encode(charset, errors)
-                s = BytesIO()
-                s.write(b's:')
-                s.write(str(len(encoded_obj)).encode('latin1'))
-                s.write(b':"')
-                s.write(encoded_obj)
-                s.write(b'";')
-                return s.getvalue()
-            if obj is None:
-                return b's:0:"";'
-            raise TypeError('can\'t serialize %r as key' % type(obj))
-        else:
-            if obj is None:
-                return b'N;'
-            if isinstance(obj, bool):
-                return ('b:%i;' % obj).encode('latin1')
-            if isinstance(obj, (int, long)):
-                return ('i:%s;' % obj).encode('latin1')
-            if isinstance(obj, float):
-                return ('d:%s;' % obj).encode('latin1')
-            if isinstance(obj, basestring):
-                encoded_obj = obj
-                if isinstance(obj, unicode):
-                    encoded_obj = obj.encode(charset, errors)
-                s = BytesIO()
-                s.write(b's:')
-                s.write(str(len(encoded_obj)).encode('latin1'))
-                s.write(b':"')
-                s.write(encoded_obj)
-                s.write(b'";')
-                return s.getvalue()
-            if isinstance(obj, (list, tuple, dict)):
-                out = []
-                if isinstance(obj, dict):
-                    iterable = obj.items()
-                else:
-                    iterable = enumerate(obj)
-                for key, value in iterable:
-                    out.append(_serialize(key, True))
-                    out.append(_serialize(value, False))
-                return b''.join([
-                    b'a:',
-                    str(len(obj)).encode('latin1'),
-                    b':{',
-                    b''.join(out),
-                    b'}'
-                ])
-            if isinstance(obj, phpobject):
-                return b'O' + _serialize(obj.__name__, True)[1:-1] + \
-                       _serialize(obj.__php_vars__, False)[1:]
-            if object_hook is not None:
-                return _serialize(object_hook(obj), False)
-            raise TypeError('can\'t serialize %r' % type(obj))
-
-    return _serialize(data, False)
-
-
-def load(fp, charset='utf-8', errors=default_errors, decode_strings=False,
-         object_hook=None, array_hook=None):
-    """Read a string from the open file object `fp` and interpret it as a
-    data stream of PHP-serialized objects, reconstructing and returning
-    the original object hierarchy.
-
-    `fp` must provide a `read()` method that takes an integer argument.  Both
-    method should return strings.  Thus `fp` can be a file object opened for
-    reading, a `StringIO` object (`BytesIO` on Python 3), or any other custom
-    object that meets this interface.
-
-    `load` will read exactly one object from the stream.  See the docstring of
-    the module for this chained behavior.
-
-    If an object hook is given object-opcodes are supported in the serilization
-    format.  The function is called with the class name and a dict of the
-    class data members.  The data member names are in PHP format which is
-    usually not what you want.  The `simple_object_hook` function can convert
-    them to Python identifier names.
-
-    If an `array_hook` is given that function is called with a list of pairs
-    for all array items.  This can for example be set to
-    `collections.OrderedDict` for an ordered, hashed dictionary.
-    """
-    if array_hook is None:
-        array_hook = dict
-
-    def _expect(e):
-        v = fp.read(len(e))
-        if v != e:
-            raise ValueError('failed expectation, expected %r got %r' % (e, v))
-
-    def _read_until(delim):
-        buf = []
-        while 1:
-            char = fp.read(1)
-            if char == delim:
-                break
-            elif not char:
-                raise ValueError('unexpected end of stream')
-            buf.append(char)
-        return b''.join(buf)
-
-    def _load_array():
-        items = int(_read_until(b':')) * 2
-        _expect(b'{')
-        result = []
-        last_item = Ellipsis
-        for idx in xrange(items):
-            item = _unserialize()
-            if last_item is Ellipsis:
-                last_item = item
-            else:
-                result.append((last_item, item))
-                last_item = Ellipsis
-        _expect(b'}')
-        return result
-
-    def _unserialize():
-        type_ = fp.read(1).lower()
-        if type_ == b'n':
-            _expect(b';')
-            return None
-        if type_ in b'idb':
-            _expect(b':')
-            data = _read_until(b';')
-            if type_ == b'i':
-                return int(data)
-            if type_ == b'd':
-                return float(data)
-            return int(data) != 0
-        if type_ == b's':
-            _expect(b':')
-            length = int(_read_until(b':'))
-            _expect(b'"')
-            data = fp.read(length)
-            _expect(b'"')
-            if decode_strings:
-                data = data.decode(charset, errors)
-            _expect(b';')
-            return data
-        if type_ == b'a':
-            _expect(b':')
-            return array_hook(_load_array())
-        if type_ == b'o':
-            if object_hook is None:
-                raise ValueError('object in serialization dump but '
-                                 'object_hook not given.')
-            _expect(b':')
-            name_length = int(_read_until(b':'))
-            _expect(b'"')
-            name = fp.read(name_length)
-            _expect(b'":')
-            if decode_strings:
-                name = name.decode(charset, errors)
-            return object_hook(name, dict(_load_array()))
-        raise ValueError('unexpected opcode')
-
-    return _unserialize()
-
-
-def loads(data, charset='utf-8', errors=default_errors, decode_strings=False,
-          object_hook=None, array_hook=None):
-    """Read a PHP-serialized object hierarchy from a string.  Characters in the
-    string past the object's representation are ignored.  On Python 3 the
-    string must be a bytestring.
-    """
-    return load(BytesIO(data), charset, errors, decode_strings,
-                object_hook, array_hook)
-
-
-def dump(data, fp, charset='utf-8', errors=default_errors, object_hook=None):
-    """Write a PHP-serialized representation of obj to the open file object
-    `fp`.  Unicode strings are encoded to `charset` with the error handling
-    of `errors`.
-
-    `fp` must have a `write()` method that accepts a single string argument.
-    It can thus be a file object opened for writing, a `StringIO` object
-    (or a `BytesIO` object on Python 3), or any other custom object that meets
-    this interface.
-
-    The `object_hook` is called for each unknown object and has to either
-    raise an exception if it's unable to convert the object or return a
-    value that is serializable (such as a `phpobject`).
-    """
-    fp.write(dumps(data, charset, errors, object_hook))
-
-
-def dict_to_list(d):
-    """Converts an ordered dict into a list."""
-    # make sure it's a dict, that way dict_to_list can be used as an
-    # array_hook.
-    d = dict(d)
-    try:
-        return [d[x] for x in xrange(len(d))]
-    except KeyError:
-        raise ValueError('dict is not a sequence')
-
-
-def dict_to_tuple(d):
-    """Converts an ordered dict into a tuple."""
-    return tuple(dict_to_list(d))
-
-
-serialize = dumps
-unserialize = loads
+#!/usr/bin/env python
 from unittest import TestCase
 import unittest
+import os.path
+import logging
 
-from trustlink import TrustlinkClient, TrustlinkDownloader
+from trustlink import phpserialize
+from trustlink.client import TrustlinkClient, TrustlinkDownloader
 
-class TrustlinkClient(TestCase):
+ROOT = os.path.dirname(os.path.realpath(__file__))
+DATA_DIR = os.path.join(ROOT, 'test_data')
+VAR_DIR = os.path.join(ROOT, 'test_var')
+
+class TrustlinkClientTest(TestCase):
     def test_init(self):
         user_id = '23423423'
         url = 'http://domain.com/foo?bar=baz'
-        tr = Trustlink(user_id, url)
+        tr = TrustlinkClient(user_id, url, VAR_DIR)
         self.assertEqual('domain.com', tr.host)
         self.assertEqual(url, tr.url)
 
-        tr = Trustlink(user_id, url, static=True)
+        tr = TrustlinkClient(user_id, url, VAR_DIR, static=True)
         self.assertEqual('domain.com', tr.host)
         self.assertEqual('/foo', tr.url)
         
-        tr = Trustlink(user_id, url)
-        tr.isrobot = False
-        tr.test = False
-        tr.verbose = False
-        tr.force_show_code = False
+        tr = TrustlinkClient(user_id, url, VAR_DIR)
+        self.assertEqual(tr.isrobot, False)
+        self.assertEqual(tr.test, False)
+        self.assertEqual(tr.verbose, False)
+        self.assertEqual(tr.force_show_code, False)
 
-        tr = Trustlink(user_id, url, headers={'Trustlink': user_id})
-        tr.isrobot = True
-        tr.test = True
-        tr.verbose = True
-        tr.force_show_code = False
+        tr = TrustlinkClient(user_id, url, VAR_DIR, request_headers={'Trustlink': user_id})
+        self.assertEqual(tr.isrobot, True)
+        self.assertEqual(tr.test, True)
+        self.assertEqual(tr.verbose, True)
+        self.assertEqual(tr.force_show_code, False)
 
-        tr = Trustlink(user_id, url, get={'trustlink_test': user_id})
-        tr.isrobot = False
-        tr.test = False
-        tr.verbose = True
-        tr.force_show_code = True
+        tr = TrustlinkClient(user_id, url, VAR_DIR, request_params={'trustlink_test': user_id})
+        self.assertEqual(tr.isrobot, False)
+        self.assertEqual(tr.test, False)
+        self.assertEqual(tr.verbose, True)
+        self.assertEqual(tr.force_show_code, True)
+
+
+class TrustlinkDownloaderTest(TestCase):
+    def test_download(self):
+        user_id = '23423423'
+        host = 'example.com'
+        example_data = open(os.path.join(DATA_DIR, 'example_response.txt')).read()
+        tr = TrustlinkDownloader(user_id, host, data_dir=VAR_DIR)
+        tr.network_request = lambda x: example_data
+        tr.download()
+
+
+class TrustlinkCompleteTest(TestCase):
+    def test_download(self):
+        user_id = '23423423'
+        host = 'test.com'
+        example_data = open(os.path.join(DATA_DIR, 'example_response.txt')).read()
+        tr = TrustlinkDownloader(user_id, host, data_dir=VAR_DIR)
+        tr.network_request = lambda x: example_data
+        tr.download()
+
+        mapping = phpserialize.loads(example_data)
+        tl_ip = mapping['__trustlink_robots__'][0]
+
+        tr = TrustlinkClient(user_id, 'http://test.com', VAR_DIR,
+                             remote_ip=tl_ip)
+        resp = tr.build_links()
+        print resp 
+        self.assertTrue(len(resp['prepend']) > 0)
+        self.assertTrue('<!--' in resp['prepend'])
+        self.assertFalse('<noindex>' in resp['prepend'])
+
+        tr = TrustlinkClient(user_id, 'http://test.com', VAR_DIR,
+                             remote_ip='9.8.7.6')
+        resp = tr.build_links()
+        print resp 
+        self.assertTrue(len(resp['prepend']) == 0)
+        self.assertFalse('<!--' in resp['prepend'])
+
+        tr = TrustlinkClient(user_id, 'http://test.com', VAR_DIR,
+                             remote_ip='9.8.7.6',
+                             request_headers={'Trustlink': user_id})
+        resp = tr.build_links()
+        print resp 
+        self.assertTrue(len(resp['prepend']) > 0)
+        self.assertTrue('<noindex>' in resp['prepend'])
 
 
 if __name__ == '__main__':
+    logging.basicConfig(level=logging.DEBUG)
     unittest.main()

File test_data/example_response.txt

+a:7:{s:16:"__test_tl_link__";a:3:{s:3:"url";s:15:"http://test.com";s:4:"text";s:93:"Побочный PR-эффект ускоряет направленный маркетинг";s:6:"anchor";s:51:"Стратегия позиционирования";}s:17:"__trustlink_end__";s:15:"<!--9c7ec26b-->";s:19:"__trustlink_start__";s:15:"<!--9c7ec26b-->";s:20:"__trustlink_robots__";a:12:{i:0;s:11:"69.31.80.90";i:1;s:11:"69.31.80.91";i:2;s:11:"69.31.80.92";i:3;s:11:"69.31.80.93";i:4;s:11:"69.31.80.94";i:5;s:14:"63.218.226.114";i:6;s:15:"213.174.145.113";i:7;s:12:"46.229.166.0";i:8;s:12:"46.229.166.1";i:9;s:12:"46.229.166.2";i:10;s:12:"46.229.166.3";i:11;s:12:"46.229.166.4";}s:23:"__trustlink_delimiter__";s:1:";";s:24:"__trustlink_after_text__";N;s:25:"__trustlink_before_text__";N;}

File trustlink.py

-from urllib import unquote_plus
-from urlparse import urlsplit
-import shutil
-import phpserialize
-import logging
-
-class TrustlinkClient(object):
-    def __init__(self, trustlink_user, url, static=False, multi_site=False,
-                 verbose=False, charset='utf-8', force_show_code=False,
-                 headers=None, get=None):
-        url_info = urlsplit(url)
-        host_prepared = url_info.hostname.lower()
-        self.static = static
-
-        if host_prepared.startswith('www.'):
-            self.host = host_prepared[4:]
-        else:
-            self.host = host_prepared
-
-        if self.static:
-            url_prepared = url_info.path
-        else:
-            url_prepared = url
-
-        self.url = unquote_plus(url_prepared)
-        self.multi_site = multi_site
-        self.verbose = verbose
-
-        # ???
-        #if self.tl_links['__trustlink_debug__']:
-            #self.verbose = True
-
-        self.charset = charset
-        self.force_show_code = force_show_code
-
-        # ???
-        #if self.tl_links['__trustlink_debug__']:
-            #self.force_show_code = True
-
-        self.trustlink_user = trustlink_user
-        self.isrobot = False
-        self.test = False
-
-        if headers and headers.get('Trustlink') == self.trustlink_user:
-            self.test = True
-            self.isrobot = True
-            self.verbose = True
-
-        if get and get.get('trustlink_test') == self.trustlink_user:
-            self.force_show_code = True
-            self.verbose = True
-
-    def build_links(self):
-        links = self.load_page_links()
-        prepend = ''
-        append = ''
-
-        start = self.db.get('__trustlink_start__', None)
-        if start:
-            # TODO: check remote_ip & self.db['__trustlink_robots__']
-            prepend += start
-
-        items = []
-        for link in links:
-            item = {'anchor': link['anchor'], 'text': link['text']}
-
-            if link['punicode_url']:
-                item['url'] = link['punicode_url']
-            else:
-                item['url'] = link['url']
-
-            link_info = urlsplit(link)
-            host = link_info.hostname
-            if host.startswith('www.'):
-                host_prepared = host[4:]
-            else:
-                host_preapred = host
-            item['host'] = host_prepared
-            items.append(item)
-
-        start = self.db.get('__trustlink_end', None)
-        if end:
-            # TODO: check remote_ip & self.db['__trustlink_robots__']
-            append += end
-
-        #TODO
-        #if ($this->tl_test && !$this->tl_isrobot)
-            #$result = '<noindex>'.$result.'</noindex>';
-        #return $result;
-        return {'prepend': prepend, 'items': items, 'append': append}
-
-
-class TrustlinkDownloader(object):
-    def __init__(self, trustlink_user, host, data_dir):
-        if host.startswith('www.'):
-            self.host = host[4:]
-        else:
-            self.host = host
-        self.truslink_user = trustlink_user
-
-    def download(self):
-        path = os.path.join(data_dir, 'trustlink.%s.db' % self.host)
-        tmp_path = os.path.join(data_dir, 'trustlink.%s.tmp' % self.host)
-        db = anydbm.open(tmp_path, 'n')
-
-        tl_url = 'http://db.trustlink.ru/%s/%s/UTF-8' % (self.trustlink_user,
-                                                         self.host)
-        logging.debug('Downloading %s' % tl_url)
-        data = urllib.urlopen(tl_url).read()
-        if data.startswith('FATAL ERROR:'):
-            raise Exception('Trustlink fatal error: %s' % data)
-        logging.debug('Downloaded %d bytes' % len(data))
-
-        mapping = phpserize.loads(data)
-        logging.debug('Found %d items in unserialized data' % len(mapping))
-        
-        logging.debug('Writing data dbm file: %s' % tmp_path)
-        for key, value in mapping.items():
-            db[key] = value
-        db.close()
-
-        logging.debug('Copying %s to %s' % (tmp_path, path))
-        shutil.copy(tmp_path, path)

File trustlink/__init__.py

Empty file added.

File trustlink/client.py

+from urllib import unquote_plus
+from urlparse import urlsplit
+import shutil
+import phpserialize
+import logging
+import os.path
+import anydbm
+import json
+
+SPECIAL_KEYS = ('__test_tl_link__', '__trustlink_end__', '__trustlink__start__',
+                '__trustlink_robots__', '__trustlink_delimiter__',
+                '__trustlink__after_text__', '__trustlink_before_text__')
+
+
+class TrustlinkClient(object):
+    def __init__(self, trustlink_user, url, data_dir,
+                 remote_ip=None,
+                 static=False, multi_site=False,
+                 verbose=False, charset='utf-8', force_show_code=False,
+                 request_headers=None, request_params=None):
+        url_info = urlsplit(url)
+        host_prepared = url_info.hostname.lower()
+        self.static = static
+        self.data_dir = data_dir
+        self.remote_ip = remote_ip
+
+        if host_prepared.startswith('www.'):
+            self.host = host_prepared[4:]
+        else:
+            self.host = host_prepared
+
+        if self.static:
+            url_prepared = url_info.path
+        else:
+            url_prepared = url
+
+        self.url = unquote_plus(url_prepared)
+        self.multi_site = multi_site
+        self.verbose = verbose
+
+        # ???
+        #if self.tl_links['__trustlink_debug__']:
+            #self.verbose = True
+
+        self.charset = charset
+        self.force_show_code = force_show_code
+
+        # ???
+        #if self.tl_links['__trustlink_debug__']:
+            #self.force_show_code = True
+
+        self.trustlink_user = trustlink_user
+        self.isrobot = False
+        self.test = False
+
+        if request_headers and request_headers.get('Trustlink') == self.trustlink_user:
+            self.test = True
+            self.isrobot = True
+            self.verbose = True
+
+        if request_params and request_params.get('trustlink_test') == self.trustlink_user:
+            self.force_show_code = True
+            self.verbose = True
+
+    def build_links(self):
+        db = self.init_database_handler()
+        prepend = ''
+        append = ''
+
+        start = db.get('__trustlink_start__', None)
+        if start:
+            if self.remote_ip in db.get('__trustlink_robots__', []):
+                prepend += start
+
+
+        if self.test:
+            links = [
+                json.loads(db.get('__test_tl_link__', [])),
+            ]
+        else:
+            links = [json.loads(x) for x in db.get(self.url, [])]
+
+        items = []
+        for link in links:
+            item = {'anchor': link['anchor'], 'text': link['text']}
+
+            if link.get('punicode_url'):
+                item['url'] = link['punicode_url']
+            else:
+                item['url'] = link['url']
+
+            link_info = urlsplit(link['url'])
+            host = link_info.hostname
+            if host.startswith('www.'):
+                host = host[4:]
+            item['host'] = host
+            items.append(item)
+
+        end = db.get('__trustlink_end__', None)
+        if end:
+            if self.remote_ip in db.get('__trustlink_robots__', []):
+                append += end
+
+        if self.test:
+            prepend = '<noindex>' + prepend
+            append = append + '</noindex>'
+
+        return {'prepend': prepend, 'items': items, 'append': append}
+
+    def init_database_handler(self):
+        path = os.path.join(self.data_dir, 'trustlink.%s.db' % self.host)
+        db = anydbm.open(path)
+        return db
+
+
+class TrustlinkDownloader(object):
+    def __init__(self, trustlink_user, host, data_dir):
+        if host.startswith('www.'):
+            self.host = host[4:]
+        else:
+            self.host = host
+        self.trustlink_user = trustlink_user
+        self.data_dir = data_dir
+
+    def download(self):
+        path = os.path.join(self.data_dir, 'trustlink.%s.db' % self.host)
+        tmp_path = os.path.join(self.data_dir, 'trustlink.%s.tmp' % self.host)
+
+        tl_url = 'http://db.trustlink.ru/%s/%s/UTF-8' % (self.trustlink_user,
+                                                         self.host)
+        logging.debug('Downloading %s' % tl_url)
+        data = self.network_request(tl_url)
+        if data.startswith('FATAL ERROR:'):
+            raise Exception('Trustlink fatal error: %s' % data)
+        logging.debug('Downloaded %d bytes' % len(data))
+
+        mapping = phpserialize.loads(data)
+        logging.debug('Found %d items in unserialized data' % len(mapping))
+        
+        logging.debug('Writing data dbm file: %s' % tmp_path)
+        db = anydbm.open(tmp_path, 'n')
+        for key, value in mapping.items():
+            db[key] = json.dumps(value)
+        db.close()
+
+        logging.debug('Copying %s to %s' % (tmp_path, path))
+        shutil.copy(tmp_path, path)
+
+    def network_request(self, url):
+        return urllib.urlopen(tl_url).read()

File trustlink/phpserialize.py

+# -*- coding: utf-8 -*-
+r"""
+    phpserialize
+    ~~~~~~~~~~~~
+
+    a port of the ``serialize`` and ``unserialize`` functions of
+    php to python.  This module implements the python serialization
+    interface (eg: provides `dumps`, `loads` and similar functions).
+
+    Usage
+    =====
+
+    >>> from phpserialize import *
+    >>> obj = dumps("Hello World")
+    >>> loads(obj)
+    'Hello World'
+
+    Due to the fact that PHP doesn't know the concept of lists, lists
+    are serialized like hash-maps in PHP.  As a matter of fact the
+    reverse value of a serialized list is a dict:
+
+    >>> loads(dumps(range(2)))
+    {0: 0, 1: 1}
+
+    If you want to have a list again, you can use the `dict_to_list`
+    helper function:
+
+    >>> dict_to_list(loads(dumps(range(2))))
+    [0, 1]
+
+    It's also possible to convert into a tuple by using the `dict_to_tuple`
+    function:
+
+    >>> dict_to_tuple(loads(dumps((1, 2, 3))))
+    (1, 2, 3)
+
+    Another problem are unicode strings.  By default unicode strings are
+    encoded to 'utf-8' but not decoded on `unserialize`.  The reason for
+    this is that phpserialize can't guess if you have binary or text data
+    in the strings:
+
+    >>> loads(dumps(u'Hello W\xf6rld'))
+    'Hello W\xc3\xb6rld'
+
+    If you know that you have only text data of a known charset in the result
+    you can decode strings by setting `decode_strings` to True when calling
+    loads:
+
+    >>> loads(dumps(u'Hello W\xf6rld'), decode_strings=True)
+    u'Hello W\xf6rld'
+
+    Dictionary keys are limited to strings and integers.  `None` is converted
+    into an empty string and floats and booleans into integers for PHP
+    compatibility:
+
+    >>> loads(dumps({None: 14, 42.23: 'foo', True: [1, 2, 3]}))
+    {'': 14, 1: {0: 1, 1: 2, 2: 3}, 42: 'foo'}
+
+    It also provides functions to read from file-like objects:
+
+    >>> from StringIO import StringIO
+    >>> stream = StringIO('a:2:{i:0;i:1;i:1;i:2;}')
+    >>> dict_to_list(load(stream))
+    [1, 2]
+
+    And to write to those:
+
+    >>> stream = StringIO()
+    >>> dump([1, 2], stream)
+    >>> stream.getvalue()
+    'a:2:{i:0;i:1;i:1;i:2;}'
+
+    Like `pickle` chaining of objects is supported:
+
+    >>> stream = StringIO()
+    >>> dump([1, 2], stream)
+    >>> dump("foo", stream)
+    >>> stream.seek(0)
+    >>> load(stream)
+    {0: 1, 1: 2}
+    >>> load(stream)
+    'foo'
+
+    This feature however is not supported in PHP.  PHP will only unserialize
+    the first object.
+
+    Array Serialization
+    ===================
+
+    Starting with 1.2 you can provide an array hook to the unserialization
+    functions that are invoked with a list of pairs to return a real array
+    object.  By default `dict` is used as array object which however means
+    that the information about the order is lost for associative arrays.
+
+    For example you can pass the ordered dictionary to the unserilization
+    functions:
+
+    >>> from collections import OrderedDict
+    >>> loads('a:2:{s:3:"foo";i:1;s:3:"bar";i:2;}',
+    ...       array_hook=OrderedDict)
+    collections.OrderedDict([('foo', 1), ('bar', 2)])
+
+    Object Serialization
+    ====================
+
+    PHP supports serialization of objects.  Starting with 1.2 of phpserialize
+    it is possible to both serialize and unserialize objects.  Because class
+    names in PHP and Python usually do not map, there is a separate
+    `object_hook` parameter that is responsible for creating these classes.
+
+    For a simple test example the `phpserialize.phpobject` class can be used:
+
+    >>> data = 'O:7:"WP_User":1:{s:8:"username";s:5:"admin";}'
+    >>> user = loads(data, object_hook=phpobject)
+    >>> user.username
+    'admin'
+    >>> user.__name__
+    'WP_User'
+
+    An object hook is a function that takes the name of the class and a dict
+    with the instance data as arguments.  The instance data keys are in PHP
+    format which usually is not what you want.  To convert it into Python
+    identifiers you can use the `convert_member_dict` function.  For more
+    information about that, have a look at the next section.  Here an
+    example for a simple object hook:
+
+    >>> class User(object):
+    ...     def __init__(self, username):
+    ...         self.username = username
+    ...
+    >>> def object_hook(name, d):
+    ...     cls = {'WP_User': User}[name]
+    ...     return cls(**d)
+    ...
+    >>> user = loads(data, object_hook=object_hook)
+    >>> user.username
+    'admin'
+
+    To serialize objects you can use the `object_hook` of the dump functions
+    and return instances of `phpobject`:
+
+    >>> def object_hook(obj):
+    ...     if isinstance(obj, User):
+    ...         return phpobject('WP_User', {'username': obj.username})
+    ...     raise LookupError('unknown object')
+    ...
+    >>> dumps(user, object_hook=object_hook)
+    'O:7:"WP_User":1:{s:8:"username";s:5:"admin";}'
+
+    PHP's Object System
+    ===================
+
+    The PHP object system is derived from compiled languages such as Java
+    and C#.  Attributes can be protected from external access by setting
+    them to `protected` or `private`.  This does not only serve the purpose
+    to encapsulate internals but also to avoid name clashes.
+
+    In PHP each class in the inheritance chain can have a private variable
+    with the same name, without causing clashes.  (This is similar to the
+    Python `__var` name mangling system).
+
+    This PHP class::
+
+        class WP_UserBase {
+            protected $username;
+
+            public function __construct($username) {
+                $this->username = $username;
+            }
+        }
+
+        class WP_User extends WP_UserBase {
+            private $password;
+            public $flag;
+
+            public function __construct($username, $password) {
+                parent::__construct($username);
+                $this->password = $password;
+                $this->flag = 0;
+            }
+        }
+
+    Is serialized with a member data dict that looks like this:
+
+    >>> data = {
+    ...     ' * username':          'the username',
+    ...     ' WP_User password':    'the password',
+    ...     'flag':                 'the flag'
+    ... }
+
+    Because this access system does not exist in Python, the
+    `convert_member_dict` can convert this dict:
+
+    >>> d = convert_member_dict(data)
+    >>> d['username']
+    'the username'
+    >>> d['password']
+    'the password'
+
+    The `phpobject` class does this conversion on the fly.  What is
+    serialized is the special `__php_vars__` dict of the class:
+
+    >>> user = phpobject('WP_User', data)
+    >>> user.username
+    'the username'
+    >>> user.username = 'admin'
+    >>> user.__php_vars__[' * username']
+    'admin'
+
+    As you can see, reassigning attributes on a php object will try
+    to change a private or protected attribute with the same name.
+    Setting an unknown one will create a new public attribute:
+
+    >>> user.is_admin = True
+    >>> user.__php_vars__['is_admin']
+    True
+
+    To convert the phpobject into a dict, you can use the `_asdict`
+    method:
+
+    >>> d = user._asdict()
+    >>> d['username']
+    'admin'
+
+    Python 3 Notes
+    ==============
+
+    Because the unicode support in Python 3 no longer transparently
+    handles bytes and unicode objects we had to change the way the
+    decoding works.  On Python 3 you most likely want to always
+    decode strings.  Because this would totally fail on binary data
+    phpserialize uses the "surrogateescape" method to not fail on
+    invalid data.  See the documentation in Python 3 for more
+    information.
+
+    Changelog
+    =========
+
+    1.3
+        -   added support for Python 3
+
+    1.2
+        -   added support for object serialization
+        -   added support for array hooks
+
+    1.1
+        -   added `dict_to_list` and `dict_to_tuple`
+        -   added support for unicode
+        -   allowed chaining of objects like pickle does
+
+
+    :copyright: 2007-2012 by Armin Ronacher.
+    license: BSD
+"""
+import codecs
+try:
+    codecs.lookup_error('surrogateescape')
+    default_errors = 'surrogateescape'
+except LookupError:
+    default_errors = 'strict'
+
+try:
+    from StringIO import StringIO as BytesIO
+except ImportError:
+    from io import BytesIO as BytesIO
+
+try:
+    unicode
+except NameError:
+    # Python 3
+    unicode = str
+    basestring = (bytes, str)
+
+try:
+    long
+except NameError:
+    # Python 3
+    long = int
+
+try:
+    xrange
+except NameError:
+    xrange = range
+
+__author__ = 'Armin Ronacher <armin.ronacher@active-4.com>'
+__version__ = '1.3'
+__all__ = ('phpobject', 'convert_member_dict', 'dict_to_list', 'dict_to_tuple',
+           'load', 'loads', 'dump', 'dumps', 'serialize', 'unserialize')
+
+
+def _translate_member_name(name):
+    if name[:1] == ' ':
+        name = name.split(None, 2)[-1]
+    return name
+
+
+class phpobject(object):
+    """Simple representation for PHP objects.  This is used """
+    __slots__ = ('__name__', '__php_vars__')
+
+    def __init__(self, name, d=None):
+        if d is None:
+            d = {}
+        object.__setattr__(self, '__name__', name)
+        object.__setattr__(self, '__php_vars__', d)
+
+    def _asdict(self):
+        """Returns a new dictionary from the data with Python identifiers."""
+        return convert_member_dict(self.__php_vars__)
+
+    def _lookup_php_var(self, name):
+        for key, value in self.__php_vars__.items():
+            if _translate_member_name(key) == name:
+                return key, value
+
+    def __getattr__(self, name):
+        rv = self._lookup_php_var(name)
+        if rv is not None:
+            return rv[1]
+        raise AttributeError(name)
+
+    def __setattr__(self, name, value):
+        rv = self._lookup_php_var(name)
+        if rv is not None:
+            name = rv[0]
+        self.__php_vars__[name] = value
+
+    def __repr__(self):
+        return '<phpobject %r>' % (self.__name__,)
+
+
+def convert_member_dict(d):
+    """Converts the names of a member dict to Python syntax.  PHP class data
+    member names are not the plain identifiers but might be prefixed by the
+    class name if private or a star if protected.  This function converts them
+    into standard Python identifiers:
+
+    >>> convert_member_dict({"username": "user1", " User password":
+    ...                      "default", " * is_active": True})
+    {'username': 'user1', 'password': 'default', 'is_active': True}
+    """
+    return dict((_translate_member_name(k), v) for k, v in d.items())
+
+
+def dumps(data, charset='utf-8', errors=default_errors, object_hook=None):
+    """Return the PHP-serialized representation of the object as a string,
+    instead of writing it to a file like `dump` does.  On Python 3
+    this returns bytes objects, on Python 3 this returns bytestrings.
+    """
+    def _serialize(obj, keypos):
+        if keypos:
+            if isinstance(obj, (int, long, float, bool)):
+                return ('i:%i;' % obj).encode('latin1')
+            if isinstance(obj, basestring):
+                encoded_obj = obj
+                if isinstance(obj, unicode):
+                    encoded_obj = obj.encode(charset, errors)
+                s = BytesIO()
+                s.write(b's:')
+                s.write(str(len(encoded_obj)).encode('latin1'))
+                s.write(b':"')
+                s.write(encoded_obj)
+                s.write(b'";')
+                return s.getvalue()
+            if obj is None:
+                return b's:0:"";'
+            raise TypeError('can\'t serialize %r as key' % type(obj))
+        else:
+            if obj is None:
+                return b'N;'
+            if isinstance(obj, bool):
+                return ('b:%i;' % obj).encode('latin1')
+            if isinstance(obj, (int, long)):
+                return ('i:%s;' % obj).encode('latin1')
+            if isinstance(obj, float):
+                return ('d:%s;' % obj).encode('latin1')
+            if isinstance(obj, basestring):
+                encoded_obj = obj
+                if isinstance(obj, unicode):
+                    encoded_obj = obj.encode(charset, errors)
+                s = BytesIO()
+                s.write(b's:')
+                s.write(str(len(encoded_obj)).encode('latin1'))
+                s.write(b':"')
+                s.write(encoded_obj)
+                s.write(b'";')
+                return s.getvalue()
+            if isinstance(obj, (list, tuple, dict)):
+                out = []
+                if isinstance(obj, dict):
+                    iterable = obj.items()
+                else:
+                    iterable = enumerate(obj)
+                for key, value in iterable:
+                    out.append(_serialize(key, True))
+                    out.append(_serialize(value, False))
+                return b''.join([
+                    b'a:',
+                    str(len(obj)).encode('latin1'),
+                    b':{',
+                    b''.join(out),
+                    b'}'
+                ])
+            if isinstance(obj, phpobject):
+                return b'O' + _serialize(obj.__name__, True)[1:-1] + \
+                       _serialize(obj.__php_vars__, False)[1:]
+            if object_hook is not None:
+                return _serialize(object_hook(obj), False)
+            raise TypeError('can\'t serialize %r' % type(obj))
+
+    return _serialize(data, False)
+
+
+def load(fp, charset='utf-8', errors=default_errors, decode_strings=False,
+         object_hook=None, array_hook=None):
+    """Read a string from the open file object `fp` and interpret it as a
+    data stream of PHP-serialized objects, reconstructing and returning
+    the original object hierarchy.
+
+    `fp` must provide a `read()` method that takes an integer argument.  Both
+    method should return strings.  Thus `fp` can be a file object opened for
+    reading, a `StringIO` object (`BytesIO` on Python 3), or any other custom
+    object that meets this interface.
+
+    `load` will read exactly one object from the stream.  See the docstring of
+    the module for this chained behavior.
+
+    If an object hook is given object-opcodes are supported in the serilization
+    format.  The function is called with the class name and a dict of the
+    class data members.  The data member names are in PHP format which is
+    usually not what you want.  The `simple_object_hook` function can convert
+    them to Python identifier names.
+
+    If an `array_hook` is given that function is called with a list of pairs
+    for all array items.  This can for example be set to
+    `collections.OrderedDict` for an ordered, hashed dictionary.
+    """
+    if array_hook is None:
+        array_hook = dict
+
+    def _expect(e):
+        v = fp.read(len(e))
+        if v != e:
+            raise ValueError('failed expectation, expected %r got %r' % (e, v))
+
+    def _read_until(delim):
+        buf = []
+        while 1:
+            char = fp.read(1)
+            if char == delim:
+                break
+            elif not char:
+                raise ValueError('unexpected end of stream')
+            buf.append(char)
+        return b''.join(buf)
+
+    def _load_array():
+        items = int(_read_until(b':')) * 2
+        _expect(b'{')
+        result = []
+        last_item = Ellipsis
+        for idx in xrange(items):
+            item = _unserialize()
+            if last_item is Ellipsis:
+                last_item = item
+            else:
+                result.append((last_item, item))
+                last_item = Ellipsis
+        _expect(b'}')
+        return result
+
+    def _unserialize():
+        type_ = fp.read(1).lower()
+        if type_ == b'n':
+            _expect(b';')
+            return None
+        if type_ in b'idb':
+            _expect(b':')
+            data = _read_until(b';')
+            if type_ == b'i':
+                return int(data)
+            if type_ == b'd':
+                return float(data)
+            return int(data) != 0
+        if type_ == b's':
+            _expect(b':')
+            length = int(_read_until(b':'))
+            _expect(b'"')
+            data = fp.read(length)
+            _expect(b'"')
+            if decode_strings:
+                data = data.decode(charset, errors)
+            _expect(b';')
+            return data
+        if type_ == b'a':
+            _expect(b':')
+            return array_hook(_load_array())
+        if type_ == b'o':
+            if object_hook is None:
+                raise ValueError('object in serialization dump but '
+                                 'object_hook not given.')
+            _expect(b':')
+            name_length = int(_read_until(b':'))
+            _expect(b'"')
+            name = fp.read(name_length)
+            _expect(b'":')
+            if decode_strings:
+                name = name.decode(charset, errors)
+            return object_hook(name, dict(_load_array()))
+        raise ValueError('unexpected opcode')
+
+    return _unserialize()
+
+
+def loads(data, charset='utf-8', errors=default_errors, decode_strings=False,
+          object_hook=None, array_hook=None):
+    """Read a PHP-serialized object hierarchy from a string.  Characters in the
+    string past the object's representation are ignored.  On Python 3 the
+    string must be a bytestring.
+    """
+    return load(BytesIO(data), charset, errors, decode_strings,
+                object_hook, array_hook)
+
+
+def dump(data, fp, charset='utf-8', errors=default_errors, object_hook=None):
+    """Write a PHP-serialized representation of obj to the open file object
+    `fp`.  Unicode strings are encoded to `charset` with the error handling
+    of `errors`.
+
+    `fp` must have a `write()` method that accepts a single string argument.
+    It can thus be a file object opened for writing, a `StringIO` object
+    (or a `BytesIO` object on Python 3), or any other custom object that meets
+    this interface.
+
+    The `object_hook` is called for each unknown object and has to either
+    raise an exception if it's unable to convert the object or return a
+    value that is serializable (such as a `phpobject`).
+    """
+    fp.write(dumps(data, charset, errors, object_hook))
+
+
+def dict_to_list(d):
+    """Converts an ordered dict into a list."""
+    # make sure it's a dict, that way dict_to_list can be used as an
+    # array_hook.
+    d = dict(d)
+    try:
+        return [d[x] for x in xrange(len(d))]
+    except KeyError:
+        raise ValueError('dict is not a sequence')
+
+
+def dict_to_tuple(d):
+    """Converts an ordered dict into a tuple."""
+    return tuple(dict_to_list(d))
+
+
+serialize = dumps
+unserialize = loads