Commits

Gael Pasgrimaud  committed 91a4330 Merge

merge

  • Parent commits 58b15ba, 45ac7d9
  • Tags 0.3.1

Files changed (6)

 87f002ce396754a04a55b4dc8494f38100957108 0.2
+9796ea9cb849ce66ca29b394b55a265fe2acb332 0.3

File pyquery/README.txt

 
 It can be used for many purposes, one idea that I might try in the future is to
 use it for templating with pure http templates that you modify using pyquery.
+It can also be used for web scraping or for theming applications with
+`Deliverance`_.
+
+The `project`_ is being actively developed in a Mercurial repository on
+Bitbucket. My policy is to give push access to anyone who wants it
+and then review what they do. So if you want to contribute, just email me.
+
+The Sphinx documentation is available on `pyquery.org`_.
+
+.. _deliverance: http://www.gawel.org/weblog/en/2008/12/skinning-with-pyquery-and-deliverance
+.. _project: http://www.bitbucket.org/olauzanne/pyquery/
+.. _pyquery.org: http://pyquery.org/
 
 .. contents::
 
     'you know Python rocks'
 
 You can use some of the pseudo classes that are available in jQuery but that
-are not standard in css such as :first :last :even :odd :eq :lt :gt::
+are not standard in css such as :first :last :even :odd :eq :lt :gt :checked
+:selected :file::
 
     >>> d('p:first')
     [<p#hello.hello>]
 Traversing
 ----------
 
-Some jQuery traversal methods are supported.  For instance, you can filter the selection list
-using a string selector::
+Some jQuery traversal methods are supported.  Here are a few examples.
+
+You can filter the selection list using a string selector::
 
     >>> d('p').filter('.hello')
     [<p#hello.hello>]
 
-Filtering can also be done using a function::
-
-    >>> d('p').filter(lambda i: i == 1)
-    [<p#test>]
-
-Filtering functions can refer to the current element as 'this', like in jQuery::
-
-    >>> d('p').filter(lambda i: pq(this).text() == 'you know Python rocks')
-    [<p#hello.hello>]
-
-The opposite of filter is `not_` - it returns the items that don't match the selector::
-
-    >>> d('p').not_('.hello')
-    [<p#test>]
-
-You can map a callable onto a PyQuery and get a mutated result. The result can
-contain any items, not just elements::
-
-    >>> d('p').map(lambda i, e: pq(e).text())
-    ['you know Python rocks', 'hello python !']
-
-Like the filter method, map callbacks can reference the current item as this::
-
-    >>> d('p').map(lambda i, e: len(pq(this).text()))
-    [21, 14]
-
-The map callback can also return a list, which will extend the resulting
-PyQuery::
-
-    >>> d('p').map(lambda i, e: pq(this).text().split())
-    ['you', 'know', 'Python', 'rocks', 'hello', 'python', '!']
-
 It is possible to select a single element with eq::
 
     >>> d('p').eq(0)
     [<p#hello.hello>]
 
-The `is_` method lets you query if any current elements match the selector::
-
-    >>> d('p').eq(0).is_('.hello')
-    True
-    >>> d('p').eq(1).is_('.hello')
-    False
-
-hasClass allows for checking for the presence of a class by name::
-
-    >>> d('p').eq(0).hasClass('hello')
-    True
-    >>> d('p').eq(1).hasClass('hello')
-    False
-
 You can find nested elements::
 
     >>> d('p').find('a')
 Making links absolute
 ---------------------
 
-You can make all links on a page absolute which can be usefull for screen
-scrapping::
+You can make links absolute, which can be useful for screen scraping::
 
-    >>> d = pq(url='http://google.com')
-    >>> d('a:last').attr('href')
-    '/intl/fr/privacy.html'
+    >>> d = pq(url='http://www.w3.org/', parser='html')
+    >>> d('a[title="W3C Activities"]').attr('href')
+    '/Consortium/activities'
     >>> d.make_links_absolute()
     [<html>]
-    >>> d('a:last').attr('href')
-    'http://google.com/intl/fr/privacy.html'
+    >>> d('a[title="W3C Activities"]').attr('href')
+    'http://www.w3.org/Consortium/activities'
 
+Using different parsers
+-----------------------
+
+By default pyquery tries the lxml xml parser first and, if that raises a syntax
+error, falls back to the html parser from lxml.html. The xml parser can
+sometimes be problematic when parsing xhtml pages because it may not raise an
+error but still give an unusable tree (on w3c.org for example).
+
+You can also choose which parser to use explicitly::
+
+   >>> pq('<html><body><p>toto</p></body></html>', parser='xml')
+   [<html>]
+   >>> pq('<html><body><p>toto</p></body></html>', parser='html')
+   [<html>]
+   >>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments')
+   [<p>]
+
+The html and html_fragments parsers are the ones from lxml.html.
 
 Testing
 -------
 
     $ STATIC_DEPS=true bin/buildout
 
-Other documentations
---------------------
+More documentation
+------------------
 
-For more documentation about the API use the jquery website http://docs.jquery.com/
+First there is the Sphinx documentation `here`_.
+Then for more documentation about the API you can use the `jquery website`_.
+The reference I'm now using for the API is ... the `color cheat sheet`_.
+Then you can always look at the `code`_.
 
-The reference I'm now using for the API is ... the color cheat sheet
-http://colorcharge.com/wp-content/uploads/2007/12/jquery12_colorcharge.png
+.. _jquery website: http://docs.jquery.com/
+.. _code: http://www.bitbucket.org/olauzanne/pyquery/src/tip/pyquery/pyquery.py
+.. _here: http://pyquery.org
+.. _color cheat sheet: http://colorcharge.com/wp-content/uploads/2007/12/jquery12_colorcharge.png
 
 TODO
 ----
 
-- SELECTORS: it works fine but missing all the :xxx (:first, :last, ...) can be
-  done by patching lxml.cssselect
+- SELECTORS: still missing some jQuery pseudo classes (:radio, :password, ...)
 - ATTRIBUTES: done
 - CSS: done
 - HTML: done
-- MANIPULATING: did all but the "wrap" methods
-- TRAVERSING: did a few
+- MANIPULATING: missing the wrapAll and wrapInner methods
+- TRAVERSING: about half done
 - EVENTS: nothing to do with server side might be used later for automatic ajax
 - CORE UI EFFECTS: did hide and show; the rest doesn't really make sense on
   server side
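The README hunk above documents `make_links_absolute()`. As a rough, stdlib-only illustration of the idea (a sketch, not pyquery's actual implementation), relative `href` values can be resolved against a base URL with `urljoin`. The helper name `absolutize` is hypothetical, and the import uses Python 3's `urllib.parse`, whereas the Python 2 code in this commit imports `urljoin` from `urlparse`.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve each href against base_url; already-absolute hrefs pass through."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical input: one relative link and one absolute link.
links = absolutize(
    'http://www.w3.org/',
    ['/Consortium/activities', 'http://pyquery.org/'],
)
print(links)
```

Links that are already absolute are left untouched, so a helper like this can be applied to the same list more than once without changing the result.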

File pyquery/cssselectpatch.py

         xpath.add_post_condition('position() mod 2 = 0')
         return xpath
 
+    def _xpath_checked(self, xpath):
+        """Matches all input elements that are checked.
+        """
+        xpath.add_condition("@checked and name(.) = 'input'")
+        return xpath
+
+    def _xpath_selected(self, xpath):
+        """Matches all elements that are selected.
+        """
+        xpath.add_condition("@selected and name(.) = 'option'")
+        return xpath
+
+    def _xpath_disabled(self, xpath):
+        """Matches all elements that are disabled.
+        """
+        xpath.add_condition("@disabled")
+        return xpath
+
+    def _xpath_enabled(self, xpath):
+        """Matches all input elements that are enabled.
+        """
+        xpath.add_condition("not(@disabled) and name(.) = 'input'")
+        return xpath
+
+    def _xpath_file(self, xpath):
+        """Matches all input elements of type file.
+        """
+        xpath.add_condition("@type = 'file' and name(.) = 'input'")
+        return xpath
+
 cssselect.Pseudo = JQueryPseudo
 
 class JQueryFunction(Function):
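To make the intent of the new pseudo classes easier to check, here is a minimal sketch that mirrors the five XPath conditions above as plain Python predicates over `xml.etree.ElementTree` elements. It is an assumption-laden stand-in: the names `PSEUDO_PREDICATES`, `select_pseudo` and `FORM` are hypothetical, and the real patch works by adding conditions to an lxml XPath expression rather than by filtering elements in Python.

```python
import xml.etree.ElementTree as ET

# Well-formed form markup, modelled on the html4 fixture added in test.py.
FORM = """
<form action="/">
  <input name="enabled" type="text" value="test"/>
  <input name="disabled" type="text" value="disabled" disabled="disabled"/>
  <input name="file" type="file"/>
  <select name="select">
    <option value="">Choose something</option>
    <option value="two" selected="selected">Two</option>
  </select>
  <input name="radio" type="radio" value="one"/>
  <input name="radio" type="radio" value="two" checked="checked"/>
  <input name="radio" type="radio" value="three"/>
</form>
"""

# Each predicate restates one XPath condition from the patch.
PSEUDO_PREDICATES = {
    'checked': lambda el: el.tag == 'input' and 'checked' in el.attrib,
    'selected': lambda el: el.tag == 'option' and 'selected' in el.attrib,
    'disabled': lambda el: 'disabled' in el.attrib,
    'enabled': lambda el: el.tag == 'input' and 'disabled' not in el.attrib,
    'file': lambda el: el.tag == 'input' and el.get('type') == 'file',
}


def select_pseudo(root, name):
    """Return every element under root (inclusive) matching the pseudo class."""
    predicate = PSEUDO_PREDICATES[name]
    return [el for el in root.iter() if predicate(el)]


root = ET.fromstring(FORM)
counts = {name: len(select_pseudo(root, name)) for name in PSEUDO_PREDICATES}
print(counts)
```

Note the asymmetry carried over from the patch: `disabled` matches any element with a `disabled` attribute, while `enabled` is restricted to `input` elements.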

File pyquery/pyquery.py

 # Distributed under the BSD license, see LICENSE.txt
 from cssselectpatch import selector_to_xpath
 from lxml import etree
+import lxml.html
 from copy import deepcopy
 from urlparse import urljoin
 
-def fromstring(context):
+def fromstring(context, parser=None):
     """use html parser if we don't have clean xml
     """
-    try:
-        return etree.fromstring(context)
-    except etree.XMLSyntaxError:
-        return etree.fromstring(context, etree.HTMLParser())
+    if parser is None:
+        try:
+            return [etree.fromstring(context)]
+        except etree.XMLSyntaxError:
+            return [lxml.html.fromstring(context)]
+    elif parser == 'xml':
+        return [etree.fromstring(context)]
+    elif parser == 'html':
+        return [lxml.html.fromstring(context)]
+    elif parser == 'html_fragments':
+        return lxml.html.fragments_fromstring(context)
+    else:
+        raise ValueError('No such parser: "%s"' % parser)
 
 class NoDefault(object):
     def __repr__(self):
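The dispatch in `fromstring` above follows a common pattern: try a strict parser first, fall back to a lenient one, and always return a list of root elements so callers can treat single documents and fragments uniformly. The sketch below reproduces that pattern with the standard library only. It is not the pyquery code: `parse_strict`, `parse_fragments` and `PARSERS` are hypothetical names, and `xml.etree.ElementTree` with a synthetic wrapper element stands in for `lxml.etree` and `lxml.html`.

```python
import xml.etree.ElementTree as ET


def parse_strict(markup):
    """Parse a single well-formed document; return a one-element list."""
    return [ET.fromstring(markup)]


def parse_fragments(markup):
    """Parse several top-level elements by wrapping them in a synthetic root."""
    wrapper = ET.fromstring('<root>%s</root>' % markup)
    return list(wrapper)


PARSERS = {'xml': parse_strict, 'fragments': parse_fragments}


def fromstring(markup, parser=None):
    """Return a list of root elements, choosing the parser by name."""
    if parser is None:
        # Default: strict first, lenient fallback on a syntax error.
        try:
            return parse_strict(markup)
        except ET.ParseError:
            return parse_fragments(markup)
    if parser not in PARSERS:
        # Unknown names must raise, not silently return None.
        raise ValueError('No such parser: "%s"' % parser)
    return PARSERS[parser](markup)


print([el.tag for el in fromstring('<p>a</p><p>b</p>')])
```

Returning a list from every branch is what lets the callers in this commit drop the `[...]` wrapper around `fromstring(...)` and index with `[0]` where a single element is needed.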
         html = None
         elements = []
         self._base_url = None
+        parser = kwargs.pop('parser', None)
+        if not kwargs and len(args) == 1 and isinstance(args[0], basestring) \
+           and args[0].startswith('http://'):
+            kwargs = {'url': args[0]}
+            args = []
 
         if 'parent' in kwargs:
             self._parent = kwargs.pop('parent')
                 self._base_url = url
             else:
                 raise ValueError('Invalid keyword arguments %s' % kwargs)
-            elements = [fromstring(html)]
+            elements = fromstring(html, parser)
         else:
             # get nodes
 
             # get context
             if isinstance(context, basestring):
                 try:
-                    elements = [fromstring(context)]
+                    elements = fromstring(context, parser)
                 except Exception, e:
                     raise ValueError('%r, %s' % (e, context))
             elif isinstance(context, self.__class__):
     ##############
 
     def filter(self, selector):
-        """Filter elements in self using selector (string or function)."""
+        """Filter elements in self using selector (string or function).
+
+            >>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p>')
+            >>> d('p')
+            [<p.hello>, <p>]
+            >>> d('p').filter('.hello')
+            [<p.hello>]
+            >>> d('p').filter(lambda i: i == 1)
+            [<p>]
+            >>> d('p').filter(lambda i: PyQuery(this).text() == 'Hi')
+            [<p.hello>]
+        """
         if not callable(selector):
             return self.__class__(selector, self, **dict(parent=self))
         else:
             return self.__class__(elements, **dict(parent=self))
 
     def not_(self, selector):
-        """Return elements that don't match the given selector."""
+        """Return elements that don't match the given selector.
+
+            >>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
+            >>> d('p').not_('.hello')
+            [<p>]
+        """
         exclude = set(self.__class__(selector, self))
         return self.__class__([e for e in self if e not in exclude], **dict(parent=self))
 
     def is_(self, selector):
-        """Returns True if selector matches at least one current element, else False."""
+        """Returns True if selector matches at least one current element, else False.
+
+            >>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
+            >>> d('p').eq(0).is_('.hello')
+            True
+            >>> d('p').eq(1).is_('.hello')
+            False
+        """
         return bool(self.__class__(selector, self))
 
     def find(self, selector):
-        """Find elements using selector traversing down from self."""
+        """Find elements using selector traversing down from self.
+
+            >>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
+            >>> d = PyQuery(m)
+            >>> d('p').find('em')
+            [<em>, <em>]
+            >>> d('p').eq(1).find('em')
+            [<em>]
+        """
         xpath = selector_to_xpath(selector)
         results = [child.xpath(xpath) for tag in self for child in tag.getchildren()]
         # Flatten the results
         return self.__class__(elements, **dict(parent=self))
 
     def eq(self, index):
-        """Return PyQuery of only the element with the provided index."""
+        """Return PyQuery of only the element with the provided index.
+
+            >>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
+            >>> d('p').eq(0)
+            [<p.hello>]
+            >>> d('p').eq(1)
+            [<p>]
+        """
         return self.__class__([self[index]], **dict(parent=self))
 
     def each(self, func):
 
         func should take two arguments - 'index' and 'element'.  Elements can
         also be referred to as 'this' inside of func.
+
+            >>> d = PyQuery('<p class="hello">Hi there</p><p>Bye</p><br />')
+            >>> d('p').map(lambda i, e: PyQuery(e).text())
+            ['Hi there', 'Bye']
+
+            >>> d('p').map(lambda i, e: len(PyQuery(this).text()))
+            [8, 3]
+
+            >>> d('p').map(lambda i, e: PyQuery(this).text().split())
+            ['Hi', 'there', 'Bye']
         """
         items = []
         try:
         return len(self)
 
     def end(self):
+        """Break out of a level of traversal and return to the parent level.
+
+            >>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
+            >>> d = PyQuery(m)
+            >>> d('p').eq(1).find('em').end().end()
+            [<p>, <p>]
+        """
         return self._parent
 
     ##############
 
         """
         assert isinstance(value, basestring)
-        value = fromstring(value)
+        value = fromstring(value)[0]
         nodes = []
         for tag in self:
             wrapper = deepcopy(value)
             return self
 
         assert isinstance(value, basestring)
-        value = fromstring(value)
+        value = fromstring(value)[0]
         wrapper = deepcopy(value)
         if not wrapper.getchildren():
             child = wrapper
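The traversal docstrings above rely on two behaviours: every traversal method returns a new selection that remembers its parent (so `end()` can step back), and `map` flattens list results into the output. A minimal sketch of both, using a plain `list` subclass, might look as follows. The class name `Selection` is hypothetical, and unlike pyquery's `filter`, whose callback receives only the index and reads the element from `this`, the callbacks here receive `(index, item)` explicitly.

```python
class Selection(list):
    """A list of items that remembers the selection it was derived from."""

    def __init__(self, items=(), parent=None):
        super().__init__(items)
        self._parent = parent

    def filter(self, predicate):
        """Keep the items for which predicate(index, item) is true."""
        kept = [item for i, item in enumerate(self) if predicate(i, item)]
        return Selection(kept, parent=self)

    def eq(self, index):
        """Return a selection holding only the item at index."""
        return Selection([self[index]], parent=self)

    def map(self, func):
        """Apply func(index, item); list results extend the output."""
        results = []
        for i, item in enumerate(self):
            value = func(i, item)
            if isinstance(value, list):
                results.extend(value)
            else:
                results.append(value)
        return Selection(results, parent=self)

    def end(self):
        """Step back to the selection this one was derived from."""
        return self._parent


words = Selection(['Hi there', 'Bye'])
print(words.map(lambda i, s: s.split()))
print(words.eq(1).filter(lambda i, s: True).end().end() is words)
```

Because each method passes `parent=self`, a chain of N traversals can be unwound with N calls to `end()`, which matches the `find('em').end().end()` doctest in the diff.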

File pyquery/test.py

            </html>
            """
 
+    html4 = """
+           <html>
+            <body>
+              <form action="/">
+                <input name="enabled" type="text" value="test"/>
+                <input name="disabled" type="text" value="disabled" disabled="disabled"/>
+                <input name="file" type="file" />
+                <select name="select">
+                  <option value="">Choose something</option>
+                  <option value="one">One</option>
+                  <option value="two" selected="selected">Two</option>
+                  <option value="three">Three</option>
+                </select>
+                <input name="radio" type="radio" value="one"/>
+                <input name="radio" type="radio" value="two" checked="checked"/>
+                <input name="radio" type="radio" value="three"/>
+              </form>
+            </body>
+           </html>
+           """
+
     def test_selector_from_doc(self):
         doc = etree.fromstring(self.html)
         assert len(self.klass(doc)) == 1
         self.assertEqual(e('div:lt(1)').text(), 'node1')
         self.assertEqual(e('div:eq(2)').text(), 'node3')
 
+        #test on the form
+        e = self.klass(self.html4)
+        assert len(e(':disabled')) == 1
+        assert len(e('input:enabled')) == 5
+        assert len(e(':selected')) == 1
+        assert len(e(':checked')) == 1
+        assert len(e(':file')) == 1
+
 class TestTraversal(unittest.TestCase):
     klass = pq
     html = """
 
 long_description = open(os.path.join('pyquery', 'README.txt')).read()
 
-version = '0.2'
+version = '0.3'
 
 setup(name='pyquery',
       version=version,