pyquery 1.2: queries are broken with XML

René Dudfield
created an issue

Before pyquery 1.2, selecting an element by name in an XML document worked.

This works with pyquery 1.1:

{{{
>>> from pyquery import PyQuery as pq
>>> d = pq("<X>1</X>", parser="xml")
>>> print(d)
<X>1</X>
>>> d('X')
[<X>]
}}}

This fails with pyquery 1.2:

{{{
>>> from pyquery import PyQuery as pq
>>> d = pq("<X>1</X>", parser="xml")
>>> print(d)
<X>1</X>
>>> d('X')
[]
}}}

With 1.2, d('X') returns an empty list: the X node is no longer found.

  1. Simon Sapin

    Hi, cssselect maintainer here.

    Short version: this particular problem should be fixed by setting translator.lower_case_element_names = False on the JQueryTranslator object for XML documents.

    Longer version: pyquery should probably use GenericTranslator instead of HTMLTranslator for non-HTML documents.

    Admittedly the documentation could be improved on this, but it is all explained in source comments:

    Element names in selectors should be case-sensitive for XML but case-insensitive for HTML. To do that, cssselect.HTMLTranslator lower-cases all element names in selectors and expects the HTML parser to do the same in the document. lxml.html does. lxml.etree, however, parses XML and keeps the element name upper-case in the example, so the selector does not match. cssselect makes this assumption because there is no lower-case function in XPath 1.0.

    Compared to GenericTranslator, HTMLTranslator makes element names and attribute names lower-case, but also has an HTML-specific implementation of some pseudo-classes such as :link
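The mismatch described above can be reproduced with the standard library alone. This is only a sketch: element_xpath is a hypothetical stand-in for the element step of a CSS-to-XPath translator, not cssselect's actual code, and xml.etree.ElementTree is used here in place of lxml.etree as a case-preserving XML parser.

```python
import xml.etree.ElementTree as ET


def element_xpath(name, lower_case_element_names):
    """Hypothetical stand-in for a translator's element-name step."""
    if lower_case_element_names:
        # HTML-style behaviour: the selector name is lower-cased up front,
        # on the assumption that the parser lower-cased the document too.
        name = name.lower()
    return ".//" + name


# An XML parser keeps the element name exactly as written in the source.
root = ET.fromstring("<doc><X>1</X></doc>")

html_style = element_xpath("X", lower_case_element_names=True)
xml_style = element_xpath("X", lower_case_element_names=False)

print(html_style, len(root.findall(html_style)))  # .//x 0
print(xml_style, len(root.findall(xml_style)))    # .//X 1
```

The lower-cased path finds nothing because the document still contains an upper-case X, which is the same symptom as the empty result in the report.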

  2. Gael Pasgrimaud

    Yep! Thanks for the help. Even if I've already figured out the problem ;)

    PyQuery now accepts a custom css_translator and uses JQueryTranslator(xhtml=True) for XML documents.

    1.2.1 is available on pypi

  3. Simon Sapin

    Nice. I hadn’t thought of XHTML. To clarify, passing xhtml=True makes HTMLTranslator behave like XML with respect to case-sensitivity, but still keeps HTML semantics. I’ll leave it to you to decide if the latter is what you want, even for stuff that might really not be (X)HTML.

