PyQuery fails when trying to query a document with XML namespaces

Issue #17 wontfix

lxml returns namespace-qualified tag names when it comes across namespaces in XML documents, but PyQuery fails when trying to use them in a selector, breaking on "{". I'm using lxml's etree in the meantime, but it would be nice to use PyQuery.

An example follows:

{{{
In [2]: from pyquery import PyQuery as pq

In [3]: text = """ <?xml version="1.0" encoding="UTF-8"?> <OpenSearchDescription xmlns=""> <ShortName>Mozilla Add-ons</ShortName> </OpenSearchDescription>"""

In [4]: d = pq(text)

In [5]: d("ShortName")
Out[5]: []

In [6]: d.children()
Out[6]: [<{}ShortName>]

In [7]: d("{}ShortName")

AssertionError                            Traceback (most recent call last)
...
/site-packages/lxml/cssselect.pyc in tokenize_symbol(s, pos)
    934         if match.start() == pos:
    935             assert 0, (
--> 936                 "Unexpected symbol: %r at %s" % (s[pos], pos))
    937         if not match:
    938             result = s[start:]

AssertionError: Unexpected symbol: '{' at 0
}}}
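
For anyone trying to reproduce the etree side of this without PyQuery, here is a minimal sketch of why the bare name finds nothing. It uses the standard library's xml.etree.ElementTree so it runs without lxml (lxml.etree stores tags in the same "{uri}localname" form). The namespace URI is made up for illustration, since the one in the report is elided.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the one in the report above is elided.
NS = "http://example.com/opensearch"

text = (
    f'<OpenSearchDescription xmlns="{NS}">'
    "<ShortName>Mozilla Add-ons</ShortName>"
    "</OpenSearchDescription>"
)
root = ET.fromstring(text)

# The parser stores every tag in Clark notation: "{uri}localname".
root_tag = root.tag

# A bare name does not match, because the real tag carries the namespace.
bare = root.findall("ShortName")

# Spelling out the qualified name does match.
qualified = root.findall("{%s}ShortName" % NS)

print(root_tag)
print(bare)
print([el.text for el in qualified])
```

The "{" that the CSS tokenizer rejects is exactly the opening brace of this qualified form, which is why the etree API accepts it and a CSS selector does not.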


Comments (4)

  1. Gael Pasgrimaud

    As you can see, this problem comes from lxml.

    You have three solutions:

    - submit the problem to the lxml team

    - monkey patch lxml.cssselect._illegal_symbol

    - apply xml.replace(' xmlns:', ' xmlnamespace:') to your XML before using pyquery, so that lxml ignores the namespaces (that's my solution)
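
A rough sketch of the third option, adapted to the default-namespace form (xmlns="...") used in the report rather than the prefixed form (xmlns:foo="...") that the replace call above targets. The helper name neutralize_default_namespace and the namespace URI are made up for this example, and the standard library parser stands in for lxml so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, for illustration only.
NS = "http://example.com/opensearch"


def neutralize_default_namespace(xml_text):
    """Rename the default namespace declaration to an ordinary attribute.

    This is a plain string replace, so it would also rewrite any literal
    ' xmlns=' that happens to appear inside text content.
    """
    return xml_text.replace(" xmlns=", " xmlnamespace=")


text = (
    f'<OpenSearchDescription xmlns="{NS}">'
    "<ShortName>Mozilla Add-ons</ShortName>"
    "</OpenSearchDescription>"
)

root = ET.fromstring(neutralize_default_namespace(text))

# With the declaration neutralized, tags are no longer namespace-qualified.
print(root.tag)
print([el.text for el in root.findall("ShortName")])
```

The trade-off is that the namespace information is thrown away, so elements with the same local name from different namespaces can no longer be told apart.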

  2. Anonymous

    The lxml.etree.ElementTree.xpath function supports a namespaces keyword argument, so it is possible to run queries that use prefixed tag names.
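
As an illustration of the prefix-to-URI mapping idea, here is a small sketch. It uses findall from the standard library's xml.etree.ElementTree, which accepts the same kind of namespaces mapping, so it runs without lxml; with lxml the equivalent call would be root.xpath("os:ShortName", namespaces=NSMAP). The "os" prefix and the URI are arbitrary choices for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI and prefix; neither comes from the original report.
NS = "http://example.com/opensearch"
NSMAP = {"os": NS}

root = ET.fromstring(
    f'<OpenSearchDescription xmlns="{NS}">'
    "<ShortName>Mozilla Add-ons</ShortName>"
    "</OpenSearchDescription>"
)

# The prefix in the query is resolved through NSMAP, not through the
# prefixes (if any) used in the document itself.
matches = root.findall("os:ShortName", namespaces=NSMAP)
names = [el.text for el in matches]
print(names)
```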


    The problem here is how to pass that argument along when pyquery calls

    results = [tag.xpath(xpath) for tag in elements]

    My idea is to use a dictionary of namespaces (module-wide or per instance) and to use the syntax {ns}tag in the CSS selector. The selector_to_xpath function would have to ignore the {xxx} part during its translation and emit xxx:tag when reconstructing the XPath query string.
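
A toy sketch of that idea, to make the proposal concrete. It is not how pyquery or lxml.cssselect work: braced_selector_to_path, query, and the NAMESPACES dictionary are hypothetical names, only whitespace-separated tag names (descendant selectors) are handled, and the standard library's findall stands in for lxml's xpath so the snippet runs on its own.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical module-wide namespace dictionary (prefix -> URI).
NAMESPACES = {"os": "http://example.com/opensearch"}

# Matches one "{prefix}tag" token.
_BRACED = re.compile(r"\{(\w+)\}([\w.-]+)")


def braced_selector_to_path(selector):
    """Translate '{ns}a {ns}b' into the path './/ns:a//ns:b'.

    Only whitespace-separated tag names are supported in this sketch.
    """
    steps = []
    for token in selector.split():
        match = _BRACED.fullmatch(token)
        if match:
            prefix, tag = match.groups()
            if prefix not in NAMESPACES:
                raise KeyError("unknown namespace prefix: %s" % prefix)
            steps.append("%s:%s" % (prefix, tag))
        else:
            # Tokens without a braced prefix are passed through unchanged.
            steps.append(token)
    return ".//" + "//".join(steps)


def query(root, selector):
    """Run a braced selector against root using the namespace dictionary."""
    return root.findall(braced_selector_to_path(selector), namespaces=NAMESPACES)


root = ET.fromstring(
    '<OpenSearchDescription xmlns="http://example.com/opensearch">'
    "<ShortName>Mozilla Add-ons</ShortName>"
    "</OpenSearchDescription>"
)

print(braced_selector_to_path("{os}ShortName"))
print([el.text for el in query(root, "{os}ShortName")])
```

A real implementation would have to hook into the CSS tokenizer instead of splitting on whitespace, since selectors can also contain classes, attributes, and combinators.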

    This would be much better than what I do now:
