PyQuery fails when trying to query a document with XML namespaces

Issue #17 wontfix
clouserw
created an issue

lxml returns namespace-qualified tag names when it comes across namespaces in XML documents, but PyQuery fails when trying to use them, breaking on "{". I'm using lxml's etree in the meantime, but it would be nice to use PyQuery.

An example follows:

{{{
#!python

In [2]: from pyquery import PyQuery as pq

In [3]: text = """
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>Mozilla Add-ons</ShortName>
</OpenSearchDescription>"""

In [4]: d = pq(text)

In [5]: d("ShortName")
Out[5]: []

In [6]: d.children()
Out[6]: [<{http://a9.com/-/spec/opensearch/1.1/}ShortName>]

In [7]: d("{http://a9.com/-/spec/opensearch/1.1/}ShortName")

AssertionError                            Traceback (most recent call last)
... /site-packages/lxml/cssselect.pyc in tokenize_symbol(s, pos)
    934     if match.start() == pos:
    935         assert 0, (
--> 936             "Unexpected symbol: %r at %s" % (s[pos], pos))
    937     if not match:
    938         result = s[start:]

AssertionError: Unexpected symbol: '{' at 0

}}}

Comments (4)

  1. Gael Pasgrimaud

    As you can see, this problem comes from lxml.

    You have three solutions:

    - submit the problem to the lxml team

    - monkey patch lxml.cssselect._illegal_symbol

    - apply xml.replace(' xmlns:', ' xmlnamespace:') to your xml before using pyquery so lxml will ignore namespaces (that's my solution)
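
The third workaround can be sketched roughly as follows. This is only an illustration under stated assumptions, not PyQuery's or lxml's code: it swaps in the standard library's `xml.etree.ElementTree` for lxml so that it runs without extra dependencies, it renames the default `xmlns="..."` declaration used in the OpenSearch example above (rather than the prefixed `xmlns:` form), and `hide_default_namespace` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML = (
    '<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">'
    "<ShortName>Mozilla Add-ons</ShortName>"
    "</OpenSearchDescription>"
)


def hide_default_namespace(xml):
    # Hypothetical helper: rename the declaration so the parser treats it
    # as an ordinary attribute instead of a namespace binding.
    return xml.replace(' xmlns="', ' xmlnamespace="')


namespaced = ET.fromstring(XML)
plain = ET.fromstring(hide_default_namespace(XML))

print(namespaced[0].tag)             # {http://a9.com/-/spec/opensearch/1.1/}ShortName
print(plain[0].tag)                  # ShortName
print(plain.find("ShortName").text)  # Mozilla Add-ons
```

The replacement is a blunt text substitution: it would also rewrite the same character sequence inside element text, and the namespace information is no longer available to later queries.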

  2. Anonymous

    lxml's xpath() method accepts a namespaces keyword argument, so it is possible to run queries such as:

    node.xpath("//osd:ShortName", namespaces=dict(osd="http://a9.com/-/spec/opensearch/1.1/"))
    

    The problem is how to pass this mapping through when PyQuery calls

    results = [tag.xpath(xpath) for tag in elements]
    

    My idea is to use a dictionary of namespaces (module-wide or per instance) and use the syntax {ns}tag in the CSS selector query. The selector_to_xpath function would have to ignore the {xxx} part during its translation and add xxx:tag when reconstructing the XPath query string.

    This would be rather better than what I do now:

    d(d.root.xpath(xquery,namespaces=ns))
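
The `{ns}tag` idea above could look something like the sketch below. It is an assumption-laden illustration, not a patch to PyQuery: `braced_to_prefixed` is a hypothetical helper, it only handles a bare tag name rather than a full CSS selector, and it uses the standard library's `xml.etree.ElementTree` (whose `findall` accepts a prefix-to-URI mapping) in place of lxml so that it runs without extra dependencies.

```python
import re
import xml.etree.ElementTree as ET

NAMESPACES = {"osd": "http://a9.com/-/spec/opensearch/1.1/"}

XML = (
    '<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">'
    "<ShortName>Mozilla Add-ons</ShortName>"
    "</OpenSearchDescription>"
)

# Matches "{prefix}tag", where prefix is a key of the namespace dictionary.
_BRACED = re.compile(r"\{([A-Za-z_][\w.-]*)\}([A-Za-z_][\w.-]*)")


def braced_to_prefixed(selector, namespaces):
    """Hypothetical helper: rewrite '{osd}ShortName' to 'osd:ShortName'."""
    def swap(match):
        prefix, tag = match.groups()
        if prefix not in namespaces:
            raise KeyError("unknown namespace prefix: %r" % prefix)
        return "%s:%s" % (prefix, tag)
    return _BRACED.sub(swap, selector)


root = ET.fromstring(XML)
path = ".//" + braced_to_prefixed("{osd}ShortName", NAMESPACES)
matches = root.findall(path, NAMESPACES)

print(path)                          # .//osd:ShortName
print([el.text for el in matches])   # ['Mozilla Add-ons']
```

With lxml, the same dictionary would be passed as the `namespaces` argument of `xpath()`, as in the query shown earlier in this comment.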
    