Support for .make_links_absolute()

Issue #2 resolved
Anonymous created an issue

It would be great to have built-in support for calling .make_links_absolute() - something I find I need in order to make the most of pyquery for screen scraping.

Comments (9)

  1. Olivier Lauzanne repo owner

    Sounded like a good idea, so I did it. It is available on the trunk and will be in the next release, which I will make this week. I think there are still things that could be done to make it better for screen scraping:

    - make it possible to use the BeautifulSoup parser (I think it's compatible with lxml, so it wouldn't be a problem)
    - make it possible to use auth and headers

  2. Anonymous

    Found this when wondering why .make_links_absolute() doesn't make img src="" tags absolute. Please add absolute paths for img src="" as well!

    Thanks, Seth

  3. Anonymous

    Sorry, I lost track of this issue. The problem with using lxml's make_links_absolute is that it will not work with the xml parser, so I see no easy way of fixing it.

  4. Anonymous

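    I have included this in my code; it runs when the document is loaded:

    import re
    from urllib import basejoin  # Python 2; on Python 3 use urllib.parse.urljoin

    # S is the PyQuery document and self.url is the URL it was loaded from.
    for attr in ("src", "href"):
        def abs_url(i, el):
            # Leave values that already start with http:// or https:// alone.
            if not re.match(r"^\s*https?://", S(el).attr[attr]):
                S(el).attr[attr] = basejoin(self.url, S(el).attr[attr])
        S("[%s]" % attr).each(abs_url)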
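
For anyone on Python 3, the URL-joining step of the workaround above can be sketched with only the standard library. This is a minimal sketch, not pyquery's own implementation: the helper name `absolutize` and the example base URL are hypothetical, and it only covers resolving a single `src` or `href` value, not walking the document.

```python
import re
from urllib.parse import urljoin

# Matches values that already start with http:// or https://,
# mirroring the check used in the workaround above.
ABSOLUTE_URL = re.compile(r"^\s*https?://")


def absolutize(base_url, value):
    """Return value unchanged if it is already absolute, else resolve it against base_url."""
    if ABSOLUTE_URL.match(value):
        return value
    return urljoin(base_url, value)


if __name__ == "__main__":
    base = "http://example.com/blog/post.html"  # hypothetical page URL
    for value in ("img/logo.png", "/about", "https://other.example/x"):
        print(absolutize(base, value))
```

Note that `urljoin` already returns an absolute URL unchanged when given one, so the explicit regex check is kept here only to match the original snippet.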