Commits

Gregory Petukhov committed 2b55478

Improve clean_html method

  • Parent commits ba02fa7

Files changed (1)

File grab/tools/lxml_tools.py

 
     # Keep only allowed attributes
     tree = parse_html(html)
-    for elem in tree.xpath('.//*'):
+    for elem in tree.xpath('./descendant-or-self::*'):
         for key in elem.attrib.keys():
             if key not in safe_attrs:
                 del elem.attrib[key]
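
The change above matters because the XPath `.//*` selects only the descendants of the context element, while `./descendant-or-self::*` also selects the context element itself, so attributes on the root of the parsed tree are now filtered too. The sketch below illustrates that difference using only the standard library rather than lxml: `findall(".//*")` stands in for the old expression and `iter()` (which yields the element itself first) stands in for the new one. The `SAFE_ATTRS` whitelist, the `MARKUP` string, and the `strip_attrs` helper are hypothetical names chosen for illustration and are not part of grab.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist and markup, chosen only for illustration.
SAFE_ATTRS = {"href"}
MARKUP = '<div onclick="x()"><a href="/p" style="color:red">link</a></div>'


def strip_attrs(elements):
    """Delete every attribute that is not in SAFE_ATTRS."""
    for elem in elements:
        # Copy the keys first so the mapping is not mutated while iterating.
        for key in list(elem.attrib):
            if key not in SAFE_ATTRS:
                del elem.attrib[key]


# Descendants only: the root <div> is never visited, so onclick survives.
old_root = ET.fromstring(MARKUP)
strip_attrs(old_root.findall(".//*"))
old_result = ET.tostring(old_root, encoding="unicode")

# Root plus descendants: the root <div> is cleaned as well.
new_root = ET.fromstring(MARKUP)
strip_attrs(new_root.iter())
new_result = ET.tostring(new_root, encoding="unicode")

print(old_result)
print(new_result)
```

With the descendants-only selection the `onclick` attribute on the outer `div` is left in place; with the self-inclusive selection it is removed along with the `style` attribute on the link.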