dukebody committed 53a382e

Add the remove_blank_text option to the HTML parser to try to avoid empty (whitespace-only) text nodes.
Hopefully it makes a difference. :)

Files changed (1)


 except IndexError:
     raise IndexError("Usage: %s <path-to-html-file>" % __file__)
-tree = etree.parse(filename, html.HTMLParser())
+tree = etree.parse(filename, html.HTMLParser(remove_blank_text=True))
 root = tree.getroot()
 body(root)[0].set('id', doc_id)
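To show roughly what the new flag changes in the tree, here is a stdlib-only approximation. It does not use lxml: `strip_blank_text` is a hypothetical helper that post-processes an `xml.etree.ElementTree` tree and drops every whitespace-only `.text` and `.tail`. lxml's `remove_blank_text=True` delegates to libxml2, which applies its own heuristics (for example around mixed content), so treat this as a sketch of the idea rather than the parser's actual behaviour.

```python
import xml.etree.ElementTree as ET


def strip_blank_text(elem: ET.Element) -> ET.Element:
    """Recursively drop whitespace-only .text and .tail values in place.

    Hypothetical helper: a simplified approximation of lxml's
    HTMLParser(remove_blank_text=True). libxml2's own heuristics are
    not modelled here.
    """
    # Whitespace between the opening tag and the first child.
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_blank_text(child)
        # Whitespace between this child and the next sibling (or closing tag).
        if child.tail is not None and not child.tail.strip():
            child.tail = None
    return elem


doc = ET.fromstring("<body>\n  <p>Hello</p>\n  <p>World</p>\n</body>")
print(ET.tostring(strip_blank_text(doc), encoding="unicode"))
# -> <body><p>Hello</p><p>World</p></body>
```

Text that contains any non-whitespace characters, such as `"a "` in `<p>a <b>b</b> c</p>`, is left untouched, so only the indentation-only nodes disappear.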