Luke Francl / word-unmunger


The Word Unmunger is a small Python program that removes much of the
cruft from HTML files produced by Microsoft Word, making them much
easier to hand-edit. It removes:

* XML namespace declarations
* Smart tags
* Meta tags
* HTML comments
* Style sheets
* DIVs
* The file list
* CSS classes
* Office grammar and spelling error markers
* Word X <![...]> markers (Thanks to Stephanie Smith for reporting this)
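
To give a feel for this kind of cleanup, here is a minimal sketch that
strips a few of the items listed above with regular expressions. It is
not the code from word-unmunger.py; the function name and the patterns
are assumptions made for illustration, and they cover only part of the
list.

```python
import re

# Hypothetical patterns for illustration only; they are not taken from
# word-unmunger.py and handle just a few of the items in the list.
PATTERNS = [
    re.compile(r"<!--.*?-->", re.DOTALL),                          # HTML comments
    re.compile(r"<!\[[^\]]*\]>"),                                  # <![...]> markers
    re.compile(r"<style\b.*?</style>", re.DOTALL | re.IGNORECASE),  # style sheets
    re.compile(r"<meta\b[^>]*>", re.IGNORECASE),                   # meta tags
    re.compile(r'\s+xmlns(:\w+)?="[^"]*"', re.IGNORECASE),         # XML namespace declarations
    re.compile(r'\s+class=("[^"]*"|\'[^\']*\'|[^\s>]+)', re.IGNORECASE),  # CSS classes
]


def strip_word_cruft(html):
    """Return html with every match of the patterns above removed."""
    for pattern in PATTERNS:
        html = pattern.sub("", html)
    return html
```

Comments are removed first, so a style sheet that Word wrapped in a
comment is reduced to an empty element before the style sheet pattern
runs. Regular expressions are a blunt tool for HTML, which is one more
reason to keep a sketch like this away from hand-written pages.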

To use:

word-unmunger.py filename.htm [output-filename.html]

The Word Unmunger also has a batch mode. You can process several files
at once and have them dropped into an output directory with their
original filename. It works like this:

word-unmunger.py --output-dir=myDirectory file1.htm file2.htm file3.htm
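
The mapping from input files to output files in batch mode can be
sketched as follows. This assumes that "original filename" means the
base name of each input file; the function name is hypothetical, and
how the real script handles name collisions or a missing output
directory is not covered here.

```python
import os


def batch_output_paths(input_files, output_dir):
    """Map each input path to a file of the same base name in output_dir.

    Hypothetical helper for illustration; not part of word-unmunger.py.
    """
    return {
        path: os.path.join(output_dir, os.path.basename(path))
        for path in input_files
    }
```

Under that assumption, file1.htm would be written to
myDirectory/file1.htm, and the input file itself is left in place.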

Running this software on HTML files you've created yourself is not
recommended, because it will probably remove a great deal of
formatting from them. But it's just the ticket when you want to
clean up Word's output for hand-editing.