This is the source code of a project I did for the Text Mining course by Jan Scholtes at DKE of Maastricht University. It is about mining the GI-Files, a set of emails published on WikiLeaks.
The subset of the data that was used can be downloaded here (and elsewhere).
This is a work in progress, without progress, as the course is over and I do not have the time to continue working on the project. (Also, you could work forever on that dataset and find more and more whos, whats, wheres and hows...)
If you are actually interested in using parts of the code and doing your own experiments with the data, contact me and I may be able to give you some hints.