Overview

Patents are published weekly, together with an index file named EPO-yyyy-mm-dd.xml (for example, EPO-2009-04-22.xml).
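
As a small illustration of the naming convention, the sketch below extracts the date from an index file name. The class IndexFileName and the method weekOf are hypothetical helpers written for this note; they are not part of the crawler, and the pattern is assumed from the EPO-yyyy-mm-dd.xml form only.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the crawler code base.
class IndexFileName {
    // Assumed pattern: "EPO-" followed by an ISO date and ".xml".
    private static final Pattern NAME =
            Pattern.compile("EPO-(\\d{4}-\\d{2}-\\d{2})\\.xml");

    /** Returns the date encoded in an index file name, or throws if the name does not match. */
    static LocalDate weekOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an EPO index file name: " + fileName);
        }
        // LocalDate.parse accepts the ISO yyyy-MM-dd form by default.
        return LocalDate.parse(m.group(1));
    }
}
```

Calling IndexFileName.weekOf("EPO-2009-04-22.xml") yields the date 2009-04-22; names that do not follow the convention are rejected with an exception.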

(a) These are downloaded by uk.ac.cam.ch.wwmm.Crawler.EpoCrawler,
which creates a log.txt file with contents such as:
loaded EPO-2009-04-22.xml
attempting to download 183 patents
EP 2049490, A1, skipped - unwanted format (PCT)
EP 2049476, A1, skipped - unwanted format (PCT)
EP 2050749, A1, downloaded
EP 2050450, A1, downloaded
...
The downloads are ZIP files.
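
To check a crawl quickly, the log can be summarised by counting the per-patent lines. The following is a minimal sketch, assuming every per-patent line has the form "EP <number>, <kind>, <status>" as in the sample above; LogSummary is a hypothetical helper and not part of EpoCrawler.

```java
import java.util.List;

// Hypothetical helper for reading log.txt; the line format is assumed from the sample log.
class LogSummary {
    final int downloaded;
    final int skipped;

    LogSummary(int downloaded, int skipped) {
        this.downloaded = downloaded;
        this.skipped = skipped;
    }

    /** Counts downloaded and skipped patents; lines that are not per-patent entries are ignored. */
    static LogSummary summarize(List<String> lines) {
        int downloaded = 0;
        int skipped = 0;
        for (String line : lines) {
            // Split into at most three fields: "EP <number>", kind code, status.
            String[] parts = line.split(",\\s*", 3);
            if (parts.length < 3 || !parts[0].startsWith("EP ")) {
                continue; // e.g. "loaded ..." or "attempting to download ..." lines
            }
            String status = parts[2].trim();
            if (status.equals("downloaded")) {
                downloaded++;
            } else if (status.startsWith("skipped")) {
                skipped++;
            }
        }
        return new LogSummary(downloaded, skipped);
    }
}
```

The lines can be read with java.nio.file.Files.readAllLines and passed to LogSummary.summarize; comparing the totals with the "attempting to download" count gives a rough check that the crawl completed.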


To run the system, either compile and run PatentProcessor or download the jar.

 args:
   -p parsePatent.xml -d <directory with files>
   
Every job should produce a weekTotal.html file under the directory for that week.
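
Whether a job finished can be checked by looking for that file. The sketch below assumes that "under the week" means a per-week directory with weekTotal.html directly inside it; WeekCheck and hasWeekTotal are hypothetical names, not part of PatentProcessor.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical check; assumes weekTotal.html sits directly inside the week's directory.
class WeekCheck {
    /** Returns true if the given week directory contains a regular file named weekTotal.html. */
    static boolean hasWeekTotal(Path weekDir) {
        return Files.isRegularFile(weekDir.resolve("weekTotal.html"));
    }
}
```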

P.