
Overview

Patents are published weekly, and each week has an index file named EPO-yyyy-mm-dd.xml.
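Because the week is encoded in the index file name, tools that handle one week at a time can recover the date from the name. The sketch below shows this under the assumption that names follow EPO-yyyy-mm-dd.xml exactly. The class EpoIndexName and its methods are hypothetical and not part of the project.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the project): maps between a week's date
// and an index file name of the form EPO-yyyy-mm-dd.xml.
final class EpoIndexName {
    private static final Pattern NAME =
            Pattern.compile("EPO-(\\d{4}-\\d{2}-\\d{2})\\.xml");

    private EpoIndexName() {}

    /** Returns the week's date, or empty if the name does not match or is not a real date. */
    static Optional<LocalDate> weekOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            // yyyy-mm-dd is ISO-8601, LocalDate's default (strict) parse format.
            return Optional.of(LocalDate.parse(m.group(1)));
        } catch (DateTimeParseException e) {
            return Optional.empty(); // e.g. EPO-2009-02-30.xml
        }
    }

    /** Builds the index file name for a given week; LocalDate prints as yyyy-mm-dd. */
    static String indexNameFor(LocalDate week) {
        return "EPO-" + week + ".xml";
    }
}
```

For example, weekOf("EPO-2009-04-22.xml") gives 2009-04-22, while an impossible date or any other file name gives an empty result rather than an exception.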

(a) The patents are downloaded by uk.ac.cam.ch.wwmm.Crawler.EpoCrawler,
which creates a log.txt file with contents such as:
   loaded EPO-2009-04-22.xml
   attempting to download 183 patents
   EP 2049490, A1, skipped - unwanted format (PCT)
   EP 2049476, A1, skipped - unwanted format (PCT)
   EP 2050749, A1, downloaded
   EP 2050450, A1, downloaded
   ...
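To check how many patents in a week were actually fetched, the per-patent lines of log.txt can be tallied. This is a minimal sketch that assumes every such line has the form "number, kind code, status", as in the sample above, and that other lines (such as "loaded ..." or "attempting to download ...") contain no commas. The class CrawlLog is hypothetical, not EpoCrawler's own code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for log.txt; the "<number>, <kind>, <status>" layout is
// inferred from the sample log only.
final class CrawlLog {
    /** One per-patent line, e.g. "EP 2050749, A1, downloaded". */
    record Entry(String number, String kind, String status) {
        boolean downloaded() {
            return status.equals("downloaded");
        }
    }

    private CrawlLog() {}

    /** Keeps lines with at least two commas; header lines are skipped. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            // Limit 3 so commas inside the status text stay in the status.
            String[] parts = line.split(",\\s*", 3);
            if (parts.length == 3) {
                entries.add(new Entry(parts[0].trim(), parts[1].trim(), parts[2].trim()));
            }
        }
        return entries;
    }

    /** Counts entries whose status is exactly "downloaded". */
    static long countDownloaded(List<Entry> entries) {
        return entries.stream().filter(Entry::downloaded).count();
    }
}
```

Comparing countDownloaded with the "attempting to download N patents" figure shows how many were skipped, for example as PCT filings.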
The downloaded patents are ZIP files.
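Since each download is a ZIP archive, a quick way to see what a patent contains is to list its entries with java.util.zip. The layout of the EPO archives is not described here, so this sketch makes no assumption about entry names. The class PatentZip is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the file entries of one downloaded patent ZIP.
final class PatentZip {
    private PatentZip() {}

    /** Returns the names of all non-directory entries, in archive order. */
    static List<String> entryNames(Path zip) throws IOException {
        // try-with-resources closes the archive after the list is collected.
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            return zf.stream()
                    .filter(e -> !e.isDirectory())
                    .map(ZipEntry::getName)
                    .collect(Collectors.toList());
        }
    }
}
```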


To run the system, either compile and run PatentProcessor or download the jar,
and pass these arguments:

   -p parsePatent.xml -d <directory with files>
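The two options come as flag/value pairs. The sketch below shows one way such arguments could be read and validated; it is hypothetical and is not PatentProcessor's actual argument handling. The class ProcessorArgs and the field names parseFile and directory are illustrative only.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch of reading "-p <file> -d <directory>"; this is NOT
// PatentProcessor's real command-line code.
final class ProcessorArgs {
    final String parseFile; // value of -p, e.g. parsePatent.xml
    final Path directory;   // value of -d, the directory with the downloaded files

    private ProcessorArgs(String parseFile, Path directory) {
        this.parseFile = parseFile;
        this.directory = directory;
    }

    /** Returns empty for unknown flags, a flag without a value, or a missing option. */
    static Optional<ProcessorArgs> parse(String[] args) {
        String p = null;
        String d = null;
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            boolean known = flag.equals("-p") || flag.equals("-d");
            if (!known || i + 1 >= args.length) {
                return Optional.empty();
            }
            String value = args[++i]; // consume the flag's value
            if (flag.equals("-p")) {
                p = value;
            } else {
                d = value;
            }
        }
        if (p == null || d == null) {
            return Optional.empty();
        }
        return Optional.of(new ProcessorArgs(p, Path.of(d)));
    }
}
```

Rejecting incomplete input up front means a run fails immediately with a usage problem instead of partway through a week's files.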
   
Each job should produce a weekTotal.html file in the directory for that week.
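One way to confirm that every job finished is to look for week directories that still lack weekTotal.html. This sketch assumes that each week is a separate subdirectory of a single root directory, which the notes above do not state explicitly. The class WeekTotalCheck is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical check, assuming one subdirectory per week under a common root:
// reports the week directories that do not contain weekTotal.html.
final class WeekTotalCheck {
    private WeekTotalCheck() {}

    static List<Path> weeksMissingTotal(Path root) throws IOException {
        // Files.list must be closed, hence try-with-resources.
        try (Stream<Path> children = Files.list(root)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(week -> !Files.isRegularFile(week.resolve("weekTotal.html")))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```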

P.