Patents appear weekly with an index file of the sort EPO-yyyy-mm-dd.xml

(a) these are downloaded by uk.ac.cam.ch.wwmm.Crawler.EpoCrawler, creating a log.txt file with contents:

    loaded EPO-2009-04-22.xml
    attempting to download 183 patents
    EP 2049490, A1, skipped - unwanted format (PCT)
    EP 2049476, A1, skipped - unwanted format (PCT)
    EP 2050749, A1, downloaded
    EP 2050450, A1, downloaded
    ...

The downloads are ZIP files.

To run the system:
.. either compile and run PatentProcessor or download the jar
   args: -p parsePatent.xml -d <directory with files>

All jobs should produce a weekTotal.html under the week P.

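
As a rough illustration of how the log.txt entries written by the crawler could be checked after a weekly run, the sketch below counts how many patents were downloaded and how many were skipped. It is not part of the crawler: the class name CrawlerLogSummary and the method countOutcomes are hypothetical, and the line format is assumed from the sample log alone (one "EP <number>, <kind>, <outcome>" entry per line).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the crawler: tallies outcomes in log.txt lines. */
final class CrawlerLogSummary {

    // Assumed entry format, e.g. "EP 2049490, A1, skipped - unwanted format (PCT)".
    // Group 1: publication number, group 2: kind code, group 3: outcome text.
    private static final Pattern ENTRY =
            Pattern.compile("^EP\\s+(\\d+),\\s*(\\w+),\\s*(.+)$");

    /** Counts entries per outcome word ("downloaded", "skipped"); other lines are ignored. */
    static Map<String, Integer> countOutcomes(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line.trim());
            if (!m.matches()) {
                continue; // header lines such as "loaded EPO-2009-04-22.xml"
            }
            // Keep only the first word of the outcome, dropping any reason after it.
            String outcome = m.group(3).trim().split("\\s+")[0];
            counts.merge(outcome, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "loaded EPO-2009-04-22.xml",
                "attempting to download 183 patents",
                "EP 2049490, A1, skipped - unwanted format (PCT)",
                "EP 2049476, A1, skipped - unwanted format (PCT)",
                "EP 2050749, A1, downloaded",
                "EP 2050450, A1, downloaded");
        System.out.println(countOutcomes(sample));
    }
}
```

Comparing the summed counts against the "attempting to download N patents" line would be one way to spot an incomplete run, assuming every attempted patent gets exactly one entry in the log.
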
fc703b1 - added parsepatent
914ffe7 - more visitors, and new pub-crawler version
2e439fa - added classifier
1b58715 - removed jninchi as clashed with osra
d62d118 - tidy and added unzip
a2507eb - fixed so tests work, some are ignored
65e649f - tidied and ignored some tests for first upload