Author Commit Message Labels Comments Date
Timothy Lu Hu Ball
updated README.txt to reflect virtual env
Timothy Lu Hu Ball
BeautifulSoup should be on the 3.0.x series, not 3.1
Timothy Lu Hu Ball
added a requirements.txt for a virtual environment
Added rudimentary README to cover deps and some info about platform compatibility/testing/dev and so forth
First (probably stupid and/or broken) cut at scraper activity logging and checking using MySQL
Looks like by design we only actually need to have an index on url
Adding rudimentary MySQL schema for the scrapers to log their work into.
whitespace cleanups in the stub methods
A rudimentary nod to error handling around the urlopen call
Adding some initial work to url_get and so forth
Primitive implementation of the url validity checking method
Fleshed out primitive implementation of ScraperError (url and msg)
Fixing another apparent typo (capitalization of base Exception class)
Fixing a few apparent typos (at least, my Python 2.5 complained about them)
Adding a few more potential MIME types to parse URLs from (assuming we can count on things like antiword)
Implemented dispatch_parser and handle_embedded_links
Implemented text and html link reference parsers
Skeleton of __call__ and added dispatch_parser
Added doctest and scrape_is_stale method
Skeleton design for scraper.