Commits

Author / Commit Message
Timothy Lu Hu Ball
updated README.txt to reflect virtual env
Timothy Lu Hu Ball
BeautifulSoup should be on the 3.0.x series, not 3.1
Timothy Lu Hu Ball
added a requirements.txt for a virtual environment
or...@proton.perilouscodpiece.org
Added rudimentary README to cover deps and some info about platform compatibility/testing/dev and so forth
or...@proton.perilouscodpiece.org
First (probably stupid and/or broken) cut at scraper activity logging and checking using mysql
or...@proton.perilouscodpiece.org
Looks like by design we only actually need to have an index on url
or...@proton.perilouscodpiece.org
Adding rudimentary mysql schema for the scrapers to log their work into.
or...@proton.perilouscodpiece.org
whitespace cleanups in the stub methods
or...@proton.perilouscodpiece.org
A rudimentary nod to error handling around the urlopen call
or...@proton.perilouscodpiece.org
Adding some initial work to url_get and so forth
or...@proton.perilouscodpiece.org
Primitive implementation of the url validity checking method
or...@proton.perilouscodpiece.org
Fleshed out primitive implementation of ScraperError (url and msg)
or...@proton.perilouscodpiece.org
Fixing another apparent typo (capitalization of base Exception class)
or...@proton.perilouscodpiece.org
Fixing a few apparent typos (at least, my python2.5 complained about them)
or...@proton.perilouscodpiece.org
Adding a few more (potential, assuming we can count on things like antiword) mime types to parse URLs from
j00bar
Implemented dispatch_parser and handle_embedded_links
j00bar
Implemented text and html link reference parsers
j00bar
Skeleton of __call__ and added dispatch_parser
j00bar
Added doctest and scrape_is_stale method
j00bar
Skeleton design for scraper.