Commits

Author / Commit Message
Timothy Lu Hu Ball
updated README.txt to reflect virtual env
Timothy Lu Hu Ball
beautifulsoup should be on 3.0.x series not 3.1
Timothy Lu Hu Ball
added a requirements.txt for a virtual environment
or...@proton.perilouscodpiece.org
Added rudimentary README to cover deps and some info about platform compatibility/testing/dev and so forth
or...@proton.perilouscodpiece.org
First (probably stupid and/or broken) cut at scraper activity logging and checking using mysql
or...@proton.perilouscodpiece.org
Looks like by design we only actually need to have an index on url
or...@proton.perilouscodpiece.org
Adding rudimentary mysql schema for the scrapers to log their work into.
or...@proton.perilouscodpiece.org
whitespace cleanups in the stub methods
or...@proton.perilouscodpiece.org
A rudimentary nod to error handling around the urlopen call
or...@proton.perilouscodpiece.org
Adding some initial work to url_get and so forth
or...@proton.perilouscodpiece.org
Primitive implementation of the url validity checking method
or...@proton.perilouscodpiece.org
Fleshed out primitive implementation of ScraperError (url and msg)
or...@proton.perilouscodpiece.org
Fixing another apparent typo (capitalization of base Exception class)
or...@proton.perilouscodpiece.org
Fixing a few apparent typos (at least, my python2.5 complained about them)
or...@proton.perilouscodpiece.org
Adding a few more (potential, assuming we can count on things like antiword) mime types to parse URLs from
j00bar
Implemented dispatch_parser and handle_embedded_links
j00bar
Implemented text and html link reference parsers
j00bar
Skeleton of __call__ and added dispatch_parser
j00bar
Added doctest and scrape_is_stale method
j00bar
Skeleton design for scraper.