Commits

Mike Ruckman  committed f8271e7

Added README.

  • Parent commits 56fcc51

Files changed (1)

+=======
+scraper
+=======
+
+scraper is a wrapper for mechanize's Browser class, meant
+specifically for scraping websites.
+
+Currently, scraper only supports searching for class
+attributes within valid HTML tags (via BeautifulSoup).
+
+------------
+Requirements
+------------
+You need to have three modules installed: argparse_,
+mechanize_ and BeautifulSoup_. You can install these via
+any normal means (easy_install, pip, etc.).
+
+.. _argparse: http://code.google.com/p/argparse/
+.. _mechanize: http://wwwsearch.sourceforge.net/mechanize/
+.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
+
+---
+Use
+---
+
+::
+
+  scraper.py [-h] [-p page | -s search_term] [-r regex] [-o output]
+
+  optional arguments:
+    -h, --help      show this help message and exit
+    -p page         the page you want to search.
+    -s search_term  Search term for Google News - put a "+" in for spaces.
+    -r regex        the regular expression you want to use to specify the class
+                    attribute to search for.
+    -o output       Specify the output file.
+
+--------
+Epilogue
+--------
+
+This is very much an alpha release. I only threw it up here 
+because I thought it might be useful to someone. I use it 
+to print a nice single page of stories related to things I 
+usually search for every morning.
+
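
The README describes two things that a short sketch may make more concrete: the command line shown under "Use", and the idea of picking out tags whose class attribute matches a regular expression. The code below is not taken from scraper.py. It is a minimal reconstruction under stated assumptions: the names build_parser, ClassFinder and find_classes are hypothetical, the -p and -s options are modelled as mutually exclusive because the usage line shows them as alternatives, and the standard library's html.parser stands in for mechanize and BeautifulSoup so the example runs without third-party modules.

```python
import argparse
import re
from html.parser import HTMLParser


def build_parser():
    """Build a parser that mirrors the usage text (a reconstruction)."""
    parser = argparse.ArgumentParser(prog="scraper.py")
    # Assumption: "-p page | -s search_term" in the usage line means the
    # two options cannot be combined.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-p", metavar="page", dest="page",
                        help="the page you want to search.")
    source.add_argument("-s", metavar="search_term", dest="search_term",
                        help='Search term for Google News - put a "+" in for spaces.')
    parser.add_argument("-r", metavar="regex", dest="regex",
                        help="the regular expression you want to use to "
                             "specify the class attribute to search for.")
    parser.add_argument("-o", metavar="output", dest="output",
                        help="Specify the output file.")
    return parser


class ClassFinder(HTMLParser):
    """Record every start tag whose class attribute matches a regex."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.found = []  # list of (tag name, class attribute value)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        class_value = dict(attrs).get("class")
        if class_value and self.pattern.search(class_value):
            self.found.append((tag, class_value))


def find_classes(html, pattern):
    """Return (tag, class) pairs for tags whose class matches pattern."""
    finder = ClassFinder(pattern)
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.page and args.regex:
        # Hypothetical wiring: treat the page argument as a local HTML file.
        with open(args.page, encoding="utf-8") as handle:
            for tag, class_value in find_classes(handle.read(), args.regex):
                print(tag, class_value)
```

In this sketch the page argument is read as a local file purely for illustration; the real tool fetches pages through mechanize's Browser, and its handling of -s and -o is not shown here.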