=======
scraper
=======

scraper is a wrapper for Mechanize's Browser class meant 
specifically for scraping websites.

Currently, scraper only supports searching by class attribute
within valid HTML tags (via BeautifulSoup).
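
As a rough illustration of what "searching by class attribute" means, here is a minimal sketch. It is not scraper's own code: it uses the standard library's ``html.parser`` in place of BeautifulSoup so that it runs without extra installs, it assumes Python 3, and the names ``ClassFinder`` and ``find_by_class`` are made up for this example.

```python
import re
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Record every tag whose class attribute matches a regular expression."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.found = []  # (tag name, class attribute value) pairs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = dict(attrs).get("class") or ""
        if self.pattern.search(classes):
            self.found.append((tag, classes))


def find_by_class(html, pattern):
    """Return (tag, class) pairs for tags whose class matches pattern."""
    finder = ClassFinder(pattern)
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    page = (
        '<div class="story-title">A</div>'
        '<p class="byline">B</p>'
        '<span>C</span>'
        '<h2 class="story lead">D</h2>'
    )
    for tag, classes in find_by_class(page, r"^story"):
        print(tag, classes)
```

Tags without a class attribute are treated as having an empty one, so they only match patterns that also match the empty string.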

------------
Requirements
------------
You need three modules installed: argparse_, mechanize_ and
BeautifulSoup_. You can install them by any of the usual
means (easy_install, pip, etc.).

.. _argparse: http://code.google.com/p/argparse/
.. _mechanize: http://wwwsearch.sourceforge.net/mechanize/
.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

---
Use
---

::

    scraper.py [-h] [-p page | -s search_term] [-r regex] [-o output]

    optional arguments:
      -h, --help      show this help message and exit
      -p page         the page you want to search.
      -s search_term  Search term for Google News - put a "+" in for spaces.
      -r regex        the regular expression you want to use to specify the class
                      attribute to search for.
      -o output       Specify the output file.
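
For anyone curious how an interface like the one above can be declared, here is a minimal ``argparse`` sketch. It is a reconstruction from the help text, not the actual contents of scraper.py: the function name ``build_parser`` and the attribute names (``page``, ``search_term``, ``regex``, ``output``) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="scraper.py")

    # -p and -s are alternatives, so at most one of them may be given.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-p", dest="page", metavar="page",
                        help="the page you want to search.")
    source.add_argument("-s", dest="search_term", metavar="search_term",
                        help='Search term for Google News - put a "+" in '
                             'for spaces.')

    parser.add_argument("-r", dest="regex", metavar="regex",
                        help="the regular expression you want to use to "
                             "specify the class attribute to search for.")
    parser.add_argument("-o", dest="output", metavar="output",
                        help="Specify the output file.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this sketch, ``build_parser().parse_args(["-s", "open+source", "-o", "news.html"])`` sets ``search_term`` and ``output`` and leaves ``page`` as ``None``. Passing both ``-p`` and ``-s`` makes argparse exit with an error.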

--------
Epilogue
--------

This is very much an alpha release. I only threw it up here 
because I thought it might be useful to someone. I use it 
to print a nice single page of stories related to things I 
usually search for every morning.