Overview

What is this?

Juriscraper is a scraper library for the American court system. It can currently scrape all major Federal appellate courts, with state courts planned next.

Juriscraper is one half of a two-part system. The second part is the 'caller', which should be developed by the system using Juriscraper. The caller is responsible for calling a scraper and for downloading and saving its results. A reference implementation of the caller has been developed and is in use at CourtListener.com. The code for that caller can be found here.

Some of the design goals for this project are:

  • extensibility to support video, oral arguments, etc.
  • extensibility to support other geographies (US, Cuba, Mexico, California)
  • MIME type identification via magic numbers
  • generalized architecture with no code repetition
  • XPath-based scraping powered by lxml's HTML parser
  • return of all metadata available on court websites (the caller can pick what it needs)
  • no need for a database
  • clear log levels (DEBUG, INFO, WARN, CRITICAL)
  • friendliness to court websites
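The magic-number approach mentioned above can be sketched as follows. The signature table here is a small illustrative example, not Juriscraper's actual detection code:

```python
# Illustrative sketch of MIME detection via magic numbers. The
# signature table is an example, not Juriscraper's real mapping.
MAGIC_NUMBERS = {
    b'%PDF': 'application/pdf',
    b'\xd0\xcf\x11\xe0': 'application/msword',  # legacy OLE container
    b'PK\x03\x04': 'application/zip',           # also covers .docx, etc.
}

def sniff_mime(data, default='text/html'):
    """Guess a MIME type from the leading bytes of a downloaded file."""
    for magic, mime in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return mime
    return default
```

The advantage over trusting the server's Content-Type header is that court sites sometimes mislabel their files; the first few bytes of the file itself are more reliable.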

Installation & dependencies

# install the dependencies
sudo pip install chardet==1.0.1
sudo pip install requests==0.10.2
sudo mkdir /var/log/juriscraper/

# install the code
sudo mkdir /usr/local/juriscraper
cd /usr/local/juriscraper
hg clone https://bitbucket.org/mlissner/juriscraper .

# add Juriscraper to your python path (in Ubuntu/Debian)
sudo ln -s /usr/local/juriscraper /usr/lib/python2.7/dist-packages/juriscraper

Usage

The scrapers are written in Python and can scrape a court as follows:

from juriscraper.opinions.united_states.federal import ca1

# Create a site object 
site = ca1.Site()

# Populate it with data
site.parse()

# Print out the object
print str(site)
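Once parsed, the metadata can be read directly from the site object. The sketch below assumes the object exposes parallel lists named case_names, case_dates, and download_urls, which is Juriscraper's convention, but check the attributes available in the version you have:

```python
# Caller-side sketch: assumes the parsed site exposes parallel lists
# (case_names, case_dates, download_urls). These names are assumptions;
# verify them against your Juriscraper version.
def summarize(site):
    """Return one (name, date, url) tuple per opinion found."""
    return list(zip(site.case_names, site.case_dates, site.download_urls))
```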

It's also possible to iterate over all courts in a Python package, even if they're not known before starting the scraper. For example:

court_id = 'juriscraper.opinions.united_states.federal'
scrapers = __import__(court_id,
                      globals(),
                      locals(),
                      ['*']).__all__
for scraper in scrapers:
    mod = __import__('%s.%s' % (court_id, scraper),
                     globals(),
                     locals(),
                     [scraper])
    site = mod.Site()

This can be useful if you wish to create a command-line scraper that iterates over all courts in a jurisdiction supplied by a script or a user.
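The __import__ calls above can also be written with the standard library's importlib module, which tends to read more clearly. This is a generic sketch of the same pattern, not code from Juriscraper itself:

```python
import importlib

def iter_scrapers(court_id):
    """Import each scraper module listed in a package's __all__ and
    yield (module_name, module) pairs."""
    package = importlib.import_module(court_id)
    for name in package.__all__:
        yield name, importlib.import_module('%s.%s' % (court_id, name))
```

Each yielded module's Site class can then be instantiated and parsed exactly as in the loop above.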

Development of a to_xml() or to_json() method has not yet been completed, as all callers have thus far been able to work directly with the Python objects.
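Until such a method lands, a caller can serialize the parsed data itself. The sketch below assumes the parallel-list attribute names case_names, case_dates, and download_urls; those names may differ in your version:

```python
import json

def site_to_json(site):
    """Serialize a parsed site's metadata to JSON, one object per
    opinion. The attribute names used here are assumptions about the
    Site object, not a documented Juriscraper API."""
    records = [
        {'case_name': name, 'case_date': str(date), 'download_url': url}
        for name, date, url in zip(site.case_names, site.case_dates,
                                   site.download_urls)
    ]
    return json.dumps(records, indent=2)
```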

Version History

Current
0.1 - Supports all common Federal Appeals courts

Roadmap
0.2 - Support for all possible Federal District courts and small Federal Appeals courts
0.3 - Support for all state appeals courts

Beyond
- add oral arguments
- add video
- add other countries

License

Juriscraper is licensed under the permissive BSD license.