What is this?

Juriscraper is a library for scraping American court websites. It can currently scrape all major federal appellate courts; support for state courts is planned.

Juriscraper is part of a two-part system. The second part is the 'caller', which should be developed by the system using Juriscraper. The caller is responsible for calling a scraper, downloading and saving its results. A reference implementation of the caller has been developed and is in use at The code for that caller can be found here.
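
As a rough illustration of that division of labor, the sketch below shows what a minimal caller might look like. It is not the reference implementation: FakeSite, its case_names and download_urls attributes, the fetch callable, and run_caller are all hypothetical names chosen for this example, and the download step is injected so the sketch runs without network access.

```python
import hashlib
import os


class FakeSite(object):
    """Hypothetical stand-in for a populated scraper object."""

    def __init__(self):
        # Attribute names are assumptions made for this sketch.
        self.case_names = ['Doe v. Roe']
        self.download_urls = ['http://example.com/opinion1.pdf']


def run_caller(site, fetch, out_dir):
    """Download and save every result the scraper reports.

    `fetch` is any callable that takes a URL and returns bytes.
    Returns a list of (case_name, url, saved_path) tuples.
    """
    saved = []
    for name, url in zip(site.case_names, site.download_urls):
        content = fetch(url)
        # Name each file after a hash of its content so repeated
        # downloads of the same document land on the same path.
        digest = hashlib.sha1(content).hexdigest()
        path = os.path.join(out_dir, digest + '.bin')
        with open(path, 'wb') as handle:
            handle.write(content)
        saved.append((name, url, path))
    return saved
```

Keeping the download and storage logic in the caller is what lets the scrapers themselves stay free of any database or filesystem assumptions.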

Some of the design goals for this project are:

  • Extensibility to support video, oral arguments, etc.
  • Extensibility to support multiple geographies (US, Cuba, Mexico, California)
  • MIME type identification through magic numbers
  • Generalized architecture with no code repetition
  • XPath-based scraping powered by lxml's HTML parser
  • Return all metadata available on court websites (the caller can pick what it needs)
  • No need for a database
  • Clear log levels (DEBUG, INFO, WARN, CRITICAL)
  • Friendly to court websites
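
To illustrate the magic-number goal above, here is a minimal sketch of identifying a MIME type from a file's leading bytes rather than trusting its extension or HTTP headers. The signature table and the sniff_mime_type name are assumptions made for this example, not Juriscraper's actual code, and a real table would need more entries.

```python
# Illustrative signature table: (leading bytes, MIME type).
# The MIME type chosen for each container format is an assumption.
MAGIC_NUMBERS = [
    (b'%PDF-', 'application/pdf'),
    # OLE2 container: legacy .doc, but also .xls and .ppt
    (b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1', 'application/msword'),
    # ZIP container: plain archives, but also .docx
    (b'PK\x03\x04', 'application/zip'),
    (b'\xffWPC', 'application/vnd.wordperfect'),
]


def sniff_mime_type(content, default='application/octet-stream'):
    """Return a MIME type based on the first bytes of `content`."""
    for signature, mime_type in MAGIC_NUMBERS:
        if content.startswith(signature):
            return mime_type
    return default
```

Only the first few bytes are needed, so a caller could run this check on the start of a download before deciding how to store the document.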

Installation & dependencies

# install the dependencies
sudo pip install chardet==1.0.1
sudo pip install requests==0.10.2
sudo mkdir /var/log/juriscraper/

# install the code
sudo mkdir /usr/local/juriscraper
cd /usr/local/juriscraper
hg clone .

# add Juriscraper to your python path (in Ubuntu/Debian)
sudo ln -s /usr/local/juriscraper /usr/lib/python2.7/dist-packages/juriscraper


The scrapers are written in Python, and can scrape a court as follows:

from juriscraper.opinions.united_states.federal import ca1

# Create a site object 
site = ca1.Site()

# Populate it with data
site.parse()

# Print out the object
print(str(site))

It's also possible to iterate over all courts in a Python package, even if they're not known before starting the scraper. For example:

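court_id = 'juriscraper.opinions.united_states.federal'
# Assumes the package lists its scraper module names in __all__
scrapers = __import__(court_id, globals(), locals(), ['*']).__all__
for scraper in scrapers:
    mod = __import__('%s.%s' % (court_id, scraper), globals(), locals(), [scraper])
    # Create a site object, then populate it with data
    site = mod.Site()
    site.parse()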

This is useful for building a command-line scraper that iterates over every court in a jurisdiction supplied by a script or a user.
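
The same discovery idea can be sketched with the standard library's pkgutil and importlib modules, which find a package's submodules on disk instead of relying on __import__ and a hand-maintained list. This is an alternative technique, not what the example above does; iter_submodules is a hypothetical helper, and it is demonstrated on the standard library's email.mime package so that it runs without Juriscraper installed.

```python
import importlib
import pkgutil


def iter_submodules(package_name):
    """Yield (name, module) for each direct submodule of a package."""
    package = importlib.import_module(package_name)
    for _finder, name, _is_pkg in pkgutil.iter_modules(package.__path__):
        full_name = '%s.%s' % (package_name, name)
        yield name, importlib.import_module(full_name)


# Demonstration on a standard library package.
names = sorted(name for name, _module in iter_submodules('email.mime'))
```

With Juriscraper installed, passing a court package name in place of 'email.mime' would presumably yield each scraper module, on which mod.Site() could then be called.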

Development of a to_xml() or to_json() method has not yet been completed, as all callers have thus far been able to work directly with the Python objects.
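
Until such a method exists, a caller that needs JSON could serialize a scraper object's attributes itself. The sketch below is one possible approach under stated assumptions: FakeSite and its attribute names are hypothetical stand-ins for a populated scraper object, and to_json here is a free function written for this example, not part of Juriscraper.

```python
import json


class FakeSite(object):
    """Hypothetical stand-in for a populated scraper object."""

    def __init__(self):
        # Attribute names are assumptions made for this sketch.
        self.court_id = 'ca1'
        self.case_names = ['Doe v. Roe']
        self.download_urls = ['http://example.com/opinion1.pdf']
        self._cache = object()  # private state, skipped on output


def to_json(site):
    """Serialize an object's public attributes to a JSON string."""
    public = dict(
        (key, value) for key, value in vars(site).items()
        if not key.startswith('_')
    )
    # default=str is a fallback for values json cannot encode
    # natively, such as dates.
    return json.dumps(public, sort_keys=True, default=str)
```

Skipping underscore-prefixed attributes keeps internal state out of the output; a to_xml() equivalent could walk the same dictionary.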

Version History

0.1 - Supports all common Federal Appeals courts

0.2 - Support for all possible Federal District courts and small Federal Appeals courts
0.3 - Support for all state appeals courts

Planned for later versions:
- add oral arguments
- add video
- add other countries


Juriscraper is licensed under the permissive BSD license.