Airbot is a Python/Django app that aims to gather the first air quality violation dataset. It consists of a scraper, which gets data from the Texas Commission on Environmental Quality (TCEQ), and a REST API.


Development of Airbot requires Python 3 and virtualenv.

First, set up your virtualenv. This lets us install Airbot and its requirements in a self-contained manner.

virtualenv --python=python3 ~/airbot

Next, activate your virtualenv.

source ~/airbot/bin/activate

Install the dependencies:

pip install -r airbot/requirements.txt

First Run

There are two parts to Airbot: the API server and the scraper.

python manage.py migrate
python manage.py createsuperuser
python manage.py runserver

You should then be able to log in by visiting

Scraping Data

With your virtualenv activated, you can download today's reports by running the following Django management command:

python manage.py scrape_data

This will download all data for all locations. If you'd like to download just a single location's readings, use the --site_id argument. Site IDs can be found in

python manage.py scrape_data --site_id 48_355_0083

The scrape_data command also accepts the following command-line arguments for downloading historical data:

--year YEAR           The year to download
--month MONTH         The month to download
--day DAY             The day to download
--site_id SITE_ID     The id of the site where you want to download
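
To backfill a range of historical days, the flags above can be combined in a small driver script. The sketch below only builds the command lines, one per day. It assumes the command is invoked through Django's manage.py from the project root and that --month and --day accept plain integers; the helper name backfill_commands is hypothetical and not part of Airbot.

```python
import sys
from datetime import date, timedelta


def backfill_commands(start, end, site_id=None, manage_py="manage.py"):
    """Build one scrape_data command per day from start to end, inclusive."""
    commands = []
    day = start
    while day <= end:
        cmd = [
            sys.executable, manage_py, "scrape_data",
            "--year", str(day.year),
            "--month", str(day.month),  # assumption: unpadded integers are accepted
            "--day", str(day.day),
        ]
        if site_id is not None:
            cmd += ["--site_id", site_id]
        commands.append(cmd)
        day += timedelta(days=1)
    return commands


if __name__ == "__main__":
    # Illustrative date range spanning a month boundary.
    for cmd in backfill_commands(date(2016, 1, 30), date(2016, 2, 1),
                                 site_id="48_355_0083"):
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) with the virtualenv activated to actually run the download.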


Using the API via AJAX requires a valid token for authentication. Alternatively, you can log in to the admin and explore the API in your browser. The API is under heavy development, so expect breakage.

Auth Token

Pass the auth token as a header in the request, as follows:

Authorization: Token my_auth_token_here
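
From a Python client, the same header can be attached with the standard library. This is a minimal sketch: the base URL assumes a local development server, the path "/" is a placeholder for whichever endpoint you want to call, and authed_request is a hypothetical helper name.

```python
from urllib.request import Request

# Assumption: a local development server; replace with your deployment's URL.
BASE_URL = "http://localhost:8000"


def authed_request(path, token, base_url=BASE_URL):
    """Return a urllib Request carrying the token Authorization header."""
    return Request(
        base_url + path,
        headers={
            "Authorization": "Token " + token,
            "Accept": "application/json",
        },
    )


req = authed_request("/", "my_auth_token_here")
print(req.get_header("Authorization"))  # Token my_auth_token_here
```

Passing req to urllib.request.urlopen sends the request; nothing goes over the network until then.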


An endpoint that lists all locations tracked by Airbot, along with their site IDs.


An endpoint that lets you query individual readings.


An endpoint that lets you see when a reading's value has changed, and what it changed from and to.


Averages the output of readings for a site and pollutant. Defaults to 24 hours.

Query params:

  • site_id - Limit results to a given site_id. Pass multiple comma-separated site_ids to get more than one site at a time, e.g. site_id=48_201_0026,48_355_0041
  • pollutant - Limit results to a given pollutant. Pass multiple comma-separated pollutant ids to get more than one, e.g. pollutant=25,38 to get Benzene and Ethylbenzene results
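
The query params above can be assembled programmatically. The sketch below builds only the query string, since the endpoint path is not shown here; build_query is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_query(site_ids=(), pollutants=()):
    """Encode site_id and pollutant filters as comma-separated query params."""
    params = {}
    if site_ids:
        params["site_id"] = ",".join(site_ids)
    if pollutants:
        params["pollutant"] = ",".join(str(p) for p in pollutants)
    # safe="," keeps the commas literal instead of percent-encoding them as %2C.
    return urlencode(params, safe=",")


print(build_query(["48_201_0026", "48_355_0041"], [25, 38]))
# site_id=48_201_0026,48_355_0041&pollutant=25,38
```

Append the result to the endpoint URL after a "?". Omitting a filter leaves that parameter out of the query string entirely.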