
Airbot

Airbot is a Python/Django app that aims to gather the first air quality violation dataset. It consists of a scraper, which gets data from the Texas Commission on Environmental Quality (TCEQ), and a REST API.

Install

Development of Airbot requires Python 3 and virtualenv.

First, set up your virtualenv - this allows us to install Airbot and its requirements in a self-contained manner.

virtualenv --python=python3 ~/airbot

Next, activate your virtualenv.

source ~/airbot/bin/activate

Install the dependencies:

pip install -r airbot/requirements.txt

First Run

There are two parts to Airbot: the API server and the scraper. To set up and run the API server:

python manage.py migrate
python manage.py createsuperuser
python manage.py runserver

You should then be able to log in by visiting http://127.0.0.1:8000/admin

Scraping Data

With your virtualenv activated, you can download today's reports by running the following Django management command:

python manage.py scrape_data

This will download all data for all locations. If you'd like to download just a single location's readings, use the site_id argument. Site ids can be found at http://127.0.0.1:8000/admin/air/location

python manage.py scrape_data --site_id 48_355_0083

scrape_data can also take the following command line arguments to download historical data (see the example after this list):

--year YEAR           The year to download
--month MONTH         The month to download
--day DAY             The day to download
--site_id SITE_ID     The id of the site to download data for
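
For example, to download a single site's readings for a specific date (the date values below are purely illustrative):

python manage.py scrape_data --year 2017 --month 6 --day 15 --site_id 48_355_0083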

API

Using the API via AJAX requires a valid token for authentication. Alternatively, you can log in to the admin and explore the API by visiting http://127.0.0.1:8000/. The API is under heavy development, so expect breakage.

Auth Token

The auth token should be passed as a header in the request as follows:

Authorization: Token my_auth_token_here
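
As a minimal sketch, here is how you might call the API from Python using the requests library (the library choice and response handling here are assumptions, not part of Airbot itself):

import requests

# my_auth_token_here is a placeholder for a real token issued to your user.
headers = {"Authorization": "Token my_auth_token_here"}

# http://127.0.0.1:8000/ is the browsable API root mentioned above;
# individual endpoint paths may differ, so check the root listing first.
response = requests.get("http://127.0.0.1:8000/", headers=headers)
response.raise_for_status()
print(response.json())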

Locations

An endpoint that lists all locations tracked by Airbot and their site ids.

Readings

An endpoint that lets you query individual readings.

Changelogs

An endpoint that lets you see when a reading's value has changed, along with the old and new values.

Thresholds

Averages readings for a given site and pollutant; the averaging window defaults to 24 hours. An example query is shown after the parameter list below.

Query params:

  • site_id - Limit results to a given site_id. Pass in multiple site_ids to get more than one site at a time, e.g. site_id=48_201_0026,48_355_0041
  • pollutant - Limit results to a given pollutant. Pass in multiple pollutant ids to get more than one, e.g. pollutant=25,38 to get Benzene and Ethylbenzene results
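
Assuming the Thresholds endpoint is exposed under the API root (the exact path below is an assumption; check http://127.0.0.1:8000/ for the real one), a query combining both params might look like:

http://127.0.0.1:8000/thresholds/?site_id=48_201_0026,48_355_0041&pollutant=25,38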