What is this repository for?

This repository contains the scripts and data for the SnapCity project, organized in the crawl_scripts, presence_matrix, and taxi-data folders.

Google API specifics

  • The crawler scripts are written in Python
  • The scripts call the Google Places and Distance Matrix APIs
  • Google API keys must be generated through the Google web service and inserted into the scripts
  • Google rate-limits these APIs: the Places API is limited per key, and the Distance Matrix API is limited per requesting IP
  • Registering a credit card with the Google web service increases the API limits
  • For faster crawling, we used 6 credit-card-based API keys for the Places API and 16 different IPs for the Distance Matrix API
  • The scripts enforce strict checks against the over-query limits, so the credit cards were never charged
  • Credit cards and multiple IPs are only optimizations for faster crawling. With a single API key (no credit card) and a single IP, the scripts still work but take longer to finish
  • The over-query limits in the scripts must be set according to the number of API keys and/or IP addresses in use, following Google's current policies
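The notes above describe spreading requests across several keys while never exceeding an over-query limit. The actual crawler scripts are not reproduced here, so the following is only a minimal sketch of that idea. It assumes a fixed per-key request cap that you choose from Google's current policy. `KeyQuota`, `QuotaManager`, `acquire`, and `report_status` are hypothetical names, and the code makes no network calls. `"OVER_QUERY_LIMIT"` is the status string Google's web service responses return when a quota is exceeded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeyQuota:
    """Hypothetical helper: tracks requests sent with one API key."""
    key: str
    daily_limit: int  # assumed cap, set by hand from Google's current policy
    used: int = 0
    exhausted: bool = False  # set when Google reports OVER_QUERY_LIMIT

    def has_capacity(self) -> bool:
        return not self.exhausted and self.used < self.daily_limit


class QuotaManager:
    """Hypothetical round-robin over several keys that never exceeds a key's cap."""

    def __init__(self, quotas: List[KeyQuota]) -> None:
        self.quotas = quotas
        self._index = 0

    def acquire(self) -> Optional[str]:
        """Return a key with remaining capacity and count one request against it,
        or None when every key is used up (the caller should stop crawling)."""
        for _ in range(len(self.quotas)):
            quota = self.quotas[self._index]
            self._index = (self._index + 1) % len(self.quotas)
            if quota.has_capacity():
                quota.used += 1
                return quota.key
        return None

    def report_status(self, key: str, status: str) -> None:
        """Stop using a key as soon as Google answers with OVER_QUERY_LIMIT."""
        if status == "OVER_QUERY_LIMIT":
            for quota in self.quotas:
                if quota.key == key:
                    quota.exhausted = True


if __name__ == "__main__":
    manager = QuotaManager([KeyQuota("KEY_A", daily_limit=2),
                            KeyQuota("KEY_B", daily_limit=1)])
    issued = []
    while (key := manager.acquire()) is not None:
        issued.append(key)
    print(issued)  # ['KEY_A', 'KEY_B', 'KEY_A']
```

Because the Distance Matrix API is limited per requesting IP rather than per key, the same structure could track one quota per IP for that API.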