What is this repository for?
This folder contains scripts and data for the Snapcity project.
Google API specifics
- The crawler scripts are written in Python
- The scripts call the Google Places and Distance Matrix APIs
- Google API keys need to be generated using the Google web service and put in the scripts
- Google rate-limits these APIs: the Places API is limited per key, and the Distance Matrix API is limited per requesting IP
- Registering with the Google web service using a credit card increases the API limit
- For faster crawling, we used 6 credit-card-backed API keys for the Places API and 16 different IPs for the Distance Matrix API
- The scripts have strict checks against exceeding the query limits, so the credit cards were never charged
- Using credit cards or multiple IPs is only an optimization for faster crawling. With a single API key that is not backed by a credit card and a single IP, the scripts work fine but take longer to complete
- The over-query limits in the scripts must be set according to the number of API keys and/or IP addresses in use, after checking Google's current policies
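
The over-query check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the crawler scripts: the names `KeyRotator`, `KeyBudget`, and `QuotaExhausted` are hypothetical, and `daily_limit` is a placeholder that has to be set from Google's current policies.

```python
from dataclasses import dataclass


class QuotaExhausted(Exception):
    """Raised when every key has reached its configured limit."""


@dataclass
class KeyBudget:
    key: str
    daily_limit: int  # assumption: taken from Google's current policy
    used: int = 0     # requests already counted against this key


class KeyRotator:
    """Hands out API keys while refusing to exceed any key's limit."""

    def __init__(self, keys, daily_limit):
        self.budgets = [KeyBudget(k, daily_limit) for k in keys]

    def acquire(self):
        """Return a key with budget left and count one request against it."""
        for budget in self.budgets:
            if budget.used < budget.daily_limit:
                budget.used += 1
                return budget.key
        # Stop instead of querying past the limit, so nothing is charged.
        raise QuotaExhausted("all keys have reached their configured limit")

    def remaining(self):
        """Total number of requests still allowed across all keys."""
        return sum(b.daily_limit - b.used for b in self.budgets)
```

A crawler would call `acquire()` before each Places API request and stop (or wait for the quota to reset) when `QuotaExhausted` is raised. With a single key the same check applies; the crawl just takes longer. The same pattern could be keyed by IP address for the Distance Matrix API.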