# Easy to Use Twitter Crawler

This is an easy-to-use Twitter crawler.
Make a copy of the tokens.config.template file and name it tokens.config. Then enter each key on its own line, as the existing lines suggest.

Create a folder data/ parallel to bin/.
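The authoritative key names are in tokens.config.template; as an illustration only (the names below are assumptions based on standard Twitter OAuth credentials, one key per line):

```
consumerKey=YOUR_CONSUMER_KEY
consumerSecret=YOUR_CONSUMER_SECRET
accessToken=YOUR_ACCESS_TOKEN
accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET
```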
```
mvn compile assembly:single
./bin/stream.sh streaming.txt data 10
```
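Put together, a minimal setup sketch might look like this (run from the repository root; the Maven step is guarded so the script also works on machines without `mvn` installed):

```shell
#!/bin/sh
# Create the data/ folder parallel to bin/ (no-op if it already exists).
mkdir -p data

# Build the self-contained jar, but only if Maven is on the PATH.
if command -v mvn >/dev/null 2>&1; then
    mvn compile assembly:single
fi
```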
- streaming.txt contains one word per line. Streaming is currently hard-coded to accept only English, German, and Turkish tweets.
- An output file is created in the data folder.
- 10 means the streamer runs for 10 minutes. I suggest running it for close to 23*60 minutes (or similar) via a cron job that starts every day at a fixed time.
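As a sketch of the suggested cron setup (the paths are illustrative placeholders; 1380 = 23*60 minutes, leaving an hour of slack before the next daily start):

```
# m h dom mon dow  command
0 1 * * * /path/to/repo/bin/stream.sh /path/to/repo/streaming.txt /path/to/repo/data 1380
```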
## Run the crawler via the REST API

Please check ./bin/crawl.sh as well. The easiest way to start, however, is:

```
./bin/crawl-by-offset.sh terms.txt -1 20
```
- terms.txt is the list of terms; see the example file in this repository for the format. The second column contains the actual search terms.
- -1 means the crawler fetches all Tweets posted yesterday; -2 means the day before yesterday, and so on. I believe you cannot go back more than 7 days.
- 20 means that 20 pages of Tweets are crawled.
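The example terms.txt in this repository is authoritative for the format; purely as an illustration of the two-column layout described above (the first column is assumed here to be an identifier, and the column separator is an assumption):

```
1	climate
2	climate change
```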