This is a small collection of examples that run on Apache Spark.
    git clone https://bitbucket.org/umarcts/spark-examples.git
    cd spark-examples
    ./gradlew build
Relevant JARs should end up in
Tests are included for all examples. Unit testing is an integral part of the programming process, and in the words of Kent Beck:
If a feature does not have a test, it does not exist
Google NGrams tracks the occurrences of various words and phrases, by year, throughout all the books in Google Books. The 1-grams dataset tracks single words. The data is tab-separated, with the following schema:
- word - The word of interest
- year - The year of the data
- count - The number of times the word has appeared in books in this year
- volumes - The number of volumes the word has appeared in for this year
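To make the schema above concrete, here is a minimal sketch of parsing one record in plain Python. The `OneGram` type, the `parse_one_gram` helper, and the sample line are all hypothetical names and values made up for illustration; they are not taken from this repository or from the real dataset.

```python
from typing import NamedTuple


class OneGram(NamedTuple):
    """One record of the 1-grams data (hypothetical helper type)."""
    word: str
    year: int
    count: int
    volumes: int


def parse_one_gram(line: str) -> OneGram:
    """Split a tab-separated 1-gram line into typed fields."""
    word, year, count, volumes = line.rstrip("\n").split("\t")
    return OneGram(word, int(year), int(count), int(volumes))


# Made-up sample record, not a real row from the dataset.
sample = "spark\t2008\t120\t45"
record = parse_one_gram(sample)
print(record.word, record.year, record.count, record.volumes)
```

A function of this shape could be passed to a Spark `map` over the input lines, though the examples in this repository may parse the data differently.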
The output from
is comma-separated and in the form:
- year - The year of interest
- length - The average length of all words in the year
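As a rough illustration of how output in this form could be derived from the 1-grams input, here is a sketch in plain Python rather than Spark. It assumes the average is weighted by each word's `count`, which is only one possible reading of "average length of all words"; the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def average_word_length_by_year(lines):
    """Return {year: average word length}, weighted by occurrence count.

    Assumption: each word contributes len(word) * count characters
    and count occurrences to its year.
    """
    totals = defaultdict(lambda: [0, 0])  # year -> [characters, occurrences]
    for line in lines:
        word, year, count, _volumes = line.rstrip("\n").split("\t")
        entry = totals[int(year)]
        entry[0] += len(word) * int(count)
        entry[1] += int(count)
    # Skip years with no occurrences to avoid dividing by zero.
    return {
        year: chars / occurrences
        for year, (chars, occurrences) in sorted(totals.items())
        if occurrences
    }


def to_csv_lines(averages):
    """Format the averages as comma-separated 'year,length' lines."""
    return [f"{year},{length}" for year, length in averages.items()]


# Made-up sample rows, not real data.
sample = [
    "cat\t1900\t2\t1",
    "horse\t1900\t2\t1",
    "a\t1901\t4\t3",
]
for row in to_csv_lines(average_word_length_by_year(sample)):
    print(row)
```

The per-year accumulation of a (characters, occurrences) pair is the same shape as a keyed reduction in Spark, so the logic should carry over to a distributed job with little change.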
This output can be easily plotted with your plotting tool of choice. I personally like R.
Moab is a scheduler from Adaptive Computing. It can record software license usage data in its logs. I'm not including any sort of instructions for using this, as it is extremely special-purpose.