Miscellaneous Software Design Issues
We're currently evaluating (and using) neo4j (neo4j.org) as a graph database back end to store the "social" data: the edges between participants (the social network), the comments and thumbs-ups they leave for each other, and the activities they have performed. The responsibilities of the graph database and those of our Django RDBMS should be clearly separated.
The graph database is used to calculate similarity between individuals based on the activities they perform (time, location, and frequency) and their communication patterns. Initial algorithm: for each participant, generate one vector per dimension, with one entry per activity (e.g., comment counts on each activity, thumbs-up counts on each activity, or counts of how often the participant performed each activity directly). Then apply the cosine similarity function to the vectors of each pair of participants. *Note*: we still need to decide whether to ask the RDBMS or neo4j for the comment counts.
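The cosine similarity step could be sketched as follows. This is a minimal illustration, not the project's implementation: the function name, the dict-based vector representation (activity name mapped to a count), and the sample data are all assumptions made for the example, and the counts could come from either the RDBMS or neo4j.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors.

    Each vector is a dict mapping an activity name to a count
    (hypothetical representation); missing activities count as 0.
    """
    dot = sum(count * b.get(activity, 0) for activity, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # A participant with no counts is treated as similar to no one.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical "activities performed" counts for two participants.
participant_x = {"carpool": 3, "local lunch": 1}
participant_y = {"carpool": 1, "turn off water": 2}
print(cosine_similarity(participant_x, participant_y))
```

A sparse dict keeps the vectors small when most participants have touched only a few activities; the result ranges from 0 (no overlap) to 1 (identical proportions) because counts are never negative.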
The main benefit of the graph database over the RDBMS shows up when a query requires a long traversal, i.e., when one query produces an intermediate result that feeds a second query, whose result feeds a third, and so on.
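To make "long traversal" concrete, here is a small sketch that walks a participant graph hop by hop. It uses a plain in-memory adjacency dict rather than any neo4j API, and the function name and sample edges are hypothetical. In an RDBMS, each additional hop would typically be another query or self-join on the edge table, which is the chaining described above.

```python
from collections import deque


def within_hops(edges, start, max_hops):
    """Return {participant: hop distance} for everyone reachable
    from `start` in at most `max_hops` steps (breadth-first search)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in edges.get(person, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[person] + 1
                queue.append(neighbor)
    del distance[start]  # the starting participant is not a result
    return distance


# Hypothetical edges: X commented on Y's post, Y on Z's, Z on A's.
edges = {"X": ["Y"], "Y": ["Z"], "Z": ["A"]}
print(within_hops(edges, "X", 2))
```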
Hi, Person XYZZY:
This is a summary of your group's activity in the Lighter Footprints Challenge.
- You turned off the water while brushing your teeth
- Participant X performed Don't Flush Your Toilet 3 times
- Participant Y ate a local lunch
- Participants A, B, and Z carpooled
- Participant D commented on your post, saying "You are the weakest link!"
- Participant E liked your post