pycon2013 / big-data-algorithms.txt

- Skip lists, HyperLogLog counting, Bloom filters and count-min sketch
- "Large is hard, infinite is much easier."
- Often there's a lot of data that doesn't matter to the computation
    - Example: computing a mean after removing the lower 8 bits of each value
- Skip List
- A good hash function is essentially like a good random number generator
- HyperLogLog - find the longest run of 0s in the hashes of the objects.
  It tells you about how many distinct objects you saw
    - Like guessing how many coins were flipped from the longest run of heads
- Bloom filter
    - Only false positives, never false negatives
    - He uses it in genome sequencing to reduce time
- ipythonblocks
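
The longest-run-of-zeros idea from the HyperLogLog note can be sketched in a few lines. This is a simplified, single-register version of the idea, not full HyperLogLog (which splits the hashes over many registers and averages them). The names `hash64`, `leading_zeros` and `LongestRunCounter` are made up for illustration, and using the first 64 bits of SHA-1 as the hash is an assumption.

```python
import hashlib

HASH_BITS = 64  # assumption: keep only the first 64 bits of the digest


def hash64(item):
    """Hash any object (via its str form) to a 64-bit integer."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def leading_zeros(value, bits=HASH_BITS):
    """Number of 0 bits before the first 1 bit in a fixed-width value."""
    return bits - value.bit_length()


class LongestRunCounter:
    """Remembers only the longest run of leading zeros seen so far."""

    def __init__(self):
        self.max_zeros = 0

    def add(self, item):
        # Duplicates hash to the same value, so they never change the max.
        self.max_zeros = max(self.max_zeros, leading_zeros(hash64(item)))

    def estimate(self):
        # A run of k zeros shows up about once in 2**k random hashes,
        # so 2**k is a (very rough) guess at the distinct count.
        return 2 ** self.max_zeros


counter = LongestRunCounter()
for word in ["a", "b", "c", "a", "b", "a"]:
    counter.add(word)
print(counter.estimate())
```

A single register like this is noisy and only ever returns a power of two; it is the coin-flip intuition in code, with memory use that does not grow with the number of items.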
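
A Bloom filter along the lines of the note above might look like the following minimal sketch. The class name, the default sizes, and the choice of seeded SHA-1 digests as the hash functions are all assumptions for illustration, not what the speaker used.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # must stay below 256 for the seeding below
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive several hash functions by prefixing a different seed byte.
        data = str(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Every position set: "probably seen". Any position clear: "definitely not".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("ACGTACGT")
print("ACGTACGT" in seen)  # an added item is always reported as present
```

Because bits are only ever set, an item that was added can never be reported missing; an item that was never added can still collide on every position, which is the false-positive case.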