Miki Tebeka / pycon2013


Miki Tebeka  committed 0c17268


  • Parent commits b83db8a
  • Branches default


Files changed (5)

File 33-27.txt

Empty file added.

File big-data-algorithms.txt

+- Skip lists, HyperLogLog counting, Bloom filters and Count-Min sketches
+- "Large is hard, infinite is much easier."
+- Often much of the data doesn't matter to the computation
+    - Example: computing a mean after removing the lower 8 bits of each value
+- Skip List
+- A good hash function is essentially like a good random number generator
+- HyperLogLog: find the longest run of 0 bits in the hashes of the objects; this
+  estimates how many distinct objects you saw
+    - Analogy: estimating how many coins were flipped from the longest run of heads
+- Bloom filter
+    - Only false positives (never false negatives)
+    - He uses it in genome sequencing to reduce processing time
+- ipythonblocks
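The HyperLogLog note rests on a simple observation. If you hash objects uniformly, seeing a hash that starts with R zero bits suggests you have seen about 2^R distinct objects. This is the same way a long run of heads suggests many coin flips. Below is a minimal single-register sketch of that idea, in the style of Flajolet-Martin. It is not full HyperLogLog, which splits hashes across many registers and combines them with a harmonic mean. The function names and the use of SHA-256 are illustrative assumptions.

```python
import hashlib


def leading_zeros64(h):
    """Number of leading zero bits in a 64-bit integer (64 if h == 0)."""
    return 64 - h.bit_length()


def estimate_distinct(items):
    """Crude single-register estimate of how many distinct items were seen.

    Hypothetical helper: hashes each item to 64 bits and tracks the longest
    run of leading zeros; duplicates hash identically, so they never change it.
    """
    longest = 0
    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        longest = max(longest, leading_zeros64(h))
    return 2 ** longest  # always a power of two, hence very noisy
```

Only the maximum run length is stored, so memory use is constant no matter how many items stream past. The price is high variance, which is why real HyperLogLog averages many such registers.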
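The Bloom filter note ("only false positive") follows from how membership is tested. Each added item sets k bit positions, and a lookup answers "present" only if all k bits are set. A lookup can therefore be fooled by other items' bits, but it can never miss an item that was added. Below is a minimal sketch. The class name, the default sizes and the salted SHA-256 hashing are assumptions for illustration, not the speaker's implementation. The sample strings only stand in for sequencing data.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hashed bit positions per item in an m-slot array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for kmer in ["ACGT", "CGTA", "GTAC"]:  # made-up sample strings
    bf.add(kmer)
print("ACGT" in bf)  # True: added items are always found
```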

File func-py.txt

+- Compute attributes in __init__ with "pure" helper methods:
+        def __init__(self):
+            self.attr = self._make_attr()  # _make_attr is "pure"
+- Freeze data to make it immutable; it can be thawed later
+- Example: toggle a _frozen flag between True and False; while it is True,
+  __setitem__ raises
+    - Useful for finding where legacy code mutates things
+- Coroutines: a push-style example using yield and send()
+    - Harder to debug
+- Some problems are not a good fit for functional programming
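The freeze/thaw idea can be sketched as a `dict` subclass whose `__setitem__` checks a `_frozen` flag. Any code that writes to the dict while it is frozen then fails loudly at the exact line doing the mutation. The class name, the `FrozenError` exception and the `freeze()`/`thaw()` methods are hypothetical, not taken from the talk.

```python
class FrozenError(Exception):
    """Raised when code tries to mutate a frozen container."""


class FreezableDict(dict):
    """dict that rejects item assignment and deletion while _frozen is True."""

    def __init__(self, *args, **kwargs):
        self._frozen = False  # set first so the flag always exists
        super().__init__(*args, **kwargs)

    def freeze(self):
        self._frozen = True
        return self

    def thaw(self):
        self._frozen = False
        return self

    def __setitem__(self, key, value):
        if self._frozen:
            raise FrozenError(f"attempt to set {key!r} on a frozen dict")
        super().__setitem__(key, value)

    def __delitem__(self, key):
        if self._frozen:
            raise FrozenError(f"attempt to delete {key!r} from a frozen dict")
        super().__delitem__(key)
```

Only `d[key] = ...` and `del d[key]` are guarded here. Methods such as `update()`, `pop()`, `setdefault()` and `clear()` bypass `__setitem__` in CPython's `dict` and would need the same check.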
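A push-style coroutine pipeline works in the opposite direction to a normal generator loop. The producer calls `send()`, and each stage receives the value at its `yield` and pushes results onward. Below is a minimal sketch using generator `send()`. The `coroutine` priming decorator and the `grep`/`collect` stages are hypothetical names, and this is a common pattern rather than the speaker's code.

```python
import functools


def coroutine(func):
    """Prime a generator-based coroutine by advancing it to its first yield."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return start


@coroutine
def grep(pattern, target):
    """Receive lines via send() and push matching ones to the next stage."""
    while True:
        line = yield
        if pattern in line:
            target.send(line)


@coroutine
def collect(sink):
    """Final stage: append everything received to a list."""
    while True:
        sink.append((yield))


found = []
pipeline = grep("python", collect(found))
for line in ["I like python", "I like java", "python rocks"]:
    pipeline.send(line)
print(found)  # ['I like python', 'python rocks']
```

Control bounces between stages inside `send()` calls, so a traceback shows a chain of resumed generators rather than a plain call stack. This is one reason such code is harder to debug.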

File visualizing-github-1.txt

+* Acquire, parse, filter, mine | represent, refine, interact
+* Ben Fry's book on information representation (Fry co-created Processing)
+* Acquiring the data is usually the hardest step
+* Get a meaningful subset of the data (since we're rate limited)
+* IPython and later tmux
+* EC2 + MongoDB
+* Celery + Heroku

File visualizing-github-2.txt

+- We produce a *story* (about selecting the visualization)
+- Data as storytelling is new (vs. words/pictures/music ...)
+- Context: Medium and Audience
+- The eye candy trap (beautiful noise is still just noise)
+- Mapping data onto meaningful visuals
+- D3
+- Animation can help explain an element overloaded with information
+- Text can be used, but not too much
+- Stored the data in files and loaded it with D3
+    - Browser caching works for you
+- JSON has types but is bloated for data
+- Ended up using CSV with a JSON schema
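The CSV-plus-schema choice keeps the data compact while recovering the column types that CSV lacks. The talk did this in the browser with D3. Below is a Python analogue for illustration. The schema layout (a JSON list of `name`/`type` entries) is a hypothetical format, not the JSON Schema specification, and the sample rows are made up.

```python
import csv
import io
import json

# Hypothetical schema format: one {"name", "type"} entry per CSV column.
SCHEMA_JSON = """
[
  {"name": "repo", "type": "string"},
  {"name": "stars", "type": "integer"},
  {"name": "activity", "type": "number"}
]
"""

# Made-up sample data for illustration only.
CSV_TEXT = """repo,stars,activity
alpha,120,0.75
beta,8,0.1
"""

CONVERTERS = {"string": str, "integer": int, "number": float}


def load_typed_rows(csv_text, schema_json):
    """Parse compact CSV and coerce each column using the JSON schema."""
    schema = json.loads(schema_json)
    casts = {field["name"]: CONVERTERS[field["type"]] for field in schema}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: casts[name](value) for name, value in row.items()} for row in reader]


rows = load_typed_rows(CSV_TEXT, SCHEMA_JSON)
print(rows[0])  # {'repo': 'alpha', 'stars': 120, 'activity': 0.75}
```

The schema is sent once, while each row costs only its comma-separated values. The JSON equivalent would repeat every key in every row, which is the bloat the note mentions.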