% Scalable I/O Event Handling for GHC
% Bryan O'Sullivan, Serpentine
% Johan Tibell, Google

Demands of the modern networked environment

Networked apps have to cope with a wide range of load

  • Fast, ephemeral clients (traditional HTTP)

  • Slow, long-lived peers (Comet, instant messaging, IMAP)

Surviving the onslaught

Use timeouts and rate limits to add robustness

  • Buggy clients

  • Flaky networks

  • DDoS attacks

Implications for code

  • We need to respond quickly to events, even with many idle connections

  • Cheap timeouts let us deal effectively with flaky/malicious clients

A little history

  • "C10K" dates back to 1999

  • People now deploy services capable of $10^5$ clients per server

  • Next horizon: $10^7$ concurrent connections per server

Programming models: events

  • I/O actions (e.g. sends, receives) return immediately

  • I/O events include "received a message", "finished a send"

  • On event: find, update, save state, maybe perform an action

  • Examples: Twisted (Python), Node.js (JavaScript)

Event-based programming

  • Yay! Can be cheap to implement

  • Yay! May handle high concurrency

  • Boo! Explicit saving of state is a pain

  • Boo! Application code tends to sprawl out of control

  • Boo! Iffy multicore support

Programming models: threads

  • I/O blocks until complete

  • Concurrent I/O handled through threads

  • Much state is managed implicitly

  • Examples: NIO (Java), Eventlet (Python)

Threaded programming

  • Yay! Implicit management of local state

  • Boo! A poor threading system limits concurrency

  • Boo! Managing shared state can be very hard

Modern Haskell

  • Lightweight, powerful, mature concurrency model (< 200 bytes per thread)

  • Blocking I/O, timeout management (see the sketch after this list)

  • Good management of shared state, e.g. STM

  • Large family of useful libraries (Hackage FTW)

  • Native code performance
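
As a concrete (if simplified) illustration of the style these bullets describe, here is a thread-per-connection echo server: one lightweight GHC thread per client, plain blocking socket I/O, and a timeout to drop stalled peers. This is my own sketch, assuming the network package; the port number, buffer size, and echo behaviour are arbitrary choices, not anything from the talk.

    import Control.Concurrent (forkIO)
    import Control.Monad (forever, void)
    import qualified Data.ByteString as B
    import Network.Socket
    import Network.Socket.ByteString (recv, sendAll)
    import System.Timeout (timeout)

    main :: IO ()
    main = withSocketsDo $ do
      sock <- socket AF_INET Stream defaultProtocol
      setSocketOption sock ReuseAddr 1
      bind sock (SockAddrInet 8080 0)          -- arbitrary port, wildcard address
      listen sock 1024
      forever $ do
        (conn, _peer) <- accept sock
        -- One lightweight GHC thread per client: blocking I/O, implicit state.
        void (forkIO (serve conn))

    -- Echo bytes back, dropping clients that stay silent for 30 seconds.
    serve :: Socket -> IO ()
    serve conn = loop
      where
        loop = do
          mbs <- timeout (30 * 1000000) (recv conn 4096)   -- timeout is in microseconds
          case mbs of
            Nothing -> close conn                -- timed out: flaky or hostile peer
            Just bs
              | B.null bs -> close conn          -- peer closed the connection
              | otherwise -> sendAll conn bs >> loop

Every connection, busy or idle, is just a blocked Haskell thread; the I/O manager described in the rest of these slides is what makes having very large numbers of them practical.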

So where is (was?) the catch?

  • GHC's old I/O manager had some scaling limits

  • Biggest limit: select system call limited us to ~1000 file descriptors

  • Open files/sockets managed via a list

  • Timeouts managed as a sorted list

Our work

  • We rewrote GHC's I/O manager

  • Better OS integration

  • More appropriate data structures

What does this mean for you?

  • We left public APIs untouched

  • You get better scalability for free!

  • Ships in GHC 7

(Oh, and a new event API for masochists)

Better OS integration

  • Use epoll and kqueue instead of select

  • Remove array size limit imposed by select

  • Drop cost from O($n^2$) to O($n$) (rough arithmetic after this list)
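
A back-of-the-envelope reading of that last bullet (my gloss, not the slides'): select is stateless, so every call passes in and scans the full descriptor set, whereas epoll and kqueue register interest once and each wakeup reports only the descriptors that are actually ready. Handling one event on each of $n$ connections therefore costs roughly

$$n \times O(n) = O(n^2) \text{ with select} \qquad \text{vs.} \qquad n \times O(1) = O(n) \text{ with epoll/kqueue}$$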

Connection management

  • Big-endian Patricia tree to manage file descriptors (Okasaki, Gill, Leijen); see the sketch after this list

  • Drop cost from O($n^2$) to O($n \log n$)
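
Data.IntMap from the containers package is an implementation of this big-endian Patricia tree (Okasaki and Gill's design, with Leijen's implementation). Here is a small sketch of a descriptor table kept in one; the IOCallback type and helper names are invented for illustration and are not GHC's internals:

    import qualified Data.IntMap as IM
    import System.Posix.Types (Fd (..))

    -- Invented callback type: the action to run when a descriptor becomes ready.
    type IOCallback = Fd -> IO ()

    -- The descriptor table: file descriptor number -> callback.
    type FdTable = IM.IntMap IOCallback

    insertFd :: Fd -> IOCallback -> FdTable -> FdTable
    insertFd (Fd fd) cb = IM.insert (fromIntegral fd) cb

    removeFd :: Fd -> FdTable -> FdTable
    removeFd (Fd fd) = IM.delete (fromIntegral fd)

    -- Run the callback for a descriptor the kernel reported as ready.
    fireEvent :: FdTable -> Fd -> IO ()
    fireEvent table fd@(Fd raw) =
      case IM.lookup (fromIntegral raw) table of
        Nothing -> return ()      -- already removed; nothing to do
        Just cb -> cb fd

Insert, delete, and lookup are all O(min($n$, $W$)) for word size $W$, which is what replaces the linear list scans of the old manager.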

Timeout management

  • Use a priority search queue to manage timeouts (Hinze); see the sketch after this list

  • Why a PSQ?

  • Priority: get/remove the next timeout

  • Search: remove arbitrary timeout

  • Drop cost from O($n^2$) to O($n \log n$)
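
The rewritten manager carries its own priority search queue, but Hinze's structure is also available on Hackage as the PSQueue package with a similar interface, which is enough for a sketch. The key and time types below are invented for illustration:

    import Data.PSQueue (Binding (..), PSQ)
    import qualified Data.PSQueue as PSQ

    -- Invented types: one unique key per registered timeout, plus the
    -- absolute time (in seconds) at which it should fire.
    type TimeoutKey   = Int
    type ExpiryTime   = Double
    type TimeoutQueue = PSQ TimeoutKey ExpiryTime

    -- "Search": any timeout can be registered or cancelled by key.
    addTimeout :: TimeoutKey -> ExpiryTime -> TimeoutQueue -> TimeoutQueue
    addTimeout = PSQ.insert

    cancelTimeout :: TimeoutKey -> TimeoutQueue -> TimeoutQueue
    cancelTimeout = PSQ.delete

    -- "Priority": peel off every timeout whose expiry time has passed.
    expiredKeys :: ExpiryTime -> TimeoutQueue -> ([TimeoutKey], TimeoutQueue)
    expiredKeys now q =
      case PSQ.minView q of
        Just (k :-> t, q') | t <= now ->
          let (ks, rest) = expiredKeys now q' in (k : ks, rest)
        _ -> ([], q)

Each queue operation is O($\log n$), so handling $n$ timeouts stays at O($n \log n$) rather than the O($n^2$) of a sorted list.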

Performance: event handling

  • Long poll benchmark: 64 active clients, $n$ idle

  • At low concurrency, GHC 7 almost identical to GHC 6

  • GHC 6: crash (select limit) at $n = 1,000$

  • GHC 7: slowdown starts around $n = 25,000$

  • GHC 7: my laptop can still keep up at $n = 300,000$

Performance: timeout handling

  • How many timeouts can we process in 10 seconds?

  • GHC 6: 15,000

  • GHC 7: 1.5 million

  • In 20 seconds?

  • GHC 6: 17,000

  • GHC 7: 2.7 million

Future work

  • Got a 10gig network? We need one to stress the code enough to tune it further!

  • Per-CPU event managers

  • May be room for constant-factor improvements in low-level OS code

  • Better performance analysis tools

  • Windows support (sorry, we're Unix-only right now)

Thanks!

  • Especially to Simon Marlow, for fixing, and finding workarounds for, some exciting RTS bugs

  • And to John MacFarlane, for his pandoc tool

I can't recommend pandoc enough:

    cabal install -fhighlighting pandoc

I wrote these slides using Emacs, pandoc, and make:

    slides.html: slides.md
        pandoc --offline -o $@ -s -t slidy $<