Mario Blažević committed 0ebc933

Removed the leftover unused files.

Files changed (3)

Data/Attoparsec/Char8.hs

--- |
--- Module      :  Data.Attoparsec.Char8
--- Copyright   :  Bryan O'Sullivan 2007-2011
--- License     :  BSD3
---
--- Maintainer  :  bos@serpentine.com
--- Stability   :  experimental
--- Portability :  unknown
---
--- Simple, efficient, character-oriented combinator parsing for
--- 'ByteString' strings, loosely based on the Parsec library.
-
-module Data.Attoparsec.Char8
-    (
-      module Data.Attoparsec.ByteString.Char8
-    ) where
-
-import Data.Attoparsec.ByteString.Char8

doc/attoparsec-rewired-2.md

-In my
-[first of this pair of articles](http://www.serpentine.com/blog/2010/03/03/whats-in-a-parsing-library-1/),
-I laid out some of the qualities I've been looking for in a parsing
-library.
-
-Before I dive back into detail, I want to show off some numbers.  The
-new Attoparsec code is _fast_.
-
-![Performance](http://chart.apis.google.com/chart?cht=bvs&chs=340x200&chd=t:260,0,0,0,0|0,471,0,0,0|0,0,17037,0,0|0,0,0,23753,0|0,0,0,0,36753&chds=0,40000&chco=4D89F9|894DF9|F94D89|4DB989|1969FD&chdl=260+ms:+http_parser|471+ms:+Attoparsec|17037+ms:+Parsec+3+CPS|23753+ms:+Lazy+Parsec+3+CPS|36753+ms:+Parsec+3&chxt=y&chxl=0:||10|20|30|40&chtt=Time+to+parse+45,668|HTTP+GET+requests)
-
-What did I benchmark?  I captured some real HTTP GET requests from a
-live public web server, averaging 431 bytes per request.  I chucked
-them into a file, and measured the time needed to parse the entire
-contents of the file with the following libraries:
-
-* Ryan Dahl's [http-parser](http://github.com/ry/http-parser) library,
-  which is 1,672 lines of hand-rolled C craziness.  It appears to be
-  closely based on the Ragel-generated parser used by Mongrel.  This
-  library is a fair approximation of the fastest you can get, since
-  it's been tuned for just one purpose.  I wrote a small,
-  but reasonably realistic,
-  [driver program](http://bitbucket.org/bos/attoparsec/src/tip/examples/rfc2616.c)
-  to wire it up to file-based data, adding another 210 lines of code.
-  
-* An Attoparsec-based
-  [HTTP request parser](http://bitbucket.org/bos/attoparsec/src/tip/examples/RFC2616.hs),
-  54 lines long, with about 30 lines of
-  [driver program](http://bitbucket.org/bos/attoparsec/src/tip/examples/TestRFC2616.hs).
-  (Attoparsec itself is about 900 lines of code.)
-  
-* Several
-  [Parsec-3-based parsers](http://bitbucket.org/bos/attoparsec/src/tip/examples/Parsec_RFC2616.hs),
-  which are almost identical in length to the Attoparsec-based
-  version.
-  
-The Parsec 3 parsers come in three varieties:
-
-* The fastest uses a patch that Antoine Latter wrote to switch Parsec
-  3's internal machinery over to using continuation passing style
-  (CPS).  This parser uses `ByteString` for input, and reads the
-  entire 18.8MB file in one chunk.
-  
-* Next is the same parser, using lazy `ByteString` I/O to read the
-  file in 64KB chunks.  This costs about 50% in performance, but is
-  almost mandatory to maintain a sensible footprint on large inputs.
-  
-* In last place is the official version of Parsec 3, reading the input
-  in one chunk.  (Reading lazily still costs an additional 50%, but I
-  didn't want to further clutter the chart with more big numbers.)
-
-What's interesting to me is that the tiny Attoparsec-based parser,
-which is more or less a transliteration of the relevant parts of
-[RFC 2616](http://www.w3.org/Protocols/rfc2616/rfc2616.html), is so
-fast.
-
-I went back and remeasured performance of the Attoparsec and C parsers
-on a larger data set (295,568 URLs), and got these numbers:
-
-* Attoparsec: 2.889 seconds, or 102,308 requests/second
-
-* C: 1.614 seconds, or 183,128 requests/second
-
-That clocks the Attoparsec-based parser at about 56% the speed of the
-C parser.  Not bad, given that it's about 3.2% the number of lines of
-code!
-
-Of course there are tradeoffs involved here.
-
-* Parsec 3 emits much more friendly error messages, and can handle
-  many different input types.  Attoparsec, being aimed at
-  plumbing-oriented network protocols, considers friendly error
-  messages not to be worth the effort, and is specialised to the
-  arrays of bytes you get straight off the network.
-  
-* Parsec 3 requires all of its input to be available when the parser
-  is run (either in one large chunk or via lazy I/O).  If Attoparsec
-  has insufficient data to return a complete result, it hands back a
-  continuation that you provide extra data to.  This eliminates the
-  need for lazy I/O and any additional buffering, and makes for a
-  beautiful, pure API that doesn't care what its input source is.
-
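The continuation-style interface described above can be sketched against the modern attoparsec API (the `Data.Attoparsec.ByteString.Char8` names below are today's; the 2010-era module layout differed slightly):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B

-- A toy parser for the method token of an HTTP request line.
method :: Parser B.ByteString
method = takeWhile1 (/= ' ') <* char ' '

main :: IO ()
main =
  -- Run against an incomplete chunk: the parser cannot decide yet,
  -- so it hands back a Partial continuation instead of failing.
  case parse method "GE" of
    Partial k ->
      -- Resume by applying the continuation to the next chunk,
      -- exactly as a network driver would after its next read.
      case k "T /index.html" of
        Done rest m -> B.putStrLn m >> B.putStrLn rest
        r           -> print r
    r -> print r
```

No lazy I/O is involved: the caller decides when and how to obtain each chunk, and simply applies the continuation to it.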
-The memory footprint of the Attoparsec-based parser is small: it will
-run in 568KB of heap on my 64-bit laptop.  The smallest heap size that
-the Parsec 3 parser can survive in isn't all that much larger: with
-lazily read input, it will run in a 750KB heap.
-
-Overall, this is yet another instance where a little careful attention
-to performance yields very exciting results.  Personally, I'd be quite
-happy to trade a 97% reduction in code size for such a small
-performance hit, especially given the clarity, ease of use, and
-flexibility of the resulting code.  (The `http_parser` API is frankly
-not so much fun to use, even though I completely understand the
-motivation behind it.)

doc/whats-in-a-parser-1.md

-My goal in working on the
-[new GHC I/O manager](http://www.serpentine.com/blog/2010/01/22/new-ghc-io-manager-first-benchmark-numbers/)
-has been to get the Haskell network stack into a state where it could
-be used to attack high-performance and scalable networking problems,
-domains in which it has historically been weak.
-
-While it's encouraging to have an excellent networking stack (Johan
-and I now have this thoroughly in hand), the next thing I'd look for
-is libraries to help build networked applications.  One of the
-fundamental things that such apps need to do well is parse data, be it
-received from the network or read from files.
-
-The Haskell parsing library of first resort has for years been
-[Parsec](http://www.haskell.org/haskellwiki/Parsec).  While other
-capable libraries exist
-(e.g. [polyparse](http://hackage.haskell.org/package/polyparse) and
-[uu-parsinglib](http://hackage.haskell.org/package/uu-parsinglib)),
-they don't appear to see much use.
-
-As appealing as Parsec's API is, it has a few problems:
-
-* Parsec 2 is slow, and it has high memory overhead, due to its use of
-  Haskell's `String` type for tokens.  Parsec 3 can use the more
-  efficient `ByteString` type (which is in any case much more
-  appropriate for networked applications that deal in octets), but it
-  achieves this flexibility at the cost of being even slower than
-  Parsec 2.
-
-* Parsec's API demands that all of a parser's input be available at
-  once.  People usually work around this by feeding a Parsec parser
-  with lazily read data, but lazy I/O is at odds with my goal of
-  writing solid networked code.
-
-What properties should a parsing library for networked applications
-ideally possess?  There are a few obvious desiderata that have been
-well known for years. For example, it's important to have an appealing
-API and programming model.  Parsec squarely meets this need.
-  
-Performance is also a big consideration.  Ideally, a parsing library
-would be fast enough that you wouldn't feel any real need for either
-of the following:
-
-* Spending a few weeks writing an insane hand-bummed parser.
-  
-* Mechanical parser generators or lexers
-  (e.g. [happy](http://www.haskell.org/happy/) or
-  [alex](http://www.haskell.org/alex/)).
-  
-There are some additional important constraints on a realistic
-library: it must fit well into a highly concurrent networked world
-full of unreliable, hostile and incompetent clients.
-
-* High concurrency levels demand a low per-connection memory
-  footprint.
-
-* The need to cope with poorly behaved clients requires that
-  applications be able to throttle connections that are too busy,
-  or kill connections that are too slow or attempting to consume too
-  many server resources.  A good parsing library will not get in the
-  way of these needs.
-
-A few years ago, I made a few half-hearted attempts to write a
-specialised version of Parsec, which I eventually named
-[Attoparsec](http://hackage.haskell.org/package/attoparsec).
-
-I began with a stripped-down Parsec that was specialised to accept
-`ByteString` input.  I then extended the API to allow a parser to
-consume small chunks of input at a time.
-
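That chunk-at-a-time extension can be illustrated with a small driver loop over today's attoparsec API (a sketch only; `runChunks` is a name invented here, not part of the library):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B

-- Drive a parser across a list of chunks, as a network read loop
-- would, then signal end of input by feeding an empty chunk.
runChunks :: Parser a -> [B.ByteString] -> Result a
runChunks p []     = runChunks p [B.empty]
runChunks p (c:cs) = go (parse p c) cs
  where
    go (Partial k) (x:xs) = go (k x) xs
    go (Partial k) []     = k B.empty  -- empty chunk = no more input
    go r           _      = r          -- Done or Fail: stop early

main :: IO ()
main = print (runChunks (many1 digit) ["12", "34", "5"])
```

Here `many1 digit` succeeds only after all three chunks arrive, yielding the digits `"12345"` with no leftover input.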
-Because I wasn't using Attoparsec "in anger" at the time, I made sure
-that my library worked (more or less), but I was not measuring its
-performance.
-
-In late January of this year, I began to think about using Attoparsec
-as the parser for a simple HTTP server that I could use to benchmark
-our new GHC I/O manager code.  Clearly, I'd want the parser to perform
-well, or it would distort my numbers rather badly.
-
-By coincidence, [John MacFarlane](http://johnmacfarlane.net/) emailed
-me around the same time, with disturbing findings: he'd tried
-Attoparsec, and found its performance to be *terrible*!  In fact, it
-was 4 to 20 times _slower_ than plain Parsec with his experimental
-parser and test data.  Clearly, I had some hard work to look forward
-to.
-
-Happily, that work is now almost complete, and I am pleased with the
-results.  In the next post, I'll go into the details of what this
-work entailed.