Source

text / cbits / cbits.c

Author Commit Message Labels Comments Date
Ben Gamari
Add support for incremental decoding
Decoding multi-byte encodings such as UTF-8 poses difficulty for streaming I/O, as one must take care to carry the decoder state between incoming chunks. Here we introduce `decodeUtf8With'`, which exposes an interface similar to that provided by cassava's `Data.Csv.Incremental`. To do this, we adapt the C UTF-8 decoder to expose its automaton state and codepoint accumulator.
Herbert Valerio Riedel
Optimize latin1-to-UTF16 C-implementation by using 32-bit loads
Herbert Valerio Riedel
Add new `Data.Text.Encoding.decodeLatin1` ISO-8859-1 decoding function
This has about an order of magnitude lower runtime and/or call-overhead as compared to the more generic `text-icu` approach, e.g. according to criterion with GHC 7.4.1 on Linux/x86_64:
* 12 times faster for empty input strings,
* 6 times faster for 16-byte strings, and
* 3 times faster for 1024-byte strings.
`decodeLatin1` is also faster compared to using `decodeUtf8` for plain ASCII:
* 2 t…
Bryan O'Sullivan
Drop trailing whitespace
Bryan O'Sullivan
Merge the performance- and correctness-affecting commits away
Bryan O'Sullivan
A valiant attempt at improving UTF-8 encoding performance. This didn't actually work - it slowed down aeson encoding by almost 2x!
Bryan O'Sullivan
Portable native UTF-8 decoder gives 3.7x faster decoding
This code is derived from Björn Höhrmann's UTF-8 decoder. Compared to the original Haskell decoder from cac7dbcbc392, it's between 2.17 and 3.68 times faster. It's even between 1.18 and 3.58 times faster than the improved Haskell decoder from 71ead801296a. The x86-specific decoding path gives a substantial win for entirely and partly ASCII text, e.g. HTML and XML, at the cost of being about 17%…
Bryan O'Sullivan
Switch to native code for copying and comparison.
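
The incremental-decoding commit above describes carrying the decoder's automaton state and codepoint accumulator between incoming chunks. The idea can be sketched roughly as follows. This is a simplified illustration only, not the table-driven decoder in cbits.c: the names `utf8_state` and `utf8_feed` are hypothetical, and the sketch does not reject overlong encodings, surrogates, or code points above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state record: everything the decoder must remember
   when a chunk ends in the middle of a multi-byte sequence. */
typedef struct {
    uint32_t codepoint; /* bits accumulated so far for the current character */
    uint32_t pending;   /* continuation bytes still expected; 0 = between characters */
} utf8_state;

/* Decode len bytes from src, writing complete code points to dest
   (dest must have room for at least len entries).
   Returns the number of code points written, or (size_t)-1 on a
   malformed byte. The state in *st survives across calls, so a
   sequence split over two chunks is completed by the next call. */
size_t utf8_feed(utf8_state *st, const uint8_t *src, size_t len, uint32_t *dest)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = src[i];
        if (st->pending == 0) {
            /* Lead byte: determines how many continuation bytes follow. */
            if (b < 0x80) {
                dest[n++] = b;
            } else if ((b & 0xE0) == 0xC0) {
                st->codepoint = b & 0x1F; st->pending = 1;
            } else if ((b & 0xF0) == 0xE0) {
                st->codepoint = b & 0x0F; st->pending = 2;
            } else if ((b & 0xF8) == 0xF0) {
                st->codepoint = b & 0x07; st->pending = 3;
            } else {
                return (size_t)-1; /* stray continuation or invalid lead byte */
            }
        } else {
            /* Continuation byte: must look like 10xxxxxx. */
            if ((b & 0xC0) != 0x80)
                return (size_t)-1;
            st->codepoint = (st->codepoint << 6) | (b & 0x3F);
            if (--st->pending == 0)
                dest[n++] = st->codepoint;
        }
    }
    return n;
}
```

A caller would zero-initialize one `utf8_state`, pass it to every `utf8_feed` call for the same stream, and treat a non-zero `pending` after the final chunk as truncated input.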
Tags
0.11.1.2