text / cbits / cbits.c

Author / Commit message
Bryan O'Sullivan
Choose the no-INLINE code
Bryan O'Sullivan
Switch to a C lookup table for case mapping
    This should be considerably faster than the older Haskell-based code, and far less prone to inliner space explosions.
Jon Purdy
Remove redundant 'const' qualifiers.
Bryan O'Sullivan
encodeUtf8: squeeze out a little more fast path performance
    If we're about to drop from the fast to the slow path, try to cut our losses by pushing out a few more bytes before we give up.
Bryan O'Sullivan
encodeUtf8_2: drop parallel range checks
    Once I noticed that I'd screwed up the range checking and fixed it, it became slow enough to not be worth it. All test cases are about 10% faster with this extra complexity removed, with the exception of pure Russian, which is about 50% slower.
Bryan O'Sullivan
encodeUtf8_2: fix parallel range check
    This makes it rather expensive, alas.
Bryan O'Sullivan
encodeUtf8_2: add fast paths for x86_64 and i386
    This helps performance a lot in most cases: up to 2x faster, in fact. The exception seems to be Japanese, which is slowed down by about 10%.
Bryan O'Sullivan
encodeUtf8_2: fix an off-by-one-bit error (!)
Bryan O'Sullivan
encodeUtf8_2: a C-based encoding function
    Not surprisingly, this is a lot faster than encodeUtf8_1 and the Builder-based rewrite under almost all circumstances. It's slower on tiny inputs (20%), but roughly twice as fast as encodeUtf8_1 on longer inputs.
Bryan O'Sullivan
Merge fix for gh-61 into 1.0 branch
Bryan O'Sullivan
Improve on previous fix
    This version tries to force the real decoding function to be inlined into each of its callers, which in turn each have different criteria for backing up a byte. This avoids an extra test at the end of strict decoding. While this seems to fix gh-61, I want to beef up the test suite so that it will correctly detect the bug.
Bryan O'Sullivan
A minimal fix for the regression introduced in the previous commit
    The refactoring in that commit was performed incorrectly, such that it would no longer detect as invalid an incomplete series of continuation bytes at the end of a string.
Bryan O'Sullivan
Present undecoded bytestring when streaming
Bryan O'Sullivan
Drop a magic number
Ben Gamari
cbits: Mark const pointers as such
    const needs to come after the * to mark pointer itself as constant.
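The distinction this commit message draws can be shown with a small sketch. The function `count_ascii` below is hypothetical and is not taken from cbits.c; it only illustrates where `const` goes for read-only data versus a pointer that cannot be reassigned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not from cbits.c).
 * `const` before the `*`: the bytes are read-only, but `p` itself may move. */
size_t count_ascii(const uint8_t *p, size_t len)
{
    /* `const` after the `*`: `end` itself can never be reassigned. */
    const uint8_t *const end = p + len;
    size_t n = 0;

    while (p < end) {
        if (*p < 0x80)
            n++;
        p++;            /* allowed: only the pointed-to data is const */
    }
    /* `end++` here would be a compile error, because the pointer is const. */
    return n;
}
```

Calling `count_ascii` on the four bytes `'a', 0xC3, 0xA9, 'b'` counts the two bytes below 0x80.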
Ben Gamari
Add support for incremental decoding
    Decoding multi-byte encodings such as UTF-8 poses difficulty for streaming I/O, as one must take care to carry the decoder state between incoming chunks. Here we introduce `decodeUtf8With'`, which exposes an interface similar to that provided by cassava's `Data.Csv.Incremental`. To do this, we adapt the C UTF-8 decoder to expose its automaton state and codepoint accumulator.
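As a rough illustration of carrying decoder state between chunks, here is a minimal sketch. It is an assumption-laden stand-in, not the library's decoder: the names `utf8_state` and `utf8_feed` are hypothetical, and a simple hand-written state machine replaces the table-driven automaton the commit refers to. It does not reject overlong encodings or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state carried across chunks (names are illustrative). */
typedef struct {
    uint32_t codepoint;  /* bits accumulated so far */
    unsigned remaining;  /* continuation bytes still expected */
} utf8_state;

/* Feed one chunk. Completed code points are written to `out`, which the
 * caller must size to at least `len` entries. Returns the number written,
 * or (size_t)-1 on a malformed byte. */
size_t utf8_feed(utf8_state *st, const uint8_t *src, size_t len, uint32_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        const uint8_t b = src[i];

        if (st->remaining > 0) {
            if ((b & 0xC0) != 0x80)
                return (size_t)-1;          /* expected a continuation byte */
            st->codepoint = (st->codepoint << 6) | (b & 0x3F);
            if (--st->remaining == 0)
                out[n++] = st->codepoint;
        } else if (b < 0x80) {
            out[n++] = b;                   /* ASCII */
        } else if ((b & 0xE0) == 0xC0) {
            st->codepoint = b & 0x1F; st->remaining = 1;
        } else if ((b & 0xF0) == 0xE0) {
            st->codepoint = b & 0x0F; st->remaining = 2;
        } else if ((b & 0xF8) == 0xF0) {
            st->codepoint = b & 0x07; st->remaining = 3;
        } else {
            return (size_t)-1;              /* invalid lead byte */
        }
    }
    return n;
}
```

A caller would zero-initialise one `utf8_state` and pass it to every call. If `remaining` is still nonzero after the final chunk, the input ended in the middle of a multi-byte sequence.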
Herbert Valerio Riedel
Optimize latin1-to-UTF16 C-implementation by using 32-bit loads
Herbert Valerio Riedel
Add new `Data.Text.Encoding.decodeLatin1` ISO-8859-1 decoding function
    This has about an order of magnitude lower runtime and/or call-overhead as compared to the more generic `text-icu` approach, e.g. according to criterion with GHC 7.4.1 on Linux/x86_64:
    * 12 times faster for empty input strings,
    * 6 times faster for 16-byte strings, and
    * 3 times faster for 1024-byte strings.
    `decodeLatin1` is also faster compared to using `decodeUtf8` for plain ASCII:
    * 2 t…
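For context on why Latin-1 decoding can be cheap: every ISO-8859-1 byte has the same numeric value as its Unicode code point, so conversion to UTF-16 is a plain zero-extension with no validation. The sketch below assumes a hypothetical function name, `latin1_to_utf16`, and omits the word-sized loads that the later optimisation commit mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not the code from cbits.c): widen each Latin-1 byte
 * to one UTF-16 code unit. `dest` must have room for `len` code units. */
void latin1_to_utf16(uint16_t *dest, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dest[i] = src[i];   /* zero-extend: byte 0xE9 becomes U+00E9 */
}
```

Because no byte value is invalid in Latin-1, the loop needs no error path, unlike a UTF-8 decoder.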
Bryan O'Sullivan
Drop trailing whitespace
Bryan O'Sullivan
Merge the performance- and correctness-affecting commits away
Bryan O'Sullivan
A valiant attempt at improving UTF-8 encoding performance.
    This didn't actually work - it slowed down aeson encoding by almost 2x!
Bryan O'Sullivan
Portable native UTF-8 decoder gives 3.7x faster decoding
    This code is derived from Björn Höhrmann's UTF-8 decoder. Compared to the original Haskell decoder from cac7dbcbc392, it's between 2.17 and 3.68 times faster. It's even between 1.18 and 3.58 times faster than the improved Haskell decoder from 71ead801296a.
    The x86-specific decoding path gives a substantial win for entirely and partly ASCII text, e.g. HTML and XML, at the cost of being about 17%…
Bryan O'Sullivan
Switch to native code for copying and comparison.
Tags
0.11.1.2