text/Data/Text/Encoding.hs

Author / Commit message
Bryan O'Sullivan
Add a Binary instance for strict Text. Step one towards a fix for gh-115.
Tom Ellis
Don't leak space when passing the undecoded portion to the next iteration.
Bryan O'Sullivan
Merge
Bryan O'Sullivan
Reset the lazy decoder state consistently if an error occurs. This fixes gh-87.
Bryan O'Sullivan
Merge pull request #71 from quchen/master. Remove incorrect deprecation doc message.
David Luposchainsky
Remove incorrect deprecation doc message. The pragma-based deprecation of 'decodeASCII' correctly claims that 'decodeUtf8' should be used (matching the implementation), while the documentation string of the function itself recommended 'decodeLatin1' (which does not match).
Bryan O'Sullivan
Rename textP to text. Replace the old text smart constructor with the slightly smarter one we've had all along, which ensures that it doesn't pin its array if it's empty.
lpsmith
Better support for GHC < 7.8
Bryan O'Sullivan
Refactor Read modules to share code
Bryan O'Sullivan
Fix build problems with GHC 7.8.1
Bryan O'Sullivan
Correct the documentation for streaming decoding
Bryan O'Sullivan
streamDecodeUtf8With: accumulate undecoded chunks correctly. We had previously gotten the accounting and reporting wrong if an incomplete input was fed in over the course of several continuations, such that we'd report only the incomplete input seen by the most recent continuation. This fixes gh-70.
Bryan O'Sullivan
Tidy up imports
Bryan O'Sullivan
Drop a redundant import
Bryan O'Sullivan
Drop the old pure-Haskell implementation of encodeUtf8
Bryan O'Sullivan
Drop the Builder-based encodeUtf8 implementation. While it is very cool indeed, it is slower than the new C code under all circumstances, sometimes by a factor of two or more.
Bryan O'Sullivan
encodeUtf8_1: so long, it's been nice knowing you! Since encodeUtf8_2 wins under all circumstances, there's no reason to keep the intermediate version around.
Bryan O'Sullivan
encodeUtf8_2: cap the number of wasted bytes at 2x. This has the odd side effect of improving tiny-string performance from 20% slower than encodeUtf8_1 to about 5% faster. Never stop being weird, GHC optimizer!
Bryan O'Sullivan
encodeUtf8_2: a C-based encoding function. Not surprisingly, this is a lot faster than encodeUtf8_1 and the Builder-based rewrite under almost all circumstances. It's slower on tiny inputs (20%), but roughly twice as fast as encodeUtf8_1 on longer inputs.
Simon Meier
Improve small-string performance for UTF-8 encoding to bytestrings. On a 5-byte string, the conversion of strict text to a strict bytestring is still 2x slower than the custom 'encodeUtf8_1' routine. However, this is much better than the 4.5x slowdown that we started with. I attribute the slowdown to the more expensive startup cost of the bytestring-builder-based solution. Note that this startup cost is shared in case a small string is encoded as part of a…
Bryan O'Sullivan
encodeUtf8_1: get my arithmetic right :-(
Bryan O'Sullivan
Export both encodeUtf8 variants
Bryan O'Sullivan
Drop now-redundant imports
Bryan O'Sullivan
encodeUtf8_1: drop an unnecessary type signature. The value that was having too general a type inferred is now a pointer, so inference doesn't accidentally overgeneralize.
Bryan O'Sullivan
encodeUtf8_1: drop a loop induction variable. This helps performance quite a bit! Now encoding Japanese text is 2x faster than encodeUtf8, as opposed to 30% faster before. Not bad!
Bryan O'Sullivan
Drop unused import
Bryan O'Sullivan
encodeUtf8_1: make available with both bytestring versions
Bryan O'Sullivan
encodeUtf8_1: a little cosmetic work
Bryan O'Sullivan
encodeUtf8_1: refactor the last loop body. This requires a bit more torturing to maintain performance. For some unknown reason, doing the same refactoring on go4 decreases performance on russian-small.txt by half!
Bryan O'Sullivan
encodeUtf8_1: refactor another loop body