Commits

Author Commit Message Labels Comments Date
Bryan O'Sullivan
Tidy up imports
Bryan O'Sullivan
Drop a redundant import
Bryan O'Sullivan
Drop the old pure-Haskell implementation of encodeUtf8
Bryan O'Sullivan
Drop the Builder-based encodeUtf8 implementation
While it is very cool indeed, it is slower than the new C code under all circumstances, sometimes by a factor of two or more.
Bryan O'Sullivan
Let's see if this tweak helps the automated tests
Bryan O'Sullivan
encodeUtf8_2: drop parallel range checks
Once I noticed that I'd screwed up the range checking and fixed it, it became slow enough to not be worth it. All test cases are about 10% faster with this extra complexity removed, with the exception of pure Russian, which is about 50% slower.
Bryan O'Sullivan
encodeUtf8_2: fix parallel range check
This makes it rather expensive, alas.
Bryan O'Sullivan
Improve Arbitrary instances
Bryan O'Sullivan
I am enjoying these changelog edits
Bryan O'Sullivan
encodeUtf8_2: add fast paths for x86_64 and i386
This helps performance a lot in most cases: up to 2x faster, in fact. The exception seems to be Japanese, which is slowed down by about 10%.
Bryan O'Sullivan
Add a multibyte HTML document benchmark
Bryan O'Sullivan
Revise changelog perf note (yay!)
Bryan O'Sullivan
encodeUtf8_1: so long, it's been nice knowing you!
Since encodeUtf8_2 wins under all circumstances, there's no reason to keep the intermediate version around.
Bryan O'Sullivan
encodeUtf8_2: fix an off-by-one-bit error (!)
Bryan O'Sullivan
encodeUtf8_2: cap the number of wasted bytes at 2x
This has the odd side effect of improving tiny-string performance from 20% slower than encodeUtf8_1 to about 5% faster. Never stop being weird, GHC optimizer!
Bryan O'Sullivan
encodeUtf8_2: a C-based encoding function
Not surprisingly, this is a lot faster than encodeUtf8_1 and the Builder-based rewrite under almost all circumstances. It's slower on tiny inputs (20%), but roughly twice as fast as encodeUtf8_1 on longer inputs.
Simon Meier
Improve small string performance for UTF-8 encoding to bytestrings
On a 5 byte string the conversion of strict text to a strict bytestring is still a factor 2x slower than the custom 'encodeUtf8_1' routine. However, this is much better than the factor 4.5x that we started with. I attribute the slowdown to the more expensive startup cost for the bytestring-builder-based solution. Note that this startup cost is shared in case a small string is encoded as part of a…
Bryan O'Sullivan
Begin 1.1 release notes
Bryan O'Sullivan
encodeUtf8_1: get my arithmetic right :-(
Bryan O'Sullivan
Export both encodeUtf8 variants
Bryan O'Sullivan
Drop now-redundant imports
Bryan O'Sullivan
encodeUtf8_1: drop an unnecessary type signature
The value that was having too general a type inferred is now a pointer, so inference doesn't accidentally overgeneralize.
Bryan O'Sullivan
encodeUtf8_1: drop a loop induction variable
This helps performance quite a bit! Now encoding Japanese text is 2x faster than encodeUtf8, as opposed to 30% faster before. Not bad!
Bryan O'Sullivan
Drop unused import
Bryan O'Sullivan
encodeUtf8_1: make available with both bytestring versions
Bryan O'Sullivan
encodeUtf8_1: a little cosmetic work
Bryan O'Sullivan
encodeUtf8_1: refactor the last loop body
This requires a bit more torturing to maintain performance. For some unknown reason, doing the same refactoring on go4 decreases performance on russian-small.txt by half!
Bryan O'Sullivan
encodeUtf8_1: refactor another loop body
Bryan O'Sullivan
encodeUtf8_1: refactor loop body
Bryan O'Sullivan
encodeUtf8_1: massively rework internals
The goal here is to avoid a buffer size check on every iteration, instead only doing one the first time we encounter some input that's larger than the buffer we preallocated. This helps performance rather a lot: we don't regress on the smallest inputs, but we are up to 35% faster than the previous version of encodeUtf8 on larger inputs.