text / Data / Text / Encoding.hs

Author Commit Message Labels Comments Date
Bryan O'Sullivan
encodeUtf8_1: make available with both bytestring versions
Bryan O'Sullivan
encodeUtf8_1: a little cosmetic work
Bryan O'Sullivan
encodeUtf8_1: refactor the last loop body
    This requires a bit more torturing to maintain performance. For some unknown reason, doing the same refactoring on go4 decreases performance on russian-small.txt by half!
Bryan O'Sullivan
encodeUtf8_1: refactor another loop body
Bryan O'Sullivan
encodeUtf8_1: refactor loop body
Bryan O'Sullivan
encodeUtf8_1: massively rework internals
    The goal here is to avoid a buffer size check on every iteration, instead only doing one the first time we encounter some input that's larger than the buffer we preallocated. This helps performance rather a lot: we don't regress on the smallest inputs, but we are up to 35% faster than the previous version of encodeUtf8 on larger inputs.
Bryan O'Sullivan
encodeUtf8_1: hoist ensure up a level
Bryan O'Sullivan
encodeUtf8_1: refactor go to accept a pointer parameter
Bryan O'Sullivan
encodeUtf8_1: hoist poke8 up a level
Bryan O'Sullivan
Duplicate encodeUtf8 as encodeUtf8_1 temporarily
Bryan O'Sullivan
Merge pull request #63 from meiersi/polish-text-bytestring-builder-integration
    Polish UTF-8 bytestring builder support
Simon Meier
Add back 'ensure 1' to avoid overflowing an output buffer
    The counter-example for the existing code is a string of length '2*n' that starts with 'n' characters with codepoints in the range (0x7F, 0x7FF) and ends with 'n' ASCII characters. All 'n' ASCII characters will be written after the end of the output buffer.
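The arithmetic behind this counter-example can be checked with a small model. This is only a sketch using plain `String` values and the `base` library. It assumes the output buffer starts out with one byte per UTF-16 code unit of the input, as the "Drop some special-casing for ASCII" commit in this log describes. The helper names (`utf8Width`, `utf16Units`, `counterExample`, `overflowBytes`) are hypothetical and do not appear in the library.

```haskell
import Data.Char (ord)

-- Number of bytes a code point occupies in UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Number of UTF-16 code units a code point occupies.
utf16Units :: Char -> Int
utf16Units c = if ord c >= 0x10000 then 2 else 1

-- The counter-example input: n two-byte characters (U+00E9 here)
-- followed by n ASCII characters.
counterExample :: Int -> String
counterExample n = replicate n '\x00E9' ++ replicate n 'a'

-- Bytes needed beyond a buffer sized at one byte per UTF-16 code unit
-- (assumed initial allocation; see the lead-in).
overflowBytes :: String -> Int
overflowBytes s = max 0 (sum (map utf8Width s) - sum (map utf16Units s))
```

For an input built this way, the first `n` characters already consume all `2*n` bytes of the assumed buffer, so `overflowBytes (counterExample n)` comes out as `n`: one byte past the end for each trailing ASCII character.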
Simon Meier
Polish UTF-8 bytestring builder support
    - adjust function names to 'encodeUtf8Builder' and 'encodeUtf8BuilderEscaped'
    - expose the same conversion to builders for both lazy and strict text
    - ensure 'Escaped' versions are inlined to allow specialization for specific escaping primitives
    - fix some Haddock references
    - add Haddock comment about bytestring >= 0.10.4.0 dependency
    - remove stream-to-builder encoding functions. There is no d…
Bryan O'Sullivan
Drop some special-casing for ASCII during UTF-8 encoding
    I somehow forgot that we allocate the initial ByteString to contain the same number of bytes as the Text contains code units. This means that we never need to ensure that the ByteString is big enough, nor (with this observation) does a special-cased ASCII-only loop help performance.
Bryan O'Sullivan
Merge the new bytestring builder code
Simon Meier
Merge branch 'master' into feature-new-bytestring-builder
    - newest benchmark results:
        8.2 -> 7.2 ms for EncodeUtf8/Text benchmark
        18.2 -> 10.0 ms for EncodeUtf8/TextLazy benchmark
        ==> 13% and 81% speed improvement :-)
    Conflicts: Data/Text/Encoding.hs text.cabal
Simon Meier
implement 'encodeUtf8Builder' using 'encodeUtf8Escaped'
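As a usage illustration of the builder-based encoder named in these commits, the sketch below feeds several strict `Text` chunks through `encodeUtf8Builder` and runs the combined `Builder` once. It assumes a released `text` version that exports `encodeUtf8Builder` from `Data.Text.Encoding`, together with bytestring >= 0.10.4.0. The function `renderUtf8` is a hypothetical name introduced only for this example.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Encode every chunk to a Builder, concatenate the Builders (Builder is a
-- Monoid), and materialise the result as one lazy ByteString.
renderUtf8 :: [T.Text] -> BL.ByteString
renderUtf8 = B.toLazyByteString . foldMap TE.encodeUtf8Builder
```

Composing builders this way avoids creating an intermediate strict `ByteString` per chunk; the bytes produced should match `encodeUtf8` applied to the concatenated text.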
Simon Meier
implemented 'Text -> Builder' UTF-8 encoders
    It uses a coupled end-of-input-and-output boundary and exploits the UTF-16 representation of the 'Text' value. According to preliminary benchmarks, it is 25% faster than the existing 'encodeUtf8 :: Text -> ByteString' function. We also support an 'encodeUtf8AsciiEscaped' encoder that allows special-casing the encoding of ASCII characters. This is a very useful function for implementing escaping enco…
Simon Meier
implement strict Text to Builder encoder using BoundedEncodings
    A first test of the infrastructure can be found in my 'aeson' branch.
Bryan O'Sullivan
Clarify who the maintainer is
Bryan O'Sullivan
Rename the last set of internal modules
Bryan O'Sullivan
Rename strict fusion-related modules
Bryan O'Sullivan
Rename encoding-related modules, and make them semi-public
Bryan O'Sullivan
Present undecoded bytestring when streaming
Bryan O'Sullivan
Make older Num classes happy
Bryan O'Sullivan
Flesh out the stream decoding API a bit
Bryan O'Sullivan
Fix tiny typos
Bryan O'Sullivan
Make incremental decoding types clearer
Bryan O'Sullivan
Drop a magic number
Ben Gamari
Add support for incremental decoding
    Decoding multi-byte encodings such as UTF-8 poses difficulty for streaming I/O, as one must take care to carry the decoder state between incoming chunks. Here we introduce `decodeUtf8With'`, which exposes an interface similar to that provided by cassava's `Data.Csv.Incremental`. To do this, we adapt the C UTF-8 decoder to expose its automaton state and codepoint accumulator.
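The idea of carrying decoder state between chunks can be sketched with the streaming API that released versions of `text` export from `Data.Text.Encoding`, namely `streamDecodeUtf8` and the `Decoding` type with its `Some` constructor. Whether that API matches the `decodeUtf8With'` introduced in this commit exactly is an assumption; the later commits in this log suggest the interface was reworked. `decodeChunks` is a hypothetical helper name.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Decode a sequence of chunks, threading the continuation returned by each
-- step into the next one. Returns the decoded text and any trailing bytes
-- that did not yet form a complete code point.
decodeChunks :: [BS.ByteString] -> (T.Text, BS.ByteString)
decodeChunks = go TE.streamDecodeUtf8 [] BS.empty
  where
    go _ acc rest [] = (T.concat (reverse acc), rest)
    go step acc _ (c : cs) =
      case step c of
        -- t: text decoded so far, rest: undecoded tail, next: continuation
        TE.Some t rest next -> go next (t : acc) rest cs
```

A two-byte character split across chunk boundaries is decoded once its second byte arrives, and an incomplete trailing sequence is reported as undecoded bytes rather than raised as an error.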