text/Data/Text/Encoding.hs

Author Commit Message Labels Comments Date
Bryan O'Sullivan
Oops! Back out part of 59aad6977070 - it was wrong. My assertion that it was safe to skip the "do I have 1 byte available?" check was incorrect.
Bryan O'Sullivan
Make encoding slightly faster. The improvement mainly comes from dropping a redundant check when decoding an ASCII byte.
Bryan O'Sullivan
Silence a compiler warning.
Bryan O'Sullivan
Mark the ASCII decoding functions as deprecated.
Bryan O'Sullivan
Portable native UTF-8 decoder gives 3.7x faster decoding. This code is derived from Björn Höhrmann's UTF-8 decoder. Compared to the original Haskell decoder from cac7dbcbc392, it's between 2.17 and 3.68 times faster. It's even between 1.18 and 3.58 times faster than the improved Haskell decoder from 71ead801296a. The x86-specific decoding path gives a substantial win for entirely and partly ASCII text, e.g. HTML and XML, at the cost of being about 17%…
Bryan O'Sullivan
Speed up UTF-8 decoding by a little over 2x. The previous code was more concise, but alas GHC boxed each Word8 it read from the ByteString, which resulted in poor performance. This mankier code adds (seemingly required) strictness annotations, along with a little bit of manual CSE. Timing of the DecodeUtf8/Strict benchmark went from 41.8ms to 19.6ms, a pleasing improvement.
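
The commit above does not show the code, so the following is only a sketch of the general technique it names: bang patterns on the loop state and on each byte read, plus a length computed once outside the loop. The helper `countCodePoints` is hypothetical and is not part of the text package. Whether GHC would box the `Word8` without the bangs depends on the compiler version and optimisation flags.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as B
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical helper, not taken from the text package: count UTF-8 code
-- points by counting every byte that is not a continuation byte (10xxxxxx).
countCodePoints :: B.ByteString -> Int
countCodePoints bs = go 0 0
  where
    len = B.length bs            -- computed once and shared by every iteration
    go :: Int -> Int -> Int
    go !i !n                     -- bangs keep the index and counter evaluated
      | i >= len  = n
      | otherwise =
          let !w  = B.index bs i :: Word8   -- force the byte as soon as it is read
              !n' = if w .&. 0xC0 == 0x80 then n else n + 1
          in go (i + 1) n'
```

For example, `countCodePoints (B.pack [0x68, 0xC3, 0xA9, 0x21])` evaluates to 3, since 0xA9 is the only continuation byte.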
Bryan O'Sullivan
Oh noes! I was miscalculating the initial buffer size! When performance testing encodeUtf8, I noticed that for some reason I was still seeing "ensure" show up in the profile, when I expected it shouldn't have been. Turns out I was using a "min" where I should have been using a "max", and thus allocating an initial bytestring that would almost always be too small, thus forcing reallocations and copying. Boo!
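
A minimal sketch of the min/max mix-up described above. The commit message does not say which two values were compared, so the floor of 4 bytes and both function names are assumptions made purely for illustration.

```haskell
-- Hypothetical sizing helpers; the real encodeUtf8 code is not shown in
-- this log. The 4-byte floor is an assumption (longest UTF-8 sequence).

-- Buggy: "min" caps the buffer at the floor, so almost every input
-- outgrows it and forces reallocation and copying.
initialSizeBuggy :: Int -> Int
initialSizeBuggy inputLen = min inputLen 4

-- Fixed: "max" starts at the input length, never below the floor.
initialSizeFixed :: Int -> Int
initialSizeFixed inputLen = max inputLen 4
```

For a 1000-character input the buggy version allocates 4 bytes and the fixed version allocates 1000.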
Bryan O'Sullivan
Eliminate unnecessary resizes from encodeUtf8. We had been performing a resize any time that (a) we had data to write and (b) we got to within 4 bytes of filling the target bytestring. This was safe, but suboptimal, as it meant that in the common case of encoding ASCII text, we would *always* perform a resize. Now, we check the exact number of bytes we need to fit, and resize only if they won't fit. This eliminates resizes for ASCII data, an…
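
The two resize policies described above can be sketched as follows. All three function names are hypothetical, and the old check is approximated as "fewer than 4 bytes remaining"; the actual comparison in the library is not shown in this log.

```haskell
import Data.Char (ord)

-- Number of bytes the UTF-8 encoding of a character occupies.
utf8Length :: Char -> Int
utf8Length c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Old policy (approximation): resize whenever fewer than 4 bytes remain,
-- regardless of how many bytes the next character needs.
needsResizeOld :: Int -> Int -> Bool
needsResizeOld capacity used = capacity - used < 4

-- New policy: resize only if the next character's bytes will not fit.
needsResizeNew :: Int -> Int -> Char -> Bool
needsResizeNew capacity used c = used + utf8Length c > capacity
```

With a 10-byte buffer holding 7 bytes of ASCII, the old policy resizes while the new one does not, which is why ASCII input sized exactly to its length no longer triggers a resize.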
Bryan O'Sullivan
Improve error message.
Bryan O'Sullivan
Add decodeUtf8'.
Bryan O'Sullivan
Many small documentation improvements.
Bryan O'Sullivan
Get rid of the old decode function
Bryan O'Sullivan
Add a rewrite rule for fusion
Bryan O'Sullivan
Write a faster UTF-8 decoder
Bryan O'Sullivan
Remove old UTF-8 encoding functions
Bryan O'Sullivan
Update copyright
Bryan O'Sullivan
Rewrite encodeUtf8 for speed. This was inspired by a patch from Simon Meier, who wrote a direct implementation of encodeUtf8 using his 'blaze-builder' package. His code showed a very impressive speedup. My code is similar in both structure and performance, its chief difference being that it doesn't require 'blaze-builder'.
Bryan O'Sullivan
Change Tom's email address
Bryan O'Sullivan
Add controllable error handling and recovery code.
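
As an illustration of what controllable error handling looks like from the caller's side, here is a small sketch. It assumes the names exported by current releases of the text package (`decodeUtf8With` and `lenientDecode`, alongside the `decodeUtf8'` mentioned in this log); the log itself does not confirm what this commit exported. The bindings `invalidInput`, `strictResult` and `lenientResult` are hypothetical.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- "hi" followed by 0xFF, a byte that can never appear in valid UTF-8.
invalidInput :: B.ByteString
invalidInput = B.pack [0x68, 0x69, 0xFF]

-- decodeUtf8' reports the failure as a value instead of throwing.
strictResult :: Either String T.Text
strictResult = either (Left . show) Right (decodeUtf8' invalidInput)

-- lenientDecode substitutes U+FFFD for the invalid byte and carries on.
lenientResult :: T.Text
lenientResult = decodeUtf8With lenientDecode invalidInput
```

Here `strictResult` is a `Left` describing the bad byte, while `lenientResult` is the text "hi" followed by the replacement character U+FFFD.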
Bryan O'Sullivan
Update copyrights and maintainers.
Tags: 0.2
Bryan O'Sullivan
Fix Haddocks
Bryan O'Sullivan
Move Utf* modules into Data.Text.Encoding
Bryan O'Sullivan
Test the remaining supported encodings
Bryan O'Sullivan
Split encoding support out into new modules