Commits

Bryan O'Sullivan committed cbbbc87

encodeUtf8_2: cap the number of wasted bytes at 2x

This has the odd side effect of improving tiny-string performance
from 20% slower than encodeUtf8_1 to about 5% faster. Never stop
being weird, GHC optimizer!

  • Parent commits 3e52938

Files changed (1)

Data/Text/Encoding.hs

     with ptr $ \destPtr -> do
       c_encode_utf8 destPtr (A.aBA arr) (fromIntegral off) (fromIntegral len)
       newDest <- peek destPtr
-      return (PS fp 0 (newDest `minusPtr` ptr))
+      let utf8len = newDest `minusPtr` ptr
+      if utf8len >= len `shiftR` 1
+        then return (PS fp 0 utf8len)
+        else do
+          fp' <- mallocByteString utf8len
+          withForeignPtr fp' $ \ptr' -> do
+            memcpy ptr' ptr (fromIntegral utf8len)
+            return (PS fp' 0 utf8len)
 
 -- | Decode text from little endian UTF-16 encoding.
 decodeUtf16LEWith :: OnDecodeError -> ByteString -> Text
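
For readers who want to try the shrink-on-waste idea from this change in isolation, here is a minimal standalone sketch. It is not the code from Data/Text/Encoding.hs: the names shrinkIfWasteful and demo are hypothetical, the threshold is expressed against the buffer's capacity (an assumption; the commit compares utf8len against half the input length), and copyBytes from Foreign.Marshal.Utils stands in for the memcpy binding used in the diff.

```haskell
import Data.Bits (shiftR)
import qualified Data.ByteString as B
import Data.ByteString.Internal (fromForeignPtr, mallocByteString)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Storable (pokeByteOff)

-- | Hypothetical helper: wrap the first 'used' bytes of an over-allocated
-- buffer of size 'capacity' as a ByteString. If at least half of the buffer
-- is in use, the buffer is kept as-is; otherwise the payload is copied into
-- a right-sized allocation so the oversized one can be freed.
shrinkIfWasteful :: ForeignPtr Word8 -> Int -> Int -> IO B.ByteString
shrinkIfWasteful fp capacity used
  | used >= capacity `shiftR` 1 = return (fromForeignPtr fp 0 used)
  | otherwise = do
      fp' <- mallocByteString used
      withForeignPtr fp $ \src ->
        withForeignPtr fp' $ \dst ->
          copyBytes dst src used   -- argument order: destination, source, count
      return (fromForeignPtr fp' 0 used)

-- | Hypothetical driver: allocate 'capacity' bytes, write 'bytes' at the
-- front, then apply the shrink policy. Assumes length bytes <= capacity.
demo :: Int -> [Word8] -> IO B.ByteString
demo capacity bytes = do
  fp <- mallocByteString capacity
  withForeignPtr fp $ \p ->
    mapM_ (\(i, b) -> pokeByteOff p i b) (zip [0 ..] bytes)
  shrinkIfWasteful fp capacity (length bytes)
```

With this policy the retained allocation is never more than twice the size of the payload, at the cost of one extra copy for outputs that fill less than half of the buffer.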