Bryan O'Sullivan committed cbbbc87

encodeUtf8_2: cap the number of wasted bytes at 2x

This has the odd side effect of improving tiny-string performance
from 20% slower than encodeUtf8_1 to about 5% faster. Never stop
being weird, GHC optimizer!

Files changed (1)

Data/Text/Encoding.hs

     with ptr $ \destPtr -> do
       c_encode_utf8 destPtr (A.aBA arr) (fromIntegral off) (fromIntegral len)
       newDest <- peek destPtr
-      return (PS fp 0 (newDest `minusPtr` ptr))
+      let utf8len = newDest `minusPtr` ptr
+      if utf8len >= len `shiftR` 1
+        then return (PS fp 0 utf8len)
+        else do
+          fp' <- mallocByteString utf8len
+          withForeignPtr fp' $ \ptr' -> do
+            memcpy ptr' ptr (fromIntegral utf8len)
+            return (PS fp' 0 utf8len)
 
 -- | Decode text from little endian UTF-16 encoding.
 decodeUtf16LEWith :: OnDecodeError -> ByteString -> Text
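
The shrink-on-waste idea in this change can be sketched on its own, using only `base`. This is an illustration, not the library's code: `shrinkIfWasteful` and its `capacity` parameter are hypothetical names. The sketch compares the bytes written against the allocated capacity, whereas the diff above compares `utf8len` against `len` from the input `Text`.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Utils (copyBytes)

-- | Hypothetical helper: given a buffer of 'capacity' bytes of which only
-- the first 'used' bytes were written, keep the buffer when at least half
-- of it is in use; otherwise copy the payload into an exact-size buffer so
-- that wasted space never exceeds the size of the payload itself.
shrinkIfWasteful
  :: ForeignPtr Word8            -- ^ over-allocated buffer
  -> Int                         -- ^ capacity in bytes (assumed, not in the diff)
  -> Int                         -- ^ bytes actually written
  -> IO (ForeignPtr Word8, Int)
shrinkIfWasteful fp capacity used
  | used >= capacity `shiftR` 1 = return (fp, used)   -- waste is at most 2x: keep
  | otherwise = do
      fp' <- mallocForeignPtrBytes used               -- exact-size replacement
      withForeignPtr fp $ \src ->
        withForeignPtr fp' $ \dst ->
          copyBytes dst src used                      -- destination first, then source
      return (fp', used)
```

The trade-off is one extra allocation and copy for sparsely filled buffers, in exchange for not pinning a mostly empty allocation for the lifetime of the result.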