Bryan O'Sullivan committed 52a35c6

Eliminate unnecessary resizes from encodeUtf8.

We had been performing a resize any time that (a) we had data to write
and (b) we got to within 4 bytes of filling the target bytestring.
This was safe, but suboptimal, as it meant that in the common case of
encoding ASCII text, we would *always* perform a resize.

Now, we check the exact number of bytes we need to fit, and resize
only if they won't fit. This eliminates resizes for ASCII data, and
makes them a little less likely for other data.
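
The difference between the two policies can be sketched with a small pure model that counts buffer doublings. This is not the code from the commit: `utf8Width`, `countResizes`, `oldPolicy` and `newPolicy` are hypothetical names, the input is modelled as a list of `Char` rather than UTF-16 code units, and the starting buffer size is passed in as an assumption.

```haskell
import Data.Char (ord)

-- | Number of UTF-8 bytes needed to encode one code point.
utf8Width :: Char -> Int
utf8Width c
  | n <= 0x7F   = 1
  | n <= 0x7FF  = 2
  | n <= 0xFFFF = 3
  | otherwise   = 4
  where n = ord c

-- | Old policy (as described above): demand 4 free bytes before any write.
oldPolicy :: Int -> Int
oldPolicy _ = 4

-- | New policy: demand exactly as many free bytes as the next write needs.
newPolicy :: Int -> Int
newPolicy w = w

-- | Count how many times the buffer is doubled while encoding the input.
-- The first argument maps the width of the next character to the free
-- space required before writing it. The initial size must be positive.
countResizes :: (Int -> Int) -> Int -> String -> Int
countResizes required initialSize = go initialSize 0 0
  where
    go _ _ resizes [] = resizes
    go size used resizes s@(c:cs)
      | size - used < required w = go (size * 2) used (resizes + 1) s
      | otherwise                = go size (used + w) resizes cs
      where w = utf8Width c
```

Assuming a buffer that starts with one byte per input character, `countResizes oldPolicy 5 "hello"` gives 1 while `countResizes newPolicy 5 "hello"` gives 0, which matches the ASCII case described in the message.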

Files changed (1)

Data/Text/Encoding.hs

     loop n1 m1 ptr = go n1 m1
      where
       go !n !m
-        | n-off == len = return $! PS fp 0 m
-        | size-m < 4 = {-# SCC "encodeUtf8/resize" #-} do
-            let newSize = size `shiftL` 1
-            fp' <- mallocByteString newSize
-            withForeignPtr fp' $ \ptr' -> memcpy ptr' ptr (fromIntegral m)
-            start newSize n m fp'
+        | n-off == len = return (PS fp 0 m)
         | otherwise = do
             let poke8 k v = poke (ptr `plusPtr` k) (fromIntegral v :: Word8)
-                w = A.unsafeIndex arr n
-            case undefined of
-             _| w <= 0x7F  -> do
+                ensure k act
+                  | size-m >= k = act
+                  | otherwise = {-# SCC "resizeUtf8/ensure" #-} do
+                      let newSize = size `shiftL` 1
+                      fp' <- mallocByteString newSize
+                      withForeignPtr fp' $ \ptr' ->
+                        memcpy ptr' ptr (fromIntegral m)
+                      start newSize n m fp'
+                {-# INLINE ensure #-}
+            case A.unsafeIndex arr n of
+             w| w <= 0x7F  -> ensure 1 $ do
                   poke8 m w
                   go (n+1) (m+1)
-              | w <= 0x7FF -> do
+              | w <= 0x7FF -> ensure 2 $ do
                   poke8 m     $ (w `shiftR` 6) + 0xC0
                   poke8 (m+1) $ (w .&. 0x3f) + 0x80
                   go (n+1) (m+2)
-              | 0xD800 <= w && w <= 0xDBFF -> do
+              | 0xD800 <= w && w <= 0xDBFF -> ensure 4 $ do
                   let c = ord $ U16.chr2 w (A.unsafeIndex arr (n+1))
                   poke8 m     $ (c `shiftR` 18) + 0xF0
                   poke8 (m+1) $ ((c `shiftR` 12) .&. 0x3F) + 0x80
                   poke8 (m+2) $ ((c `shiftR` 6) .&. 0x3F) + 0x80
                   poke8 (m+3) $ (c .&. 0x3F) + 0x80
                   go (n+2) (m+4)
-              | otherwise -> do
+              | otherwise -> ensure 3 $ do
                   poke8 m     $ (w `shiftR` 12) + 0xE0
                   poke8 (m+1) $ ((w `shiftR` 6) .&. 0x3F) + 0x80
                   poke8 (m+2) $ (w .&. 0x3F) + 0x80
                   go (n+1) (m+3)
-{- INLINE encodeUtf8 #-}
 
 -- | Decode text from little endian UTF-16 encoding.
 decodeUtf16LEWith :: OnDecodeError -> ByteString -> Text
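
The four branches in the diff write 1, 2, 4 and 3 bytes respectively, which is why they are wrapped in `ensure 1`, `ensure 2`, `ensure 4` and `ensure 3`. As a rough illustration of the same bit arithmetic, here is a pure sketch that returns the bytes as a list instead of poking them into a buffer. `utf8Bytes` is a hypothetical helper, not part of the library, and it works on a decoded `Char` rather than on UTF-16 code units, so there is no surrogate-pair branch.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8)

-- | UTF-8 bytes for a single code point (illustrative only).
-- Lone surrogate values are not rejected; they fall into the 3-byte case.
utf8Bytes :: Char -> [Word8]
utf8Bytes ch
  | c <= 0x7F   = [byte c]
  | c <= 0x7FF  = [byte ((c `shiftR` 6) + 0xC0), cont c]
  | c <= 0xFFFF = [byte ((c `shiftR` 12) + 0xE0), cont (c `shiftR` 6), cont c]
  | otherwise   = [ byte ((c `shiftR` 18) + 0xF0)
                  , cont (c `shiftR` 12)
                  , cont (c `shiftR` 6)
                  , cont c ]
  where
    c = ord ch
    -- Narrow an Int that is already known to fit in one byte.
    byte :: Int -> Word8
    byte = fromIntegral
    -- Continuation byte: low six bits with the 10xxxxxx marker.
    cont :: Int -> Word8
    cont x = byte ((x .&. 0x3F) + 0x80)
```

The length of the result is the value that would be passed to `ensure` for that character, for example two bytes for U+00E9 and four for U+1F600.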