Haskell test is unfairly compiled without optimizations
Line 3 of haskell/build.sh should be:
ghc -O2 Csv.hs
Comments (20)
-
repo owner -
reporter Kind of better. I will try to profile the code soon to find more places for improvement. Could you publish updated results for Haskell? Where could I download the test .csv file?
-
repo owner Yes I already did that the other day. You should be able to follow along on wercker if you want to see the history of runs.
I'm happy for you to take a look at the Haskell code. It's the one that I struggled most to write and got @itkovian to help.
Link to wercker runs: https://app.wercker.com/ehiggs/csv-game/runs
NB: I also ran it locally using
cabal install -O2 --reinstall Cassava
but the difference was negligible. FWIW, the docker image I made uses libghc-cassava-dev
(which I guess was last built here, if you're interested to see what flags are used.) -
repo owner - marked as major
- marked as enhancement
-
repo owner I'll leave this with you @varosi. As it's currently a 'best effort' approach, if I don't hear back by, e.g. June, I'll close this. Does this work for you?
Thanks
-
reporter Yes, alright! Thanks! I'm using Windows, and Docker is kind of more complicated there. I don't have Docker set up under Windows. That's why I was asking for the file only.
-
reporter One more thing you could try is to change "1024" to "16768" at line 22 (haskell/Csv.hs), which makes the chunks a little bigger, so less time is spent on I/O operations. It gave some speedup on my side.
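To see how the chunk size changes the number of read calls, here is a minimal sketch that uses only base and bytestring, not Cassava. The names countReads and report are made up for illustration, and the list of sizes is just an example. It counts calls to B.hGetSome, which is not the same as OS-level reads because the Handle has its own buffer, so treat the numbers as a rough guide only.

```haskell
import qualified Data.ByteString as B
import System.Environment (getArgs)
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- | Read a handle to the end in chunks of at most 'chunkSize' bytes.
-- Returns (number of non-empty chunks, total bytes read).
-- 'countReads' is a hypothetical helper, not part of haskell/Csv.hs.
countReads :: Int -> Handle -> IO (Int, Int)
countReads chunkSize h = go 0 0
  where
    go :: Int -> Int -> IO (Int, Int)
    go nReads nBytes = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return (nReads, nBytes)  -- an empty chunk means end of file
        else do
          let nReads' = nReads + 1
              nBytes' = nBytes + B.length chunk
          -- Force the counters so no thunks pile up on large files.
          nReads' `seq` nBytes' `seq` go nReads' nBytes'

-- | Print how many chunks a given chunk size produces for one file.
report :: FilePath -> Int -> IO ()
report path size = do
  (n, total) <- withBinaryFile path ReadMode (countReads size)
  putStrLn (show size ++ "-byte chunks: " ++ show n
            ++ " reads, " ++ show total ++ " bytes")

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> mapM_ (report path) [1024, 4096, 8192, 16384]
    _      -> putStrLn "usage: pass the path of the file to read"
```

Fewer calls does not have to mean a faster run, so it is still worth timing each size on the target machine.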
-
repo owner pull request #14 merged to increase the buffer size.
-
reporter Another optimization that makes the Haskell code faster:
- Remove from haskell/Csv.hs line 6: import Data.Text
- Change haskell/Csv.hs line 23 to: loop 0 (decode NoHeader :: Parser [()])
This brought the run time down by a few hundred milliseconds.
-
repo owner pull request #15 puts in the suggested changes.
I removed the larger buffer size since it proved to be considerably worse on Linux (or, at least on the CI servers).
-
reporter If you have some time, you could experiment with multiples of 1024, like 4096 or 8192, on Linux. I'm testing here on Windows 10 x64.
-
repo owner 4096 seems to work best on my local machine.
I feel that if you're going about your day and suddenly need to parse a csv file, you shouldn't have to worry too much about buffer size. And if changing the buffer size gives the kind of wild variance I see in ghc's runtime, that tells me exactly the kind of information I wanted to see from this game. ;)
-
repo owner Under analyse results, haskell has shaved 300ms but hasn't moved up the ranks: https://app.wercker.com/ehiggs/csv-game/runs/build/5914c15c2add120001590f43?step=5914c1dcb9c6890001fa20f2
-
reporter Could you publish the new Haskell results on the site? I'll try to see what else could be done using the Cassava library.
-
reporter Here is a simpler suggestion (sorry that I have overloaded you with comments!):
    {-# LANGUAGE BangPatterns, ScopedTypeVariables, NoMonomorphismRestriction #-}

    -- Using Cassava from: https://hackage.haskell.org/package/cassava

    import Control.Monad
    import qualified Data.ByteString as B
    import Data.Csv.Incremental
    import System.Exit
    import System.IO
    import System.Environment (getArgs)
    import qualified Data.List as DL

    main :: IO ()
    main = do
        [input] <- getArgs
        withFile input ReadMode $ \ csvFile -> do
            let loop !acc (Many rs k)   = loop (acc + countFields rs) =<< feed k
                loop !acc (Done rs)     = print (countFields rs + acc)
                loop !_ (Fail _ errMsg) = putStrLn errMsg >> exitFailure

                feed k = k <$> B.hGetSome csvFile 4096
            loop 0 (decode NoHeader :: Parser [()])
      where
        countFields = sum . map length . \x -> [a | Right a <- x]
You could change the buffer to 1024 if you like. It expects the first parameter of the program to be the file path, like the other tests. That way it is more multi-platform: it runs on my Windows, too.
-
reporter Another question is - do you have LLVM on your testing machine?
I think the results could be different if you pass GHC the option -fllvm, which uses LLVM as the compiler backend and should speed up the Haskell code even more. Note that GHC uses a different LLVM depending on its version; for example, GHC 8.0.2 uses LLVM 3.7.
-
repo owner I tried
-fllvm
but could not get this to work in my timebox. If you can fix up https://bitbucket.org/ewanhiggs/csv-game-docker to support this, we can try it. -
repo owner @varosi, any progress on this or can I close this?
-
repo owner June has passed and there have been no updates.
-
repo owner - changed status to closed
I tried it and got the following results locally:
It's a pretty good improvement. Are there any other flags I should try?