Haskell test is unfairly not compiled with optimizations

Issue #6 closed
Vassil Keremidchiev created an issue

Line 3 of haskell/build.sh should be:

ghc -O2 Csv.hs

Comments (20)

  1. Ewan Higgs repo owner

    I tried it and got the following results locally:

    haskell,cassava,fieldcount,1.409
    haskell,cassava,fieldcount,1.284
    haskell,cassava,fieldcount,1.328
    haskell,cassava,fieldcount,1.301
    haskell,cassava,fieldcount,1.285
    haskell,cassava,fieldcount,1.309
    haskell,cassava,fieldcount,1.311
    haskell,cassava,fieldcount,1.306
    haskell,cassava,fieldcount,1.309
    haskell,cassava,fieldcount,1.302
    haskell,cassava,empty,0.012
    haskell,cassava,empty,0.012
    haskell,cassava,empty,0.022
    haskell,cassava,empty,0.013
    haskell,cassava,empty,0.013
    haskell,cassava,empty,0.028
    haskell,cassava,empty,0.014
    haskell,cassava,empty,0.014
    haskell,cassava,empty,0.021
    haskell,cassava,empty,0.015
    haskell,cassava-o2,fieldcount,1.256
    haskell,cassava-o2,fieldcount,1.161
    haskell,cassava-o2,fieldcount,1.162
    haskell,cassava-o2,fieldcount,1.164
    haskell,cassava-o2,fieldcount,1.155
    haskell,cassava-o2,fieldcount,1.171
    haskell,cassava-o2,fieldcount,1.178
    haskell,cassava-o2,fieldcount,1.176
    haskell,cassava-o2,fieldcount,1.183
    haskell,cassava-o2,fieldcount,1.162
    haskell,cassava-o2,empty,0.011
    haskell,cassava-o2,empty,0.010
    haskell,cassava-o2,empty,0.021
    haskell,cassava-o2,empty,0.014
    haskell,cassava-o2,empty,0.018
    haskell,cassava-o2,empty,0.023
    haskell,cassava-o2,empty,0.018
    haskell,cassava-o2,empty,0.010
    haskell,cassava-o2,empty,0.031
    haskell,cassava-o2,empty,0.016
    

    It's a pretty good improvement. Are there any other flags I should try?
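
    To put a number on the improvement, the fieldcount timings above can be averaged. The following is a minimal sketch, not part of the benchmark itself; the names baseline, optimized, mean and improvementPercent are made up for illustration, and the values are copied from the rows above.

```haskell
-- fieldcount timings in seconds, copied from the results listed above.
baseline, optimized :: [Double]
baseline  = [1.409, 1.284, 1.328, 1.301, 1.285, 1.309, 1.311, 1.306, 1.309, 1.302]
optimized = [1.256, 1.161, 1.162, 1.164, 1.155, 1.171, 1.178, 1.176, 1.183, 1.162]

-- Arithmetic mean of a non-empty list of timings.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Relative reduction of the mean runtime, as a percentage.
improvementPercent :: [Double] -> [Double] -> Double
improvementPercent before after =
  100 * (mean before - mean after) / mean before

main :: IO ()
main = do
  putStrLn ("mean without -O2: " ++ show (mean baseline))
  putStrLn ("mean with -O2:    " ++ show (mean optimized))
  putStrLn ("improvement (%):  " ++ show (improvementPercent baseline optimized))
```

    On these ten runs the mean drops from roughly 1.31 s to roughly 1.18 s, which is about a 10% reduction. The empty runs are dominated by startup time and are left out.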

  2. Vassil Keremidchiev reporter

    It's somewhat better. I will try to profile the code soon to find more places for improvement. Could you publish updated results for Haskell? Where can I download the test .csv file?

  3. Ewan Higgs repo owner

    Yes, I already did that the other day. You should be able to follow along on wercker if you want to see the history of runs.

    I'm happy for you to take a look at the Haskell code. It's the one that I struggled most to write and got @itkovian to help.

    Link to wercker runs: https://app.wercker.com/ehiggs/csv-game/runs

    NB: I also ran it locally using cabal install -O2 --reinstall Cassava, but the difference was negligible. FWIW, the docker image I made uses libghc-cassava-dev (which I guess was last built here, if you're interested to see what flags are used).

  4. Ewan Higgs repo owner

    I'll leave this with you @varosi. As it's currently a 'best effort' approach, if I don't hear back by, e.g. June, I'll close this. Does this work for you?

    Thanks

  5. Vassil Keremidchiev reporter

    Yes, alright! Thanks! I'm using Windows, where Docker is more complicated, and I don't have Docker set up there. That's why I was asking for just the file.

  6. Vassil Keremidchiev reporter

    One more thing you could try is to change "1024" to "16768" at line 22 (haskell/Csv.hs), which makes the chunks a little bigger so that less time is spent on I/O operations. It gave some speedup on my side.
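
    To see how the chunk size affects the number of read calls, independent of CSV parsing, a rough sketch like the following could be used. It is not the benchmark code: minReads, countChunks and report are hypothetical helpers, and only B.hGetSome and the buffer sizes come from this thread.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import System.Environment (getArgs)
import System.IO (Handle, IOMode (ReadMode), hPutStrLn, stderr, withBinaryFile)

-- Lower bound on the number of non-empty reads for a file of fileSize bytes
-- (ceiling division); hGetSome may return fewer bytes than requested.
minReads :: Int -> Int -> Int
minReads chunkSize fileSize = (fileSize + chunkSize - 1) `div` chunkSize

-- Read the handle to the end in chunks of at most chunkSize bytes and
-- return (number of non-empty chunks, total bytes read).
countChunks :: Int -> Handle -> IO (Int, Int)
countChunks chunkSize h = go 0 0
  where
    go !chunks !bytes = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return (chunks, bytes)
        else go (chunks + 1) (bytes + B.length chunk)

-- Report the read count for one buffer size.
report :: FilePath -> Int -> IO ()
report path size = do
  (chunks, bytes) <- withBinaryFile path ReadMode (countChunks size)
  putStrLn (show size ++ "-byte buffer: " ++ show chunks ++ " reads, "
            ++ show bytes ++ " bytes")

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> mapM_ (report path) [1024, 4096, 16768]
    _      -> hPutStrLn stderr "usage: pass the path of the file to read"
```

    Fewer reads does not automatically mean a faster run, so the actual timings still have to be measured on the target machine.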

  7. Vassil Keremidchiev reporter

    Another optimization that makes the Haskell code faster:

    1. Remove from haskell/Csv.hs line 6: import Data.Text
    2. Change from haskell/Csv.hs line 23 to: loop 0 (decode NoHeader :: Parser [()])

    This brought the time down by a few hundred milliseconds.

  8. Vassil Keremidchiev reporter

    If you have some time, you could experiment with multiples of 1024, like 4096 or 8192, on Linux. I'm testing here on Windows 10 x64.

  9. Ewan Higgs repo owner

    4096 seems to work best on my local machine.

    I feel that if you're going about your day and suddenly need to parse a csv file, you shouldn't have to worry too much about buffer size. And if it gives the kind of wild variance I see in ghc's runtime when I change the buffer size, that tells me exactly the kind of information I wanted to see from this game. ;)

  10. Vassil Keremidchiev reporter

    Could you update the Haskell results on the site? I'll try to see what else can be done using the Cassava library.

  11. Vassil Keremidchiev reporter

    Here is a simpler suggestion (sorry for overloading you with comments!):

    {-# LANGUAGE BangPatterns, ScopedTypeVariables, NoMonomorphismRestriction #-}
    -- Using Cassava from: https://hackage.haskell.org/package/cassava
    import qualified Data.ByteString as B
    import Data.Csv.Incremental
    import System.Exit
    import System.IO
    import System.Environment (getArgs)
    
    main :: IO ()
    main = do
        -- The CSV file path is the first command-line argument.
        [input] <- getArgs
        withFile input ReadMode $ \ csvFile -> do
            -- Step through the incremental parser states with a strict field counter.
            let loop !acc (Many rs k)   = loop (acc + countFields rs) =<< feed k
                loop !acc (Done rs)     = print (countFields rs + acc)
                loop !_ (Fail _ errMsg) = putStrLn errMsg >> exitFailure
    
                -- Feed the parser the next chunk of at most 4096 bytes.
                feed k = k <$> B.hGetSome csvFile 4096
            -- Decoding every field as () ignores the field contents.
            loop 0 (decode NoHeader :: Parser [()])
          where
            -- Count the fields of every successfully parsed record.
            countFields = sum . map length . \x -> [a | Right a <- x]
    

    You could change the buffer to 1024 if you like. It expects the first program argument to be the file path, like the other tests. That way it is more cross-platform: it runs on my Windows machine, too.

  12. Vassil Keremidchiev reporter

    Another question: do you have LLVM on your testing machine?

    I think the results could be different if you pass GHC the -fllvm option, which uses LLVM as the compiler backend and should speed up the Haskell code even more. But each GHC version works with a different LLVM version. For example, GHC 8.0.2 uses LLVM 3.7.
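
    As an alternative to editing build.sh, optimization flags can also be set per file with an OPTIONS_GHC pragma. The following is only a sketch under that assumption; sumTo is a hypothetical stand-in workload, not the CSV test, and -fllvm is left out of the pragma because it needs a matching LLVM toolchain on the machine.

```haskell
{-# OPTIONS_GHC -O2 #-}
{-# LANGUAGE BangPatterns #-}
-- To try the LLVM backend, -fllvm could be added to the OPTIONS_GHC pragma
-- above, provided a compatible LLVM is installed.

-- Hypothetical stand-in workload: sum the integers 1..n with a strict
-- accumulator, in the same style as the loop in the Cassava example.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 1000000)
```

    Whether the pragma or the command-line flag is preferable is a matter of taste; the command-line flag in build.sh keeps the build settings in one place.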
