Issue #2 invalid

Heap exhausted on large archives

tymmym
created an issue

I want to take the first five bytes from the first file in a zip archive:

{{{
import Data.Word (Word8)
import System.Environment (getArgs)

import Codec.Archive.LibZip

main = do
  f:_ <- getArgs
  bytes <- withArchive [] f $ fileContentsIx [] 0
  print $ take 5 (bytes :: [Word8])
}}}

This code works for small archives, but I get a heap overflow with big ones. For example:

{{{
ghc -O2 --make -fforce-recomp -rtsopts -prof -auto-all -caf-all libzip-new.hs
./libzip-new test.zip +RTS -p -hy -M300M
}}}

for this archive[1] gives:

{{{
Heap exhausted;
Current maximum heap size is 314572800 bytes (300 MB);
use `+RTS -M<size>' to increase it.
}}}

LibZip-0.10, bindings-libzip-0.10, ghc 7.0.4, libzip 0.10.

GC statistics, heap and time/allocation profiles attached.

  1. http://dl.dropbox.com/u/52911301/test.zip

Comments (3)

  1. Sergey Astanin repo owner

    LibZip is strict in its operations, and fileContents/fileContentsIx read the entire file entry into memory (in your case, the archive contains a huge file). According to the documentation,

    > Partial reading of the files in the archive may be performed from within Entry monad (see fromFile).

    You should use fromFile/fromFileIx and readBytes instead if you know you want to read the file only partially.

    Your read5.hs becomes:

    import Data.Word (Word8)
    import System.Environment (getArgs)
    
    import Codec.Archive.LibZip
    
    main = do
      f:_ <- getArgs
      bytes <- withArchive [] f $ fromFileIx [] 0 $ readBytes 5
      print $ (bytes :: [Word8])
    

    It works normally:

    $ ./read5 test.zip 
    [44,43,227,70,149]
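
    The same partial read can also be addressed by entry name rather than by index. The following is only a sketch: it assumes that fromFile takes the same flags argument as fromFileIx, with the entry's path in place of the index, and readPrefix is a hypothetical helper name that is not part of LibZip.

    ```haskell
    import Data.Word (Word8)
    import System.Environment (getArgs)

    import Codec.Archive.LibZip

    -- Hypothetical helper (not part of LibZip): read the first n bytes of the
    -- named entry without loading the rest of the entry into memory.
    -- Assumes fromFile :: [FileFlag] -> FilePath -> Entry a -> Archive a.
    readPrefix :: Integer -> FilePath -> FilePath -> IO [Word8]
    readPrefix n archive entry =
      withArchive [] archive $ fromFile [] entry $ readBytes n

    main :: IO ()
    main = do
      -- Usage: ./read5name <archive.zip> <entry name inside the archive>
      archive:entry:_ <- getArgs
      bytes <- readPrefix 5 archive entry
      print bytes
    ```

    Because only the requested bytes are pulled from the entry, heap usage should not depend on the size of the file stored in the archive.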
    
    