eats about 227 times more memory than the size of the input file

Issue #37 new
Anonymous created an issue

Hi,

I'm running Debian unstable with pyyaml 3.11, which is compiled against libyaml-dev, and I have libyaml-0-2 version 0.1.6-2 installed.

I observed that loading yaml files can require insanely large amounts of memory compared to the original input file. Parsing an 11 MB yaml file needs 2.5 GB of memory, for example. That is 227 times the size of the input file.

This seems more than just a bit excessive.
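For what it's worth, part of the blow-up is plain Python object overhead: every scalar in the file becomes a boxed Python object. A rough stdlib-only sketch (the exact sizes vary by Python version and platform, and the pure-Python loader keeps additional event/node objects on top of this):

```python
import sys

# One ~22-byte line of YAML and the Python objects the loader builds for it
line = "- {id: 1, msg: entry}\n"
node = {"id": 1, "msg": "entry"}

# Shallow size of the dict plus the sizes of its keys and values
size = sys.getsizeof(node) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in node.items()
)
print(len(line), size)  # the boxed objects alone are many times the raw bytes
```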

Is there a way to fix this? Or to work around this?

Comments (3)

  1. josch1337

    The memory requirement seems to grow linearly with the input size.

    Given a file log.yaml, consider a benchmark like the following:

    for j in `seq 1 40`; do
        ( echo report:; for i in `seq 1 $j`; do cat log.yaml; done; ) \
        | /usr/bin/time -v python -c "import yaml,sys;d=yaml.load(sys.stdin)" 2>&1 \
        | awk '/Maximum resident set size/{print $6}'
    done
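The same peak-RSS measurement can also be taken from within Python, without /usr/bin/time. A minimal sketch using the stdlib resource module (Linux only; ru_maxrss is reported in kilobytes there) with a synthetic input in the same shape as the concatenated log.yaml:

```python
import resource
import yaml

# Synthetic input shaped like the concatenated log.yaml benchmark
text = "report:\n" + "".join("- {id: %d, msg: entry}\n" % i for i in range(5000))

# pyyaml 3.x also accepts yaml.load(text) without an explicit Loader
data = yaml.load(text, Loader=yaml.SafeLoader)

peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # kilobytes on Linux
print("input: %d bytes, peak RSS: %d kB" % (len(text), peak_kb))
```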
    

    Then I can create the following table mapping input yaml size in bytes (first column) to peak memory use in kilobytes (second column):

    950753 219860
    1901498 435928
    2852243 657508
    3802988 866224
    4753733 1097896
    5704478 1309468
    6655223 1518492
    7605968 1727384
    8556713 1930316
    9507458 2190640
    10458203 2401940
    11408948 2613232
    12359693 2822744
    13310438 3031620
    14261183 3240476
    15211928 3449616
    16162673 3644236
    17113418 3855204
    18064163 4066428
    19014908 4376012
    19965653 4586856
    20916398 4798324
    21867143 5010488
    22817888 5221220
    23768633 5431840
    24719378 5640780
    25670123 5849476
    26620868 6058136
    27571613 6267164
    28522358 6475732
    29473103 6685748
    30423848 6893648
    31374593 7080928
    32325338 7283424
    33276083 7494708
    34226828 7705672
    35177573 7916916
    36128318 8127400
    37079063 8535712
    38029808 8740160
    

    This can then be plotted:

    [attached plot out.png: peak memory vs. input size, growing linearly]

  2. josch1337

    The libyaml-based parser performs better, but it still requires about 65 times as much memory as the input file size.

    This means that with 16 GB of system memory I cannot reasonably parse yaml files larger than about 250 MB, even with the C-based parser.
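    For reference, the usual way to opt into the libyaml-backed parser is the import-fallback pattern below (a sketch; whether the C loader exists depends on pyyaml having been built against libyaml):

```python
import yaml

# Prefer the libyaml-based C loader when available; fall back to pure Python
try:
    from yaml import CSafeLoader as Loader
except ImportError:
    from yaml import SafeLoader as Loader

data = yaml.load("report: [1, 2, 3]", Loader=Loader)
print(data)  # -> {'report': [1, 2, 3]}
```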
