Excessive memory usage by samplers for ROOT output

Issue #107 resolved
Laurie Nevay created an issue

The LHC model with full aperture information (~14k beam line elements with non-circular apertures) uses approximately 280 MB of memory with no samplers. Attaching samplers to every item results in a very large amount of memory being used per sampler. The usage also differs between Mac and Unix.

The per-event memory usage is greater than the output file size for around 1000 events (obviously the output is compressed, so it isn't one to one, and more memory can be used at run time, but it is still excessive).

In both cases, one primary proton is tracked per event, secondaries are turned off, and each sampler in the end only records one primary hit.

LHC with around 14k samplers:

- On Mac: ~3.3 GB
- On Unix: ~5-6 GB (more numbers to follow, from only running 3 jobs per 16 GB node)

This equates to 300-600 MB per sampler at run time, which is excessive and not scalable, certainly with only one primary hit in each sampler.

It's typically necessary to run ~10000 particles for an accurate optics comparison, so I run these jobs on the farm, but the memory consumption is proving limiting.

Comments (4)

  1. Jochem Snuverink

    Just to add where I think the large memory consumption could come from. By the way, you probably meant 300-600 kB per sampler (3-6 GB / 14k).

    One sampler entry stores about 500 bytes of (uncompressed) data. With 1000 events this indeed amounts to the 500 kB range.

    So I don't think there is a realistic way to reduce the memory consumption directly. What we can do instead is write out to the ROOT file more often: either by creating more ROOT output files (e.g. --nperfile=100 for a new file every 100 events), or by adding an option that clears the memory and writes to the same ROOT file (I assume there is an option for this in ROOT, but I haven't checked; see the sketch below).
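    ROOT does in fact support this: a TTree attached to a TFile can flush its filled baskets to disk during the event loop via TTree::SetAutoFlush(), so only the current buffers stay in memory. A minimal sketch of the single-file approach, assuming an illustrative one-branch tree (the tree, branch, and loop here are not the actual BDSIM code):

    ```cpp
    // Sketch: write baskets to the same ROOT file periodically so the
    // in-memory buffers are freed instead of growing for the whole run.
    #include "TFile.h"
    #include "TTree.h"

    int main() {
      TFile file("output.root", "RECREATE");
      TTree tree("samplers", "sampler hits"); // attaches to 'file'
      double energy = 0.0;
      tree.Branch("energy", &energy, "energy/D");

      // Flush filled baskets to the file every 100 entries; their memory
      // is then reused rather than accumulating for the whole run.
      tree.SetAutoFlush(100);

      for (int event = 0; event < 10000; ++event) {
        energy = 7000.0; // placeholder; would come from the simulation
        tree.Fill();
      }

      tree.Write();
      file.Close();
      return 0;
    }
    ```

    With SetAutoFlush the baskets are written to the same file and their memory is reused, which is the single-file behaviour suggested above; the --nperfile route would achieve a similar effect by closing and reopening files.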

  2. Laurie Nevay reporter

    Agreed on the 300-600 kB.

    However, this is not for 1000 events; this is per event. There isn't a memory leak in the sense that it accumulates as events progress; the memory usage is constant. So there is a spare order of magnitude there.

    I had thought about splitting the ROOT output into two classes: one basic, and one that has the extra information such as the production point and the point of last scatter. This would at least halve the stored data, although it wouldn't reduce the size of the hits collection, which is likely the main source of this.
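    For illustration, the split might look something like this (these struct and member names are hypothetical, not the actual BDSIM classes):

    ```cpp
    // Hypothetical sketch of the two-class split suggested above: a slim
    // record for optics runs, and a derived one carrying the extra info.
    struct SamplerHitBasic {
      float x, xp, y, yp, z, t; // phase-space coordinates at the sampler
      float energy;
      int   partID;
      int   eventNo;
    };

    struct SamplerHitDetailed : SamplerHitBasic {
      float prodX, prodY, prodZ; // production point
      float scatX, scatY, scatZ; // point of last scatter
    };
    ```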

    So all in all, not unexpected given the number of samplers, but we need to be a little more judicious with them.

  3. Jochem Snuverink

    Just to put here what we discussed. ROOT already allocates some of the memory before the fill. According to https://root.cern.ch/doc/master/classTTree.html, a buffer of 32000 bytes (the default value) is allocated for every branch. We create about 45 branches per sampler, so with 14k samplers this amounts to about 20 GB (45 × 32 kB × 14000 ≈ 20 GB). That doesn't correspond to the observed usage, so maybe there is more to it.

    Reducing the number of samplers with a lite version is a good idea and will hopefully reduce the usage by quite a bit. Other options are to create fewer branches (just one branch per BDSParticle in BDSSamplerHit), or to set the default buffer size to a lower value, as in the sketch below.
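    For reference, a minimal sketch of the buffer-size option: the fourth argument of TTree::Branch() is the basket buffer size in bytes (default 32000), and TTree::SetBasketSize() can adjust existing branches. The branch and sizes here are illustrative:

    ```cpp
    #include "TTree.h"

    void makeSmallBufferTree() {
      float x = 0.0f;
      TTree* tree = new TTree("sampler", "sampler hits");
      // 45 branches × 32 kB × 14k samplers ≈ 20 GB with the default;
      // a 1.6 kB basket cuts the pre-allocated memory by a factor of 20.
      tree->Branch("x", &x, "x/F", 1600);
      tree->SetBasketSize("*", 1600); // or resize all branches at once
    }
    ```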

  4. Laurie Nevay reporter

    The ROOT output has been split into "root" and "rootdetailed" formats.

    For the LHC optics as an example:

    100 particles, 1 turn, no secondaries, core Gaussian beam (optics configuration), samplers on everything (~14000 samplers).

    (new) "root" 1.4Gb RAM constant during run (280Mb model only, rest samplers), file size 82Mb "rootdetailed" 3.3Gb RAM constant during run, file size 248Mb.
