MemSpeed: Re-use allocated memory

Issue #1718 open
Erik Schnetter
created an issue

Allocating memory is slow. In thorn MemSpeed, we should re-use memory that has been allocated instead of allocating new memory for each benchmark.
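A minimal sketch of the idea, assuming nothing about MemSpeed's internals: keep one cached allocation alive across benchmarks and grow it only when a larger size is requested, instead of calling malloc/free for every benchmark run. The function name `get_benchmark_buffer` is hypothetical, not the thorn's actual API.

```c
#include <stdlib.h>

/* Hypothetical helper: return a buffer of at least `size` bytes,
 * re-using the previous allocation whenever it is already large
 * enough, and reallocating only when it must grow. */
static void *get_benchmark_buffer(size_t size) {
  static void *buffer = NULL;
  static size_t capacity = 0;
  if (size > capacity) {
    free(buffer);                        /* drop the too-small buffer */
    buffer = malloc(size);
    capacity = buffer != NULL ? size : 0;
  }
  return buffer;                         /* valid for at least `size` bytes */
}
```

With this pattern, successive benchmarks that fit in the largest size seen so far incur no allocation at all.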

Comments (6)

  1. Roland Haas

    Possibly related to this: the MemSpeed test takes a lot of memory, namely 64 GB on bethe:

    23714 rhaas     20   0   34.4g  32.0g  17016 R  47.4 12.7   3:09.85 cactus_sim
    26084 rhaas     20   0   33.4g  32.0g  17308 R  42.1 12.7   1:39.39 cactus_sim
    

    It really should use less memory (<200 MB would be preferable) to avoid e.g. thrashing when running on someone's laptop or workstation.

    Is the amount of memory used fixed or proportional to the memory available on the host (bethe has lots: 263867020)?

  2. Erik Schnetter reporter

    MemSpeed needs to allocate enough memory to ensure that the measurement is not handled by the L3 cache. By default, it allocates 1/4 of one NUMA node's worth of memory. There is an option to allocate even more memory so that the inter-NUMA memory speed can also be measured.

    If you run one or two instances of MemSpeed simultaneously, there should be no swapping. I do this regularly on my laptop.

    It should be possible to limit this further to e.g. at most 10x the last-level cache size.
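The sizing rule suggested above could be sketched as follows. This is an illustration, not MemSpeed's actual code; `numa_node_bytes` and `llc_bytes` are assumed inputs that would have to be queried from the system (e.g. via hwloc).

```c
#include <stddef.h>

/* Sketch: keep the current default of 1/4 of a NUMA node, but cap
 * the benchmark buffer at 10x the last-level cache size so that
 * large-memory hosts do not allocate tens of gigabytes. */
static size_t choose_buffer_size(size_t numa_node_bytes, size_t llc_bytes) {
  size_t want = numa_node_bytes / 4;   /* current default */
  size_t cap  = 10 * llc_bytes;        /* proposed upper limit */
  return want < cap ? want : cap;
}
```

On a 64 GiB NUMA node with a 32 MiB last-level cache this caps the allocation at 320 MiB, while small hosts keep the existing behavior.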

  3. Roland Haas

    That may be a good idea. The MemSpeed test on 2 MPI ranks on my workstation (96 GB of memory; each MemSpeed rank uses 12 GB) is also quite slow, taking about 10 minutes to finish.

  4. Roland Haas

    I have made a simple attempt at this and limited the amount of memory used (per rank) to 1 GB, which should remain much larger than the last level of cache for a while still. The largest cache-like memory (see https://en.wikipedia.org/wiki/CPU_cache#MULTILEVEL) would be the eDRAM on Haswell CPUs with integrated graphics, which is apparently 128 MB.
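A hedged sketch of the per-rank cap described above; `MEMSPEED_MAX_BYTES` and `capped_size` are illustrative names, not the thorn's actual parameter or function.

```c
#include <stddef.h>

/* Limit the per-rank benchmark allocation to 1 GB, regardless of
 * how much memory the host has. */
#define MEMSPEED_MAX_BYTES ((size_t)1 << 30)   /* 1 GiB */

static size_t capped_size(size_t requested) {
  return requested < MEMSPEED_MAX_BYTES ? requested : MEMSPEED_MAX_BYTES;
}
```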

    Note that this renders the skip_largemem_benchmarks option somewhat redundant, though not fully: skip_largemem_benchmarks would still trigger on nodes with less than 4 GB of memory per MPI rank. It does become redundant on "typical" clusters, though, since we use only a small number of MPI ranks per node.

    The test still takes approximately 2 min on my workstation, though at least 1 min of that is not the main-memory test. This long runtime is probably a measure of how slow the CPUs in my workstation are by now (certainly compared to the amount of memory in it).
