This is a major change to how octrees in the code function. It may not be ready for immediate inclusion.
Note that this includes much or all of Doug's removal of shape/size. (If this PR gets accepted, his will be closed automatically.)
The major changes:
Oct leaves are now 256 bits. I think this will drop to 224 in the future, and possibly 192 if we can drop domain_ind.
RAMSES now has one oct_handler per domain, which includes all of the octs for that domain and their ghost octs. This will, in the future, handle ghost zone reading. This uses tsearch which is a binary tree index of a key/node pair for each root mesh oct. This will be possible to apply to distributed memory octrees. (RAMSES currently builds the entire set of octrees on each domain, as do the particle frontends, but this too will be changeable.)
Everything is done via recursive descent through octs. The specific functions that are called are inside oct_visitor.pyx and can be supplied to (subclass-able) visit_all_oct functions.
Selectors no longer implement select_octs and a full mask of the Oct cells is never created.
Particle datasets are now indexed through a two-step Octree construction process. The first step reads particles from disk and converts them to Z/Morton-ordered indices. Once all particles have been read from disk, they are sorted and streamed through and an Octree is created. This operation is much, much faster and considerably lower memory than the old system. Future work will parallelize this by utilizing the ring iterator and parallel sorting based on load-balancing estimates. That's a bit further off.
Lots of the particle control has been consolidated and touched up. I've also consolidated some of the Octree classes.
Particles are now dual-indexed. The first level is through ParticleRegions, which are a coarse method of identifying which regions of the domain touch which files on disk. The second is through a global spatial index. ParticleRegions are simple masks of size min(Nproc, 256)3, where a bit field is twiddled based on the processor. (This can get quite expensive, so at the very large file count we will need to think this through.) The total mask size in memory is 64bits * min(Nproc,256)3 * ceil(Nproc/64). This is to deal with unsorted particle datasets, where the indexing in-memory and on-disk are different. To load data from disk, particles regions are masked with the selector, then the particles from the corresponding files are loaded, then applied to the selector. In this way we will be able to deposit into an octree that fills a region.
I've seen gigantic speedups for particle datasets, moderate speedups for RAMSES, slight slowdown for NMSU-ART, and a huge reduction in memory across the boad. As an example, the AGORA RAMSES dataset went from using 3.4 gigs in RAM to using ~680 megs. The particle datasets are considerably faster (factor of a few) and extremely lower memory usage. I believe there is much more room to improve this.
My hope is to refactor the Octree descent somewhat to enable this to be used by ARTIO, where there are descent functions already.
Additional work will need to be made in parallel octree construction, pinning to processors, and ensuring reliability of this approach as we scale up. I do need some people to test this -- specifically, people who may have RAMSES datasets I don't have access to, people who have particle datasets that are still to be used, and other codes like Enzo and so on.
I've listed as reviewers people who may be interested in this, but I'd really welcome input from anyone else who sees this! Even if you don't grab the code and test it, suggestions on the code or the laid out algorithms above would be very helpful.
This is all inside the octdiet bookmark, so you can test with:
UPDATE 2: Fixed the tests, although I am getting odd failures in VorticityStretching, which seem very familiar to the older AbsDivV test failures.
UPDATE 3: Fixed the arbitrary_grid deposition functions.
UPDATE 4: A few more fixes, particularly for particle codes (and fixed a long-standing Gadget record reading problem). The remaining issues I want to address include Chris's comment about the segfault in ART, and the problem I have recently noticed of particle_position_x and so on being inconsistently available in the particle codes.
UPDATE 5: Fixed a broken test.
UPDATE 6: Fixed the problem @juxtaposicion reported. This was fun -- I forgot to pass the min/max level through the superselectors for Octree regions, and this now changes that. It also fixes a minor RAMSES issue with units and switches the logic in the superselector for OctreeSubsets to be slightly simpler. Chris, can you give this a shot? It'd also be very helpful if you could verify for me that the images/projections look correct for NMSU-ART -- they look good to me, but I could be wrong.
I think this might be ready for primetime. One lingering issue is the one noted above about Gadget, but I don't think that's related to this PR.
UPDATE 7: Several iterations later, all of the tests and all of my scripts pass. This PR screams out with the need for the particles-in-closures bug in #598. Any time we're trying to support multiple fluids or multiple particles in a given frontend, we will likely run into more issues. I'd like to try to address this as soon as possible, but this will need to be part of the units work. I have assigned the bug to myself, so once the field renaming and whatnot has settled down this will be the next step.
This PR is ready to go according to all of my tests, but I would appreciate a few more eyes on it just to make sure.
As a sidenote, I don't know why sfr_spectrum or hop_hop.c are different, but I assume that's from some un-PR'd changes by @juxtaposicion that I didn't realize I included. If there are other places that look funny like that, please let me know and I will either revert or change them.
That's bizarre. I don't think I've ever touched those files.
Ahh, looking at the commit log, that's Sam's doing :)
Oh, yeah eek -- I hacked a few things in the ARTIO fork that were never meant to be pulled upstream. Sorry about that.
Tested. Works as advertised!
Perhaps I spoke too soon. While both Density and Ones projections look OK, pf.h.find_max segfaults for me, giving:
GLOBAL INDEX RAN AHEAD. 74370246937671 0 2312557
OK, fixed with the latest commit. Projections look good still.
Tested on Ramses, Gadget, and Tipsy format; and the projections and profiles work fine. Thank you!
enzo: 4.470e+11 Msun within 1 Mpc (7.168e-01 GB added, took 5.025e+02 seconds)
gadget: 4.243e+11 Msun within 1 Mpc (7.969e-01 GB added, took 2.946e+01 seconds)
gasoline: 8.320e+10 Msun within 1 Mpc (8.369e-01 GB added, took 6.191e+01 seconds)
pkdgrav: 6.179e+11 Msun within 1 Mpc (2.107e+00 GB added, took 1.576e+02 seconds)
ramses: 4.303e+11 Msun within 1 Mpc (2.133e+00 GB added, took 4.478e+01 seconds)
Total memory usage: 6.651e+00 GB
enzo: 4.470e+11 Msun within 1 Mpc (3.064e+00 GB added, took 3.015e+02 seconds)
gadget: 4.241e+11 Msun within 1 Mpc (3.223e-01 GB added, took 1.903e+01 seconds)
gasoline: 8.320e+10 Msun within 1 Mpc (3.125e-02 GB added, took 3.705e+01 seconds)
pkdgrav: 6.179e+11 Msun within 1 Mpc (2.246e-01 GB added, took 9.172e+01 seconds)
ramses: 4.303e+11 Msun within 1 Mpc (2.109e-01 GB added, took 3.056e+01 seconds)
Total memory usage: 3.917e+00 GB
There is an enormous regression in the Enzo memory usage, which I'm going to look into, and Ji-hoon has just now emailed me that it seems ParticleRadiuskpc does not work for Enzo at the present time. Other than the Enzo change, I think all the numbers look very good.
These changes are really dramatic; nice job!
Gasoline needs just 30MB of memory -- but everything else is an order of magnitude more than that?
I'm not completely convinced this is totally accurate. I'm also not manually garbage collecting. I can dig down a bit later into why Gasoline takes up so much less ram, but ... it might just be incorrect.
I've attempted to reduce the Enzo memory usage, but I think this should be a separate endeavor. If this looks good, I think it should be ready to go. Then I can issue separate pull requests for a few other modifications, which really belong outside this PR.