#247 Merged at bd0ac35

Source repository: yt-dev (branch yt)
Destination repository: yt (branch yt)
Author: Christopher Moody
Reviewers: (none listed)
Description

Cleaned up the ART frontend

Added **kwargs to timeseries.from_filenames; these arguments get passed through to load()
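
A minimal usage sketch of this change, assuming the yt 2.x TimeSeriesData.from_filenames entry point (the extra keyword shown is a placeholder, not a real load() option):

```python
# Hedged sketch: extra keyword arguments passed to from_filenames are simply
# forwarded to load() for each output. "some_load_option" is a placeholder,
# not an actual load() keyword.
from yt.mods import TimeSeriesData

ts = TimeSeriesData.from_filenames(["out_0001", "out_0002"],
                                   some_load_option=True)
for pf in ts:
    print(pf.current_time)
```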

Added a bit of debugging info to the Sunrise exporter

Comments (22)

  1. Sam Skillman

    Looks like there are some indentation issues in camera.py around line 946, otherwise the auto-focus stereo camera looks good.

  2. Christopher Moody author

    I fixed the indentation -- was that the conflict?

    So here's a dumb question, but how did you guys see the conflict/bug but I didn't? In other words, how do I not submit a crappy PR again?

    1. Nathan Goldbaum

      It looks like a bad merge. Mercurial will annotate a merged file when you need to manually resolve a conflict. If you set up a merge tool in your hgrc file you can take care of this automatically before you commit the merge.

      I like the emacs emerge tool (info on how to set it up: http://mercurial.selenic.com/wiki/MergingWithEmacs) but if you use vim or some other text editor I'm sure there are equivalent solutions that are just as good.
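
      For reference, a minimal hgrc sketch of the general idea (the tool and arguments below are only illustrative; the linked wiki page shows the emacs-specific setup):

      ```ini
      [ui]
      merge = kdiff3

      [merge-tools]
      # tell Mercurial how to invoke the chosen tool on the three versions
      kdiff3.args = $base $local $other -o $output
      ```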

        1. MattT

          The conflicts are being displayed by Bitbucket; he has not committed the merge markers. This just means that the conflict and merge will have to be done by hand.

  3. MattT

    I'll review these changes in detail either today or tonight and provide feedback, Chris -- thanks for submitting!

  4. MattT

    Hi Chris, I've reviewed the changes. I think we need another round before they can be accepted. Thanks again for submitting them -- there are a lot of really great things in here, but I think there needs to be a bit of refinement.

    Comments:

    • Changing RockstarHaloFinder to use time series is good, but it's too bad this is the only halo finder that uses it. Because of that, could you please add a docstring (in the format of doc/docstring_example.rst)? Thank you, however, for casting to TimeSeries -- that's hugely helpful!
    • Please don't use print. This should be either mylog calls or removed.
    • Calling dd["particle_type"] will load the entire dataset into memory. So, if this is run in parallel, the entire dataset will be read on every single processor. Check the other halo finders for how to do this efficiently. (L63)
    • Why have all the workgroup creation calls been removed? Are we just running on a single processor now, or a single group? No more IO/compute separation? This should be fine, since IO happens before the computation starts, but it will result in memory over-allocation at startup.
    • Please perform a merge with the tip, resolve the conflicts, and reissue the PR so we can see how these should be resolved.
    • Where does pf come from within rh_read_particles? Is it now a global, too?
    • Is SCALE_NOW only applicable to simulations where the size of the box is comoving, or proper? Is this a generally useful item?
    • My biggest concern with these changes is the number of attributes that are now appended onto grid objects. (L98 in the grid object.) It seems to me that we should not be adding all of these. They should either be directly added to the field_data attribute, or they should be held inside a secondary dictionary -- for instance, a .particle_fields dictionary (see the sketch after this list). This would also remove all of the if/elif code blocks later on. The same goes for all of the pf.star_position_x etc. attributes. Also, what is the point of do_particles? I don't think I understand the difference between the two options in that conditional.
    • There's another print in the data_structures.py file.
    • Why does the convert function for temperature return 1.0, when the conversion gets applied earlier?
    • The whole io.py function for read_particle_field should be changed to use dict accesses on the grid objects.
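
    A minimal sketch of the dictionary-based pattern suggested in the last two points (the class and attribute names here are illustrative, not yt's actual API):

    ```python
    # Illustrative only: keep per-grid particle data in one dictionary instead
    # of many ad-hoc grid attributes, so the IO layer becomes a plain dict
    # lookup rather than a chain of if/elif branches over hard-coded names.
    class ARTGridSketch(object):
        def __init__(self):
            # e.g. {"particle_position_x": array, "particle_mass": array, ...}
            self.particle_fields = {}

    def _read_particle_field(grid, field):
        # dict access replaces the per-field if/elif blocks in io.py
        return grid.particle_fields[field]
    ```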
    1. Christopher Moody author

      Hi Matt, I'm responding to each of these comments:

      Changing RockstarHaloFinder to use time series is good, but it's too bad this is the only halo finder that uses it. Because of that, could you please add a docstring (in the format of doc/docstring_example.rst)? Thank you, however, for casting to TimeSeries -- that's hugely helpful! OK.

      Please don't use print. This should be either mylog calls or removed. OK.

      Calling dd["particle_type"] will load the entire dataset into memory. So, if this is run in parallel, the entire dataset will be read on every single processor. Check the other halo finders for how to do this efficiently. (L63) OK -- this may take a bit more time.

      Why have all the workgroup creation calls been removed? Are we just running on a single processor now, or a single group? No more IO/compute separation? This should be fine, since IO happens before the computation starts, but it will result in memory over-allocation at startup. This one will also take more time.

      Please perform a merge with the tip, resolve the conflicts, and reissue the PR so we can see how these should be resolved.

      Where does pf come from within rh_read_particles? Is it now a global, too? pf is not global, but the TimeSeries object inside the Rockstar handler is. pf just pulls the next object off the queue.
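
      Roughly, the arrangement being described is something like this (class and method names are made up for illustration, not the actual rockstar interface code):

      ```python
      class RockstarInterfaceSketch(object):
          # Illustrative sketch: the handler owns the TimeSeries, and each
          # call hands back the next pf in the queue, so rh_read_particles
          # never needs a module-level global pf.
          def __init__(self, ts):
              self._outputs = iter(ts)

          def next_pf(self):
              return next(self._outputs)
      ```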

      Is SCALE_NOW only applicable to simulations where the size of the box is comoving, or proper? Is this a generally useful item? SCALE_NOW is always necessary, since it's used for tying snapshots together and figuring out a coarse merger tree (before consistent-trees is used). It travels with the data written to disk, and I think it is a necessary parameter, even if it's not needed to compute (say) comoving distance from proper.

      My biggest concern with these changes is the number of attributes that are now appended onto grid objects. (L98 in the grid object.) It seems to me that we should not be adding all of these. They should either be directly added to the field_data attribute, or they should be held inside a secondary dictionary -- for instance, a .particle_fields dictionary. This would also remove all of the if/elif code blocks later on. The same goes for all of the pf.star_position_x etc. attributes. Also, what is the point of do_particles? I don't think I understand the difference between the two options in that conditional. One option loads in the particles so they can be accessed as an array attached to pf, or as particles attached to the root grid cell. The second option allows for further refinement, attaching particles to grids and subgrids.

      There's another print in the data_structures.py file. Removed or swapped with mylog.

      Why does the convert function for temperature return 1.0, when the conversion gets applied earlier? It's leftover cruft from trying to convert the temperature to Kelvin. Removed.

      The whole io.py function for read_particle_field should be changed to use dict accesses on the grid objects. OK.

      1. MattT

        Hi Chris,

        Cool. I'm satisfied with your replies, and I'll accept once the grid/particle stuff uses dicts. Thanks!

        -Matt

  5. Christopher Moody author

    OK, I've updated with the easy stuff to fix: I've moved IO from a stupid series of if-statements to a dictionary. The ART frontend changes should be OK. From here on out, I'm focusing ART dev on yt-3.0.

    However this still leaves the issue of not loading every particle on every processor, which will take a bit more time for me to figure out.

    1. MattT

      I think you should consider the ART frontend completed for 2.x, and we can now just focus on Rockstar and accept this changeset.

    2. Stephen Skory

      What do you mean by "not loading every particle on every processor"? Do you mean that every processor is loading every particle, or that you're not loading some particles?

      1. MattT

        Yup, right now each processor loads all the particles. I think this is fine since we're going to essentially freeze this frontend in time and focus on 3.0.

        1. Christopher Moody author

          Matt wrote an inline comment below on yt/analysis_modules/halo_finding/rockstar/rockstar.py:115.

          Rockstar needs to know the total number of particles, but it doesn't need to actually load every particle on every processor, which is what the dd['particle_type'] call is essentially doing. Matt has suggested I replace this with a derived field, which will deal with this in a memory- and CPU-light way.
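
          As a rough illustration of the derived-field idea (the field name and the use of per-grid metadata here are assumptions, not the actual fix):

          ```python
          # Hypothetical sketch in yt 2.x style: expose a per-grid particle
          # count taken from hierarchy metadata, so no particle arrays have to
          # be read just to learn how many particles exist.
          import numpy as np
          from yt.mods import add_field

          def _particle_count(field, data):
              return np.array([data.NumberOfParticles], dtype="float64")

          add_field("ParticleCountSketch", function=_particle_count,
                    particle_type=True)
          ```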

  6. MattT

    Hi Chris. Can you revert your rockstar changes to the tip of the yt branch, then re-submit this PR?

      1. Christopher Moody author

        Yup... already working on it. I wanted to include some of the cleaning I did in yt-3 to this branch, and then minimize updates to the ART yt-2.4/2.5 frontend.

  7. MattT

    Hi Chris -- these changes look alright to me, since they're confined to ART and Sunrise. Thanks for fixing up the ART frontend -- see you in 3.0!