@brittonsmith is on vacation but we may want to wait to hear from him to review this.
Worth noting that out_file.attrs["unit_registry"] = self.data_ds.unit_registry.to_json() overwrites whatever is in that attr at the time. This effectively keeps the unit registry of the halo catalog in sync with whatever the unit registry of the Dataset it came from is, but I'm not sure if overwriting the unit_registry attr (and therefore the unit registry of the catalog) on save/load is good default behavior. I suppose someone can only define and use a custom unit registry for the halo catalog once it's loaded in memory, but it will be lost on save in favor of the Dataset unit registry. So maybe the way to make persistent changes to the halo catalog unit registry with this code is to make it to the parent Dataset. Is this a feature or a bug?
Is there ever a case when the halo catalog dataset should have different units than the dataset it was created from?
Not sure -- that's a good point. I think this is good default behavior, and anyone who wants to deviate can do it explicitly when they run their analyses.
This shouldn't be a problem since the unit registry was not being saved before at all. This file is being opened in "w" mode, so it's already clobbering anything that was there.
Hi @rasmii, this will be a nice improvement over the current system that saves everything to CGS and assumes it is that way on load. As I mentioned in the email thread, all fields are converted to CGS before saving, so for this to be correct, that will also need to be changed as will the declaration of CGS as the incoming unit. To generalizing the field units, have a look at yt/frontends/halo_catalog/fields.py and the values of m_units, p_units, etc. I think you'll need to change those to be code_mass, code_length, etc. For the saving, you'll probably want to save to whatever the base unit is. This is happening in yt/analysis_modules/halo_analysis/halo_catalog.py in the _run function.
Additionally, the units are saved as attributes to the individual datasets, so it may be even safer to make use of those attributes when the data is read in. I believe I have done this before. I will check what I did and get back to you.
Also, if you don't feel like falling down this rabbit hole, the current units should be correct (in CGS), just not necessarily the same as the original dataset's units.
For what it's worth, I have checked and found that the halo_catalog frontend already reads in the units attribute saved with the field data, so the units of the fields are not being inferred, they are being read directly from the file.
@brittonsmith is this PR and the changes described below unnecessary? If yt already handles halo catalog units properly, I might just look at other solutions for getting around the units issue I was facing.
@rasmii, I don't think it is explicitly necessary, but it would be a nice addition if you ever want to take it up again. I've lost the thread on this, but if you're still having issues with the HaloCatalog units, I'd be glad to help out with that.
Hi @rasmii, this functionality has now been implemented in PR 2482. The unit registry is now saved and reloaded with HaloCatalog datasets, so you can now even use things like code_length. We'll decline this now since that PR has been merged.