some Dataset.apply_*() functions don't work inplace

Issue #20 new
Thomas Gilgenast created an issue

the apply_*() functions are quite powerful and they are the preferred way to write new data into the Dataset as a function of the existing data

however, some of the apply_*() functions (I think it's the ones that loop over regions) give incorrect output when the input column and the output column are the same (presumably because the output column is overwritten with zeros or a suitable base value before the loop begins, which destroys the input values the loop still needs to read)

to solve this, these particular functions should check each output column name to make sure it is not also an input. if it is, the output column should be renamed to '<column_name>_temp' and the function applied in the loop as usual. finally, the result should be copied back and the temporary column dropped with

self.df['<column_name>'] = self.df['<column_name>_temp']
del self.df['<column_name>_temp']
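
a minimal sketch of the failure and the proposed workaround, in plain Python. this is not the lib5c implementation: a dict of lists stands in for self.df, and apply_per_region_naive(), apply_per_region_safe(), and the 'region' column are hypothetical names chosen for illustration only

```python
def apply_per_region_naive(df, in_col, out_col, fn):
    """Hypothetical stand-in for a region-looping apply_*() function.

    `df` is a plain dict mapping column names to equal-length lists
    (an assumption; the real Dataset wraps a DataFrame in self.df).
    """
    # The output column is initialized before the loop. When
    # out_col == in_col this overwrites the input values with zeros.
    df[out_col] = [0.0] * len(df['region'])
    for region in sorted(set(df['region'])):
        idx = [i for i, r in enumerate(df['region']) if r == region]
        result = fn([df[in_col][i] for i in idx])
        for i, value in zip(idx, result):
            df[out_col][i] = value


def apply_per_region_safe(df, in_col, out_col, fn):
    """Same operation, but writes to a temporary column on a name clash."""
    clash = (out_col == in_col)
    temp_col = out_col + '_temp' if clash else out_col
    apply_per_region_naive(df, in_col, temp_col, fn)
    if clash:
        # Copy the result back and drop the temporary column.
        df[out_col] = df[temp_col]
        del df[temp_col]


def double(values):
    """Toy per-region function used only for this demonstration."""
    return [2 * v for v in values]
```

with this sketch, calling apply_per_region_naive(df, 'value', 'value', double) leaves 'value' filled with zeros, while apply_per_region_safe(df, 'value', 'value', double) writes the doubled values and leaves no 'value_temp' column behind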

Comments (3)

  1. Thomas Gilgenast reporter

    this is unlikely to get a fix because lib5c.structures.dataset.Dataset is likely to be superseded by a new data structure based on the hic3defdr data layout

    this issue is still helpful for the discussion of the features we would like to see in that new data structure
