Remove old ROOT and ASCII output from code base

Issue #192 resolved
Jochem Snuverink created an issue

The primary hits and primary loss histograms are not filled for ROOT and ASCII output. The AnalysisManager still creates those histograms in BeginOfRunAction, but these are never filled. Note that BDSOutputROOTEvent directly fills TH1D's. If the ones in AnalysisManager are not being used anymore, they should be removed.

The filling was removed in 7a19798.

Comments (26)

  1. Laurie Nevay

    Yes, our current users we're in contact with have now shifted to the new output format and analysis, which has been the default in the develop branch for some time.

    I propose we remove this from develop in time for the next release. This would likely remove the analysis manager and histogram classes. If we require other output formats, we would shift to writing translating from the BDSOutputROOTEvent structures.

    This would significantly revise the output hierarchy. It would simplify some the required virtual methods in BDSOutputROOTEvent. Lastly, we can also reduce the name a bit of it if the others were removed.

  2. Laurie Nevay

    Thanks for doing this Jochem. I mostly agree with BDSOutputROOTEvent -> BDSOutput.

    I agree we should keep a base class for multiple formats - if nothing else for the 'none' output. We should revise the base class to cover the methods required by the root event output. Perhaps this is a change of taste for the base class not to have 'base' in the name anymore.

    So, perhaps we could change the base class BDSOutputBase -> BDSOutput and the derived class BDSOutputROOTEvent to BDSOutputROOT.

  3. Jochem Snuverink reporter

    So, perhaps we could change the base class BDSOutputBase -> BDSOutput and the derived class BDSOutputROOTEvent to BDSOutputROOT.

    Note that we still have 7 other classes with 'Base' in its name, but I don't have a preference either way. Agreed we can rather have BDSOutputROOT, that is probably better in case we ever get another output format.

  4. Laurie Nevay

    We also have a lot of output structure classes that inherit TObject (required for saving our own classes in a root file). These are all prefixed with BDSOutputROOTEvent. I propose we prefix these now with BDSOutput as although they use ROOT classes, they are more general output structures we will always fill with data and then translate to other sources.

    We have to be careful in renaming these though as we ruin backwards compatibility with previous data. Perhaps we can increment the class version from 1 to 2 and root might cope with this? @stewartboogert would be able to comment.

    Here's a list of the classes:

    • Options
    • Histograms
    • Info
    • Loss
    • Sampler<T> (float | double)
    • Trajectory
    • TrajectoryPoint

    I've included a snippet of the class hierarchy here for reference.

    Screen Shot 2017-07-03 at 13.57.52.png

  5. Jochem Snuverink reporter

    Ah good point about the backwards compatibility, we can of course also keep those names. Up to you.

  6. Laurie Nevay

    W.r.t. the use of 'base' in other class names, I have to distinguish between the abstract base class for a set of factories and the main interface for that set of factories. So far, I've named the interface without the base suffix and the abstract base class with the suffix. This pattern covers most of the use of Base. The interfaces could be specifically named and the base class renamed for consistency.

    Apart from that, there's optionsBase to distinguish from the parser options.

    All the new output analysis classes are done without the base suffix.

    I'm leaning towards BDSOutput as the base and BDSOutputROOT, which if I'm read right, you're ok with.

  7. Jochem Snuverink reporter

    As discussed I have renamed BDSOutputBase to BDSOutput and BDSOutputROOTEvent to BDSOutputROOT. The other BDSOutputROOTEvent classes I have left untouched and I leave those up to you and Stewart.

  8. Jochem Snuverink reporter

    @nevay: How shall we progress with this? Shall we merge the current branch 'oldOutputRemoval', and make a new branch for the renaming of the BDSOutputROOTEvent classes?

  9. Laurie Nevay

    I'll have a look over the branch today and review. We could do as you say and finish this branch and make a new one for renaming. I'll have a look over the current branch first and let you know today.

    @stewartboogert : can rename the storage classes and be backwards compatible?

  10. Laurie Nevay

    So, I've had a look over the branch and a few things remain to be decided and changed. We have a few hangovers from the old output still, but in design rather than say variables.

    This is a list of the calls to the output in order through the simulation (ignoring arguments):

    Begin of Run Action:

    Initialise
    

    End of Event Action:

    EventInfo
    WriteSamplerHits(plane)
    WriteSamplerHits(cylinder)
    EnergyLoss
    TunnelHits
    PrimaryHit
    PrimaryLoss
    Commit -> This writes bad run info and opens new file
    Trajectory
    FillEvent
    

    End of Run:

    WriteRunInfo
    CloseFile
    

    A few thoughts

    • Commit is perhaps badly named for what it does now
    • Commit isn't used at the end of a run or there's mixed functionality here I think
    • There are a lot of pure virtual interfaces that aren't so clear perhaps
    • We could use the template pattern in BDSOutput (now the base class)
    • We could use only a few methods to interact with the output and keep virtual (not pure) in the output for small bits of functionality

    As an example, we could do something like:

    Begin of Run:

    Initialise(const options&)
    

    End of Event: if (firstevent)

    FillEvent(info*, 
                   samplerhits*, 
                   samplerhits*, 
                   energyloss*, 
                   tunnelhits*, 
                   primaryhit*, 
                   primaryloss*, 
                   trajectories*)
    
    if (n > nperfile)
      {NewFile()}
    

    End of Run:

    FillRun(info*)
    CloseFile()
    
    • Commit would become "NewFile"
    • The addition of more output structures would require changing only one function in any output class

    • We could also move the code that builds up a unique filename with number suffix from BDSOutputROOT into the base class. Currently in (oldOutputRemoval branch) src/BDSOutputROOT.cc : line 174 - 193.

  11. Laurie Nevay

    A further consideration is that we would use the new root event output as a structure that any other format would translate and write out. Ie if we want to retain ASCII output, we would create a class that derives BDSOutputROOT and translates the structures. This can be done, but we should perhaps separate the structures from the pairing them with the ROOT file. The ROOT model is that you have a 'local' object that you link to a branch in a tree in a file and then call Fill() on. We could have a class that holds the structures (BDSOutputROOTEventHistograms, vector<BDSOutputROOTEventSampler> etc etc) and the actual ROOT output class that would link these to the file.

    This way, if we have an ASCII output class or some other format, we can still use and fill the structures (keeping the event action etc the same always) but have different ways to write out those structures.

    If we were to go down this road, we could have an intermediate class that could be used through inheritance for BDSOutput or aggregation (BDSOutput has a BDSOutputData for example). Nearly all of the constructor of BDSOutputROOT is preparation of the structures. Initialise has the linking to the file as well as the one off preparation and filling of options / model structures.

  12. Jochem Snuverink reporter

    Yes, this is probably a good moment to review the base class design altogether.

    • Agreed that Commit is badly named. Perhaps NewFile can still be confusing (not to be confused with Initialise), perhaps CloseandOpenNewFile ?
    • I also think it is a good idea to combine all in a single fillEvent call, that will greatly simplify the output class(es) and EndOfEventAction
    • Agreed that the building of the filename would rather belong in the base class.
    • And while we are at it, we can go for the template method pattern indeed. E.g. we could make all the current public virtual fill methods private and have the single public fillEvent non-virtual. I think that is very flexible and clean.
    • protectedNames should become private/protected. And perhaps the vector should rather be filled in the BDSOutputROOT class?

    I am not sure I fully understand or agree to the structure you propose for additional outputs, but let's think about it when it comes up, since as far as I know there are no plans in this direction at the moment?

  13. Laurie Nevay
    • I think CloseAndOpenNewFile is a good idea.
    • For the template pattern, would the virtual fill methods be protected so they can be overridden in a derived class if needs be - to specialise that segment without breaking the overall recipe?
    • protectedNames is intended to stop someone creating a sampler called 'Primary' or Eloss, which are names used for our default output structures. Just now (in oldOutputRemoval) we only have one output format, but I imagine if / when we have another format, we'll want to call them the say for similarity between formats. I think this is more general than BDSOutputROOT and should be in the base class for now. I put it there as the information is to do with output even though it's queried in BDSBeamline.cc : line 106.

    I think we're agreed on a set of changes so can proceed with those at least. Next comment about the other design issue.

  14. Laurie Nevay

    In the oldOutputRemoval branch, we only have one output format (that's implemented - none does nothing of course). The introduction of BDSOutputROOTEvent broke a lot of the structure / interfaces we had already to be format independent. Ie, it didn't use our histogram manager or histogram classes. The idea was that if we wanted another format we would prepare the root event output and translate it into another. So for ASCII, we would fill BDSOutputROOTEventHistograms as normal (as done in BDSOutputROOTEvent / BDSOutputROOT (now), and then translate them into ASCII (looping over bins etc).

    Although there's no immediate need for ASCII, I think we should not limit ourselves to a single output format. Indeed, we regularly encounter a lot of trouble with ROOT.

    If I wanted to create an ASCII output based on BDSOutputROOT (new version), I could derive that class, but I would still create a ROOT output file at the same time. The code just now must create a ROOT output file to work.

    I propose to separate the holding structures we accumulate for an event separate from the ROOT file mechanics. This way a different output format could use the same structures (see constructor of new BDSOutputROOT) but write them to a different file format.

  15. Jochem Snuverink reporter

    For the template pattern, would the virtual fill methods be protected so they can be overridden in a derived class if needs be - to specialise that segment without breaking the overall recipe?

    The virtual fill methods can be private and still be overridden (virtual). They will only need to be called inside the base class (private).

    Yes, I understand the mechanism behind protectedNames. I don't want to change the location of protectedNames, but I was thinking that protectedNames can be filled in the constructor of each derived class, so for each derived class individually. Now we fill it with Primary etc. in the base class which are really coming from BDSOutputROOT

  16. Jochem Snuverink reporter

    For a new output format it depends perhaps a bit on the goal and use case.

    The most natural simplest way for a new output format would be, I think, the mechanism we used to have for ASCII by deriving from the base class. In there the holding structures are already separated from ROOT, and if we encounter trouble with ROOT we would be really independent from it.

    But I don't think we need to decide right now on such a mechanism and for now we can keep it as simple as it is.

  17. Laurie Nevay

    The goal would be to have another output file format with the same data. Just now the goal is not to preclude this through design.

    Yes, this was the old design, but this was broken with the introduction of ROOTEvent format. Originally we had our own histograms etc and all information was prepared in BDSEventAction::EndOfEvent(). If all information is prepared in EndOfEvent, then it's easy to give to different formats. ROOTEvent prepared the histograms inside the derived output class so this is no longer possible. I made quite a point of this early last year but we proceeded anyway as an event based structure was required and it was already done. The compromise was that we would eventually use these structures for other output formats. We now have a dependency on ROOT so we can safely use the histogram classes etc even if we don't intend to put them in a ROOT file.

  18. Jochem Snuverink reporter

    Just now the goal is not to preclude this through design.

    In this discussion the current ROOTEvent design hasn't changed in a big way, the base and derived class BDSOutputROOT.cc will stay largely the same. The design can always be changed later when ever needed. Such a change would be easier than before since quite a few things have been cleaned and there are few classes involved.

    But agreed if one wants to have exactly the same info in a different format, splitting the accumulation and file mechanics makes sense. In a way this is again having an analysis manager this time based on ROOT. In that case one could split off the holding structures into a single analysis manager and fill the different format files in EndOfRunAction.

  19. Log in to comment