Segmentation fault in G4Trajectory destructor when using "optical" physics.

Issue #186 resolved
Lawrence Deacon created an issue

I’m having trouble with a bug when using scintillation with the optical physics list enabled. Initially I thought this might be due to the scintillator screen geometry in the AWAKE example. However, the bug still occurs when I just make some of the beam pipe material into a scintillator, as I have done in branch debug-2, in BDSSpectrVacChamb. If you run the awake example with optical physics list enabled in this branch you will see an exception occurs at the end of the first event:

BDSIM is about to crash or was interrupted! 
With signal: Segmentation fault: 11
Terminate run - trying to write and close output file
Ave, Imperator, morituri te salutant!
WARNING - Attempt to delete the physical volume store while geometry closed !
WARNING - Attempt to delete the logical volume store while geometry closed !
WARNING - Attempt to delete the solid store while geometry closed !
WARNING - Attempt to delete the region store while geometry closed !

Process finished with exit code 1

This happens after the EndOfEventAction in the destructor of G4Trajectory (via the destructor of BDSTrajectory) but I do not understand why it is happening. It only happens when the “optical” physics list is enabled (I am using “optical” and “em”).

Comments (16)

  1. Laurie Nevay

    So one problem is in end of event action, there isn't a trajectory container when the event is aborted - I can add a check for this. I'm pretty sure this is what causes the segfault.

    The event is aborted because an optical process is called not on a boundary some how. This links to the other issue we saw with Geant4.10.3 which also showed the same error with the optical physics. When I check on the validity of the trajCont pointer it runs ok and it completes without segfaulting but with errors.

  2. Jochem Snuverink

    Agreed with Laurie that a check on the trajectory container is needed.

    When I run in debug I see a huge number of photons. The last bit of printout I get is:

    #!
    
    BDSTrackingAction::PreUserTrackingAction>  TrackID=168812 ParentID=168703
    BDSTrackingAction::PreUserTrackingAction>  TrackID=168856 ParentID=168705
    BDSTrackingAction::PreUserTrackingAction>  TrackID=168855 ParentID=168732
    BDSTrackingAction::PreUserTrackingAction>  TrackID=168854 ParentID=168733
    BDSTrackingAction::PreUserTrackingAction>  TrackID=168851 ParentID=168836
    StackingAction: New stage
    BDSEventAction::EndOfEventAction> processing end of event action
    BDSEventAction::WritePrimaryVertex> 
    BDSOutputROOTEvent::WritePrimary> 
    BDSEventAction::EndOfEventAction> processing sampler hits collection
    BDSOutputROOTEvent::WriteHits> 
    BDSOutputROOTEvent::WriteHits> 383
    BDSEventAction::EndOfEventAction> processing cylinder hits collection
    BDSEventAction::EndOfEventAction> filling histograms & writing energy loss hits
    BDSOutputROOTEvent::WriteEnergyLoss> 
    BDSOutputROOTEvent::WriteTunnelHits> 
    Interaction point found at 13.387352000066 m - fElectromagnetic : fIonisation
    Interaction point found at 13.387352000066 m - fElectromagnetic : fIonisation
    BDSOutputROOTEvent::WritePrimaryHit> 
    BDSOutputROOTEvent::WritePrimaryLoss> 
    BDSEventAction::EndOfEventAction> finished writing energy loss
    BDSOutputROOTEvent::FillEvent> 
    BDSEventAction::EndOfEventAction> end of event action done
    

    Not sure if 150k tracks will be handled correctly memory-wise (it should be though!)

  3. Laurie Nevay

    I've added the check in Lawrence's 'debug-2' branch.

    I then see the invalid exit normal warning / optical boundary process in middle of volume and the event is aborted.

  4. Laurie Nevay

    Stewart has tested the trajectory output with 50TeV hadronic showers with ~500k tracks without problem. I'm not sure of the memory usage though.

  5. Laurie Nevay

    So, I can avoid the optical process boundary error by disabling the sampler parallel world physics process (See screenshot edit of bdsim.cc at line 161.)

    The first event runs for around 5 minutes on my laptop and then crashes with a bad access, again in the trajectory destructor.

    Screen Shot 2017-03-07 at 17.16.21.png

    Screen Shot 2017-03-07 at 17.18.06.png

    Screen Shot 2017-03-07 at 17.17.25.png

  6. Laurie Nevay

    There are no samplers constructed in the parallel sampler world though so this shouldn't affect it. We always register the sampler parallel world and process even if there aren't any samplers (hard to know at that stage whether there will be any). But I think this is related to the other error so I'll leave that for now.

  7. Laurie Nevay

    I think I may have solved this now. We override the method MergeTrajectory in BDSTrajectory but simply call the base class method. This copies points from one trajectory to another (via allocation) and deletes old ones. The base class (G4Trajectory) deletes the points in its container by their base class pointer (G4TrajectoryPoint) in their own container.

    Replacing the call the base class MergeTrajectory and implementing it in our class as per the geant4 example

     extended/optical/wls/src/WLSTrajectory.cc
    

    seems to fix this problem.

    This only seems to happen with optical physics.

    The awake spectrometer example runs for an extremely long time per event (upwards of 10 minutes). For (mostly) testing purposes, I introduced two options in the parser to shorten this.

    option, maximumPhotonsPerStep = 1;
    option, maximumTracksPerEvent = 1e3;
    

    The first is a simple interface to the function of the same name in G4OpticalPhysics that controls the maximum number of Cherenkov photons that can be produced per step. This doesn't have too much affect on the duration. The second is a relatively dangerous artificial cut on the number of tracks. When the track ID exceeds this number, all tracks are killed irrespective of any other cuts. This definitely does not conserve energy. Useful for testing.

    The parser/published class was extended to handle the long type.

    Note G4Track::GetTrackID() returns a G4int which is a typedef to int. If unsigned, we would expect the maximum number to be 65535. However, this can easily exceed 1 million. In this awake example, a single event can produce tens of millions of tracks without limitation. These numbers and the type of int seem at odds. I used long anyway as it's what should be used. The default is 0, which takes no action at all.

    Another change was to allocate the trajectory points vector on to the heap. Although the contents of the vector are on the heap the vector object itself is on the stack. Since we're allocating millions of even the smallest object, we should use the heap. This also follows the way the Geant4 examples do it.

    These changes are in the debug-2 branch. Please have a go

    I'd like to merge this branch back into develop, but I'll let you (Lawrence) decide which changes for the screens and awake spectrometer you'd like to keep.

    Note, since I've changed the option, you should do make install if using the libs in ROOT to load the data.

  8. Laurie Nevay

    Limiting the maximum number of tracks to 1e3 as above limits the event duration to about 10s. Even with this limit, the fault was evident before with optical physics enabled. It happened with only a very low number of tracks / trajectories. These settings are of course not suitable for studies.

  9. Jochem Snuverink

    great that it is solved. Just as a side note, although an unsigned int is indeed guaranteed maximally 65535 (2^16 -1). However on practically all 32 and 64 bit machines the maximum is 2^32 -1 = about 4 billion (https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models). This can be checked by checking the value of UINT_MAX. So although we should theoretically be careful, in practice there is absolutely no problem there and we will easily run out of memory before we hit 4 billion tracks.

  10. Jochem Snuverink

    Instead of merging into develop, which I presume was not the intention of the commits by Lawrence, we could also cherry-pick the commits by Laurie into develop?

  11. Laurie Nevay

    I think it's probably easier and cleaner to undo those particular edits in the debug-2 branch and merge it into develop now. We can easily undo them in source tree by viewing the commit and clicking 'reverse hunk' in the viewer.

    But as you please - either way.

  12. Jochem Snuverink

    Up to you. I would say the opposite that cherry-picking is easier and cleaner (I count only 6 commits), but it depends if one keeps the commits of @ldeacon.

    By the way I thought a bit about the addition of long to the parser. It should work in the current implementation (since GMAD works in doubles), but I don't see the advantage in this case since, as you noted, GetTrackID() returns an int anyway. But we can keep it.

  13. Log in to comment