Segmentation fault in G4Trajectory destructor when using "optical" physics.

Issue #186 resolved

Lawrence Deacon created an issue 2017-03-07

I’m having trouble with a bug when using scintillation with the optical physics list enabled. Initially I thought this might be due to the scintillator screen geometry in the AWAKE example. However, the bug still occurs when I just make some of the beam pipe material into a scintillator, as I have done in branch debug-2, in BDSSpectrVacChamb. If you run the awake example with optical physics list enabled in this branch you will see an exception occurs at the end of the first event:

BDSIM is about to crash or was interrupted! 
With signal: Segmentation fault: 11
Terminate run - trying to write and close output file
Ave, Imperator, morituri te salutant!
WARNING - Attempt to delete the physical volume store while geometry closed !
WARNING - Attempt to delete the logical volume store while geometry closed !
WARNING - Attempt to delete the solid store while geometry closed !
WARNING - Attempt to delete the region store while geometry closed !

Process finished with exit code 1

This happens after the EndOfEventAction in the destructor of G4Trajectory (via the destructor of BDSTrajectory) but I do not understand why it is happening. It only happens when the “optical” physics list is enabled (I am using “optical” and “em”).

Comments (16)

Jochem Snuverink
I'll have a quick look at this as I have an idea what it could be.
- 2017-03-07T15:36:19+00:00
Jochem Snuverink
- edited description
nicer formatting
- 2017-03-07T15:37:44+00:00
Laurie Nevay
So one problem is in end of event action, there isn't a trajectory container when the event is aborted - I can add a check for this. I'm pretty sure this is what causes the segfault.

The event is aborted because an optical process is called not on a boundary some how. This links to the other issue we saw with Geant4.10.3 which also showed the same error with the optical physics. When I check on the validity of the trajCont pointer it runs ok and it completes without segfaulting but with errors.
- 2017-03-07T16:29:17+00:00

Jochem Snuverink

Agreed with Laurie that a check on the trajectory container is needed.

When I run in debug I see a huge number of photons. The last bit of printout I get is:

#!

BDSTrackingAction::PreUserTrackingAction>  TrackID=168812 ParentID=168703
BDSTrackingAction::PreUserTrackingAction>  TrackID=168856 ParentID=168705
BDSTrackingAction::PreUserTrackingAction>  TrackID=168855 ParentID=168732
BDSTrackingAction::PreUserTrackingAction>  TrackID=168854 ParentID=168733
BDSTrackingAction::PreUserTrackingAction>  TrackID=168851 ParentID=168836
StackingAction: New stage
BDSEventAction::EndOfEventAction> processing end of event action
BDSEventAction::WritePrimaryVertex> 
BDSOutputROOTEvent::WritePrimary> 
BDSEventAction::EndOfEventAction> processing sampler hits collection
BDSOutputROOTEvent::WriteHits> 
BDSOutputROOTEvent::WriteHits> 383
BDSEventAction::EndOfEventAction> processing cylinder hits collection
BDSEventAction::EndOfEventAction> filling histograms & writing energy loss hits
BDSOutputROOTEvent::WriteEnergyLoss> 
BDSOutputROOTEvent::WriteTunnelHits> 
Interaction point found at 13.387352000066 m - fElectromagnetic : fIonisation
Interaction point found at 13.387352000066 m - fElectromagnetic : fIonisation
BDSOutputROOTEvent::WritePrimaryHit> 
BDSOutputROOTEvent::WritePrimaryLoss> 
BDSEventAction::EndOfEventAction> finished writing energy loss
BDSOutputROOTEvent::FillEvent> 
BDSEventAction::EndOfEventAction> end of event action done

Not sure if 150k tracks will be handled correctly memory-wise (it should be though!)

2017-03-07T16:42:52+00:00

Laurie Nevay
I've added the check in Lawrence's 'debug-2' branch.

I then see the invalid exit normal warning / optical boundary process in middle of volume and the event is aborted.
- 2017-03-07T17:11:30+00:00
Laurie Nevay
Stewart has tested the trajectory output with 50TeV hadronic showers with ~500k tracks without problem. I'm not sure of the memory usage though.
- 2017-03-07T17:14:40+00:00
Laurie Nevay
So, I can avoid the optical process boundary error by disabling the sampler parallel world physics process (See screenshot edit of bdsim.cc at line 161.)

The first event runs for around 5 minutes on my laptop and then crashes with a bad access, again in the trajectory destructor.
- 2017-03-07T17:18:59+00:00
Laurie Nevay
There are no samplers constructed in the parallel sampler world though so this shouldn't affect it. We always register the sampler parallel world and process even if there aren't any samplers (hard to know at that stage whether there will be any). But I think this is related to the other error so I'll leave that for now.
- 2017-03-07T17:41:54+00:00
Laurie Nevay
I think I may have solved this now. We override the method MergeTrajectory in BDSTrajectory but simply call the base class method. This copies points from one trajectory to another (via allocation) and deletes old ones. The base class (G4Trajectory) deletes the points in its container by their base class pointer (G4TrajectoryPoint) in their own container.

Replacing the call the base class MergeTrajectory and implementing it in our class as per the geant4 example
```
 extended/optical/wls/src/WLSTrajectory.cc
```
seems to fix this problem.

This only seems to happen with optical physics.

The awake spectrometer example runs for an extremely long time per event (upwards of 10 minutes). For (mostly) testing purposes, I introduced two options in the parser to shorten this.
```
option, maximumPhotonsPerStep = 1;
option, maximumTracksPerEvent = 1e3;
```
The first is a simple interface to the function of the same name in G4OpticalPhysics that controls the maximum number of Cherenkov photons that can be produced per step. This doesn't have too much affect on the duration. The second is a relatively dangerous artificial cut on the number of tracks. When the track ID exceeds this number, all tracks are killed irrespective of any other cuts. This definitely does not conserve energy. Useful for testing.

The parser/published class was extended to handle the long type.

Note G4Track::GetTrackID() returns a G4int which is a typedef to int. If unsigned, we would expect the maximum number to be 65535. However, this can easily exceed 1 million. In this awake example, a single event can produce tens of millions of tracks without limitation. These numbers and the type of int seem at odds. I used long anyway as it's what should be used. The default is 0, which takes no action at all.

Another change was to allocate the trajectory points vector on to the heap. Although the contents of the vector are on the heap the vector object itself is on the stack. Since we're allocating millions of even the smallest object, we should use the heap. This also follows the way the Geant4 examples do it.

These changes are in the debug-2 branch. Please have a go

I'd like to merge this branch back into develop, but I'll let you (Lawrence) decide which changes for the screens and awake spectrometer you'd like to keep.

Note, since I've changed the option, you should do make install if using the libs in ROOT to load the data.
- 2017-03-07T23:41:10+00:00
Laurie Nevay
Limiting the maximum number of tracks to 1e3 as above limits the event duration to about 10s. Even with this limit, the fault was evident before with optical physics enabled. It happened with only a very low number of tracks / trajectories. These settings are of course not suitable for studies.
- 2017-03-07T23:42:44+00:00
Laurie Nevay
I hope this may also solve Issue ~~#183~~ - I'll test tomorrow.
- 2017-03-07T23:43:24+00:00
Jochem Snuverink
great that it is solved. Just as a side note, although an unsigned int is indeed guaranteed maximally 65535 (2^16 -1). However on practically all 32 and 64 bit machines the maximum is 2^32 -1 = about 4 billion (https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models). This can be checked by checking the value of UINT_MAX. So although we should theoretically be careful, in practice there is absolutely no problem there and we will easily run out of memory before we hit 4 billion tracks.
- 2017-03-08T07:34:42+00:00
Jochem Snuverink
Instead of merging into develop, which I presume was not the intention of the commits by Lawrence, we could also cherry-pick the commits by Laurie into develop?
- 2017-03-08T07:44:03+00:00
Laurie Nevay
I think it's probably easier and cleaner to undo those particular edits in the debug-2 branch and merge it into develop now. We can easily undo them in source tree by viewing the commit and clicking 'reverse hunk' in the viewer.

But as you please - either way.
- 2017-03-08T11:33:09+00:00
Jochem Snuverink
Up to you. I would say the opposite that cherry-picking is easier and cleaner (I count only 6 commits), but it depends if one keeps the commits of @ldeacon.

By the way I thought a bit about the addition of long to the parser. It should work in the current implementation (since GMAD works in doubles), but I don't see the advantage in this case since, as you noted, GetTrackID() returns an int anyway. But we can keep it.
- 2017-03-08T12:56:37+00:00
Laurie Nevay
- changed status to resolved
Fixed with implementation of BDSTrajectory::MergeTrajectory.
- 2017-03-10T16:24:28+00:00
Log in to comment

Assignee: –

Type: bug

Priority: major

Status: resolved

Votes: 0

Watchers: 1