Race condition when building Cactus utilities

Issue #1612 closed
Roland Haas created an issue

I just compiled on bluewaters and received this output:

Done creating cactus_simO3.
All done !
Building utilities for simO3
Building utilities for simO3
...
Creating hdf5_recombiner in /mnt/a/u/sciteam/rhaas/ET_trunk/exe/simO3 from /mnt/a/u/sciteam/rhaas/ET_trunk/configs/simO3/build/CarpetIOHDF5/hdf5_recombiner.o
mkdir: cannot create directory `/tmp/1399318629': File exists
mkdir: cannot create directory `/tmp/1399318629': File exists

The last part of the full log is in the attached file.

So it seems that we have a race condition in the make system that causes it to build the utilities twice, in parallel.

On bluewaters, simfactory (which I used) builds Cactus with 16 parallel processes.



  1. Frank Löffler

    A quick search found no mkdir call in the utility-building code of any thorn. Does this happen consistently, and only for the utilities? With just this log file it would be hard to find the problem. I also often build in parallel and have not seen this yet. Could this be caused by something on the BW side?

  2. Roland Haas reporter

    Some digging reveals that this mkdir is actually done by bluewaters' ld command, which is a shell script (/sw/xe/altd/bin/ld, the "ALTD ld wrapper script").

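    In line 109 it creates a temporary directory whose name is based solely on the current time in seconds:

    WRKDIR=/tmp/`date +%s`
    mkdir $WRKDIR

    When two ld invocations start within the same second, as happens when utilities are linked in parallel, both compute the same name and the second mkdir fails with "File exists". This is quite silly: it would have been very easy to also include, say, the process ID, or simply to use mktemp, and avoid this race condition.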
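    A minimal sketch of the mktemp alternative mentioned above, assuming a POSIX shell with mktemp(1) available (as on typical Linux systems). It is not the ALTD wrapper's actual code, and the "ld_wrapper" name prefix is hypothetical.

```shell
#!/bin/sh
# Sketch only, not the ALTD wrapper's actual code.
# The original pattern
#   WRKDIR=/tmp/`date +%s`
#   mkdir $WRKDIR
# fails when two processes start within the same second. mktemp instead
# picks an unused random name and creates the directory atomically.
# "ld_wrapper" is a hypothetical prefix chosen for this example.
WRKDIR=$(mktemp -d "${TMPDIR:-/tmp}/ld_wrapper.XXXXXX") || exit 1

# Remove the working directory when the script exits.
trap 'rm -rf "$WRKDIR"' EXIT

echo "working in $WRKDIR"
```

    Two concurrent invocations each get their own directory, so a parallel build would no longer fail with "File exists".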

    There does not seem to be much we can do about this. I was actually more concerned that "Building utilities for simO3" appears twice; however, that message seems to be printed by simfactory.

  3. Frank Löffler

    Wow, this is quite a lapse by the BW admins. Could you please open a ticket with them?

  4. Roland Haas reporter

    I submitted a ticket. I am still not sure why simfactory prints "Building utilities for simO3" twice.

  5. Roland Haas reporter
    • changed status to resolved

    I believe this is fixed. Bluewaters support replied to my ticket last May, but apparently I never replied myself, and I am not aware of having received any emails. The line in question now reads:

    #WRKDIR=/tmp/`date +%s` Change per BWSS-601 tbouvet 05/08/14
    WRKDIR=/tmp/`date +%s.%N`
    mkdir $WRKDIR
    

    which appends the nanoseconds to the timestamp. This probably makes a collision much less likely (though I would still have preferred that $$ be added for the PID as well, in case the nanosecond clock is not actually updated every nanosecond but, e.g., only every 1/50 s or so).
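    A minimal sketch of the suggestion to append the PID, assuming GNU date (where %N expands to nanoseconds; other date implementations may not support it). This is an illustration, not the wrapper as deployed.

```shell
#!/bin/sh
# Sketch of the suggested naming, not the deployed wrapper: append the
# shell's process ID ($$) to the seconds.nanoseconds timestamp.
# Assumes GNU date for %N.
WRKDIR=/tmp/`date +%s.%N`.$$

# Two processes running at the same time never share a PID, so even with
# a coarse clock the names differ on a single machine.
mkdir "$WRKDIR" || exit 1
echo "created $WRKDIR"

# Clean up the example directory again.
rmdir "$WRKDIR"
```

    The timestamp still guards against PID reuse over time, while the PID covers processes that start in the same clock tick.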
