missing static libraries

Issue #10 closed
Thomas Glanzman created an issue

In an attempt to build a version of phoSim against the condor libraries, the link step is failing to find various static libraries, but for different reasons.

  1. libz.a is simply not available on my standard rhel6-64 machine. Installing libz and adjusting the Makefile allowed 'raytrace' to link.

  2. libm.a is not on the library paths searched by the Makefile. I did find the library at /usr/lib/x86_64-redhat-linux5E/lib64/libm.a, and it is needed by 'trim'.

That's as far as I've gotten at present, but I am wondering where this will stop, and how others whose machines are not configured exactly like the developers' are getting around this problem. Or am I simply missing some mechanism already in place to allow for this (remember imake?)?

  • Tom

Comments (13)

  1. Thomas Glanzman reporter

    Yes, I did and here is the resultant 'setup' script:

    setenv CCP "condor_compile g++"

    setenv CCS g++

    setenv CFIODIR /nfs/farm/g/lsst/u1/software/redhat6-x86_64-64bit-gcc44/externals/cfitsio/3.370/lib/

    setenv CFIO_INC_DIR /nfs/farm/g/lsst/u1/software/redhat6-x86_64-64bit-gcc44/externals/cfitsio/3.370/include/

    setenv FFTW3_DIR /nfs/farm/g/lsst/u1/software/redhat6-x86_64-64bit-gcc44/externals/fftw/fftw-3.3.4/lib/

    setenv FFTW3_INC_DIR /nfs/farm/g/lsst/u1/software/redhat6-x86_64-64bit-gcc44/externals/fftw/fftw-3.3.4/include/

    A couple of additional comments.

    1. This is my first attempt to build phoSim using condor libraries (which I downloaded and installed, version 8.5.3). It could be there is something missing, although the condor libraries all seem to be present and configured. Is there some other magic needed associated with making condor play nicely with phoSim?

    2. A huge change is the construction of a static executable. The default phoSim uses .so libraries, while the condor build needs the .a libraries. The lack of libz.a and location of libm.a may be a puzzle. I had to download and install zlib (v1.2.8) in order to get its static library. I am using what purports to be a 'standard' Redhat 6 64-bit installation. Are there RPMs needed to support static builds?

    3. While attempting to link 'instrument' and 'atmosphere', I am also seeing messages about undefined references, including __libc_csu_init, __libc_csu_fini, __isoc99_sscanf. I've not fully tracked down the cause but early clues point to the use of "ld" rather than "g++" for the link step.

    • Tom
  2. En-Hsin Peng

    We did encounter the missing libz.a problem on the Purdue community clusters. When using "condor_compile g++", libraries have to be linked statically (as far as I understand). We simply installed libz ourselves and added the path to LIBRARY_PATH.

    e.g.,

    setenv LIBRARY_PATH /depot/lsst/apps/zlib-1.2.8

    (put it in bin/setup)
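
A quick way to check which static libraries are actually visible before running the build is to scan the directories in LIBRARY_PATH for the lib*.a files. The following is a minimal sketch in Python; the function name `find_static_library` and the extra directories are illustrative assumptions, not part of phoSim or condor.

```python
import os


def find_static_library(name, extra_dirs=(), env=None):
    """Return the first path to lib<name>.a on the search path, or None.

    Hypothetical helper: it looks only in LIBRARY_PATH (the colon-separated
    list that gcc/g++ consults at link time) plus any extra_dirs supplied.
    It does not reproduce the compiler's full built-in search order.
    """
    env = os.environ if env is None else env
    dirs = [d for d in env.get("LIBRARY_PATH", "").split(os.pathsep) if d]
    dirs += list(extra_dirs)
    filename = "lib%s.a" % name
    for d in dirs:
        candidate = os.path.join(d, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Libraries mentioned in this thread; the extra directories are guesses.
    for lib in ("z", "m", "cfitsio", "fftw3"):
        print(lib, find_static_library(lib, extra_dirs=["/usr/lib64", "/usr/lib"]))
```

Any library reported as None would need to be installed as a static archive, or have its directory added to LIBRARY_PATH, before a fully static link can succeed.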

    But you don't need to use condor_compile to run condor jobs. Codes can be compiled with g++ and run in vanilla universe.

    We also encountered compile problems on another system (diagrid.org), and ended up using both shared and static libraries because of some issues with libstdc++ or libgcc. Here are the linker flags that finally worked.

    LFLAGS = -g -O3 -ffast-math -Wl,-Bstatic -lcfitsio -Wl,-Bdynamic -lfftw3 -lm -lz -lpthread

    pthread is needed for cfitsio in this case.

  3. Thomas Glanzman reporter

    Thanks En-Hsin for this tip! I needed to add a "-L" for the path to libz.a after the -Bstatic, but now everything seems to build.

    My intent with this condor build is solely to exploit the checkpointing mechanism -- but without using a condor batch farm. That is, to supply the needed signals manually to activate checkpoint and halt. My understanding is that phoSim must be built with condor libraries to enable this feature and, indeed, a short example run does indicate that checkpointing has been enabled. If you have any advice for me in achieving this goal, please pass it along.

    Thanks again, - Tom

  4. En-Hsin Peng

    There are two types of checkpointing, as far as I know. One is provided automatically by condor when codes are compiled with "condor_compile" and run in the Standard universe. But it doesn't guarantee that an evicted job will be saved and restarted from where it left off. The other checkpointing mechanism is in raytrace, in which we split the simulation into multiple steps and save the output of the current step for the next step. This doesn't need condor libraries. If you are interested in the second one, you'll need to modify the phosim.py script to configure raytrace. Currently the script only provides the checkpointing option (the second one) for the condor build.

  5. Thomas Glanzman reporter

    Hi En-Hsin,

    Thank you for this information. Can you advise me how to activate and use the checkpointing within raytrace? Would the phosim script also need changes to know when a restart (rather than a fresh start) was being initiated?

    • Tom
  6. En-Hsin Peng

    You will need to add two lines to the raytrace input file (raytrace_xxxx.pars in work directory)

    checkpointcount i

    checkpointtotal n

    n is the number of checkpoints you want, and i = 0, 1, 2, ..., n. This splits the simulation into n+1 parts, one input file per value of i. Then run those jobs in sequence.

    bin/raytrace < raytrace_xxx_0.pars

    bin/raytrace < raytrace_xxx_1.pars

    .

    bin/raytrace < raytrace_xxx_n.pars

    This functionality doesn't exist in phosim.py. You'll probably need to modify function 'jobChip'.

    For condor jobs, if the output of Step i exists (lsst_e_xxxx_ckptdt_i.fits.gz, lsst_e_xxxx_ckptfp_i.fits.gz), it will restart from Step i+1 (see details in 'writeRaytraceDag' in condor/condor.py).
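
The manual steps above could be scripted along the following lines. This is only a sketch under stated assumptions: the helper names `write_checkpoint_pars` and `run_in_sequence` are hypothetical, the per-step file naming simply mirrors the raytrace_xxx_i.pars pattern shown above, and it assumes the two checkpoint lines can be appended at the end of an existing pars file.

```python
import os
import subprocess


def write_checkpoint_pars(base_pars, n_checkpoints, out_dir="."):
    """Write n_checkpoints + 1 copies of base_pars, one per step i = 0..n.

    Each copy gets 'checkpointcount i' and 'checkpointtotal n' appended.
    Returns the list of generated file paths in run order.
    """
    with open(base_pars) as f:
        base = f.read()
    if base and not base.endswith("\n"):
        base += "\n"
    stem = os.path.splitext(os.path.basename(base_pars))[0]
    paths = []
    for i in range(n_checkpoints + 1):
        path = os.path.join(out_dir, "%s_%d.pars" % (stem, i))
        with open(path, "w") as f:
            f.write(base)
            f.write("checkpointcount %d\n" % i)
            f.write("checkpointtotal %d\n" % n_checkpoints)
        paths.append(path)
    return paths


def run_in_sequence(paths, raytrace="bin/raytrace"):
    """Run 'bin/raytrace < file' for each pars file, stopping on failure."""
    for path in paths:
        with open(path) as stdin:
            subprocess.check_call([raytrace], stdin=stdin)
```

For example, `run_in_sequence(write_checkpoint_pars("raytrace_xxxx.pars", 3))` would generate four input files and run them one after another.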

  7. John Peterson

    but no, don't do what en-hsin just said, because the condor would already checkpoint without doing that. it would already checkpoint automatically.

  8. John Peterson

    also, en-hsin, let's clean this up and we'll release a new patch. i didn't realize that there were any by-hand steps for the condor compilation. i thought that was fully fixed, so tom shouldn't have to worry about this.

  9. Thomas Glanzman reporter

    @johnrpeterson I interpreted @enhsin to say these two checkpointing mechanisms were independent and that the raytrace checkpoints he described could be used without condor. Is that correct? If so, then perhaps this mechanism is the one I should focus on...

    Also, I had been investigating using the condor-based checkpointing but without the support of a running condor batch system, i.e., I would arrange to generate the necessary signals externally to trigger checkpoint and halt responses. That would, of course, require some changes to the main driver script to properly handle a new condition: checkpoint & halt. The script would also need to learn how to recognize a previously checkpointed job and continue properly. Does this approach seem reasonable?
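
For the raytrace-level checkpoints, recognizing a previously checkpointed job might be as simple as checking which step outputs already exist, following the file naming quoted earlier in this thread (lsst_e_xxxx_ckptdt_i.fits.gz and lsst_e_xxxx_ckptfp_i.fits.gz). The sketch below is an assumption-laden illustration, not the logic in condor/condor.py: the function name `last_completed_step` and the `tag` parameter (standing in for the xxxx part of the file name) are hypothetical.

```python
import os


def last_completed_step(work_dir, tag, n_checkpoints):
    """Return the highest step i for which steps 0..i all left both
    checkpoint files in work_dir, or -1 if step 0 has not completed.

    A driver would then resume from the returned value plus one.
    """
    last = -1
    for i in range(n_checkpoints + 1):
        outputs = [
            os.path.join(work_dir, "lsst_e_%s_ckpt%s_%d.fits.gz" % (tag, kind, i))
            for kind in ("dt", "fp")
        ]
        if not all(os.path.isfile(p) for p in outputs):
            break  # first incomplete step; later files are not trusted
        last = i
    return last
```

A fresh start corresponds to a return value of -1. Whether the final step also writes these two files is not stated in this thread, so the caller should decide separately when the whole job counts as finished.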

  10. John Peterson

    this issue i think is old, and we discussed this over the summer. i think this should be closed now.
