Regarding PhoSim Output Differences With and Without CONDOR

Issue #3 resolved
Joseph Glaser created an issue

Hello,

I have a few quick inquiries regarding the formatting and naming of the output files for PhoSim, though it may boil down to the nature of HTCondor's implementation in PhoSim.

The output file names have the format:

<Instrument>_<a/e>_<Observation ID>_<Filter ID>_<Sensor ID>_<Channel ID>_<Exposure ID>.fits.gz

Running a CatSim catalog with an approximately 12-arcminute aperture through PhoSim without HTCondor produced files with Sensor IDs of the form R22_S##, with a run time of approximately 10 hours on a single ~1.5 GHz CPU core. Note that S## ranged from S00 to S22, as expected for the 3x3 CCD array coverage.
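
For illustration, the naming pattern above can be split into its fields with a short Python sketch. The regular expression is an assumption based only on the field order quoted in this post: it presumes that the Sensor ID has the form R##_S## and that no other field contains an underscore. The file name in the usage note is made up for the example.

```python
import re

# Field order taken from the format quoted in the post. The assumption that
# only the Sensor ID (R##_S##) contains an underscore is illustrative and is
# not taken from PhoSim documentation.
NAME_RE = re.compile(
    r"^(?P<instrument>[^_]+)_(?P<image_type>[ae])_(?P<observation_id>[^_]+)_"
    r"(?P<filter_id>[^_]+)_(?P<sensor_id>R\d\d_S\d\d)_"
    r"(?P<channel_id>[^_]+)_(?P<exposure_id>[^_.]+)\.fits\.gz$"
)


def parse_output_name(name):
    """Split an output file name into a dict of its fields."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the expected pattern: {name!r}")
    return match.groupdict()
```

With a hypothetical name such as `lsst_a_1234_f2_R22_S11_C00_E000.fits.gz`, `parse_output_name(...)["sensor_id"]` gives `R22_S11`, which makes it easy to group or filter output files by raft.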

However, when PhoSim is run on HTCondor over 32 identical CPUs with that same catalog, the resulting files have Sensor IDs R##_S##, where R## ranges from R00 to R43. This run took 17 hours 21 minutes and 31 seconds. Most of these files are empty .fits with the exception of the R22_S## files.

What exactly are all of these extra files? Or, perhaps more to the point, how does the Sensor ID relate to the sensor's physical position in the array?

Thanks very much in advance for your help in this.

~ Joe Glaser

Comments (5)

  1. John Peterson

    So Joe, for LSST the notation works by organizing the sensors into 3x3 “rafts”, so Sxy means the xy position of the sensor within its raft. Similarly, there are several rafts in a pattern that roughly fills a circle, and Rxy refers to the position of the raft in that overall pattern. I’m confused, though, about why running it on Condor would produce the extra files whereas on your laptop it wouldn’t. So we will look into that.

    Also, the simulation time on Condor usually has less to do with CPU time and more to do with when your jobs get launched.

    john
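
The Rxy_Sxy notation described in this comment can be turned into a chip position on the overall grid with a small Python sketch. It assumes the first digit is x and the second is y (following the "xy" wording above) and that rafts tile the grid in steps of three chips. The origin and axis directions are not stated in the thread, so treat the result as a relative index rather than a physical coordinate.

```python
import re

SENSOR_RE = re.compile(r"^R(\d)(\d)_S(\d)(\d)$")


def chip_position(sensor_id):
    """Return (x, y) of a chip, counted in chips across the whole raft grid."""
    match = SENSOR_RE.match(sensor_id)
    if match is None:
        raise ValueError(f"not an Rxy_Sxy sensor ID: {sensor_id!r}")
    raft_x, raft_y, chip_x, chip_y = (int(d) for d in match.groups())
    if chip_x > 2 or chip_y > 2:
        raise ValueError(f"sensor index outside a 3x3 raft: {sensor_id!r}")
    # Each raft is 3 chips wide, so raft (rx, ry) starts at chip (3*rx, 3*ry).
    return 3 * raft_x + chip_x, 3 * raft_y + chip_y
```

Under these assumptions the nine R22 chips occupy x and y indices 6 through 8, with R22_S11 in the middle at (7, 7).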

  2. En-Hsin Peng

    It has to do with the minsource/minNumSources setting. The default with Condor is to simulate the full focal plane, whether or not there is an astronomical source on a given chip. If you don't want to simulate the full focal plane, you can use the -s flag.

    e.g.,

    ./phosim instance_catalog -g condor -s 'R22_S00|R22_S01|R22_S02|R22_S10|R22_S11|R22_S12|R22_S20|R22_S21|R22_S22'

    This will generate only the central 9 chips with Condor.
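
Typing the chip list by hand is error-prone, so here is a small Python sketch that builds the same -s argument for any raft. It uses only the flags shown in the command above; the function names are made up for this example, and `instance_catalog` is the placeholder catalog name from that command.

```python
import shlex


def raft_chips(raft):
    """Pipe-separated list of the nine chips in one 3x3 raft, e.g. 'R22'."""
    return "|".join(f"{raft}_S{x}{y}" for x in range(3) for y in range(3))


def condor_command(catalog, raft):
    """Argument list for a Condor run restricted to a single raft."""
    return ["./phosim", catalog, "-g", "condor", "-s", raft_chips(raft)]


# shlex.join quotes the chip list so the shell does not treat '|' as a pipe.
print(shlex.join(condor_command("instance_catalog", "R22")))
```

Passing the argument list directly to `subprocess.run` would avoid shell quoting altogether; the printed form is only meant for pasting into a terminal.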

  3. Joseph Glaser reporter

    Thanks for the aid in figuring this out, guys. Everything is running much more smoothly now. I also figured out that the -p NUMOFCORES flag can be added to the PhoSim command to do local parallelization when HTCondor is not installed. So now I am running two different installs: one with -p added for small FOV sims, and one with -g added for full FOV sims. This seems to give the best run times.

    On a related note, John, you mentioned that Condor's lag is likely due to delays in jobs getting launched. Do you know of a good source I can look at to figure out how to change Condor's settings so that submitted jobs start more quickly?

    Also, would either of you know of a way to have the Xeon Phi GPUs I have installed on the cluster recognized by Condor as open slots?

    Much thanks in advance for all the help!

    ~ Joe
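
The choice described in this comment between -p for small fields and -g for full-field runs could be wrapped in a small Python helper. This is only a sketch: it assumes a single `./phosim` executable rather than the two separate installs mentioned above, and the function name and the `full_fov` switch are invented for the example.

```python
import os


def phosim_args(catalog, full_fov, cores=None):
    """Pick Condor (-g) for full-FOV runs, local cores (-p) otherwise."""
    args = ["./phosim", catalog]
    if full_fov:
        # Full focal plane: hand the jobs to Condor.
        args += ["-g", "condor"]
    else:
        # Small field: parallelize locally over the requested cores,
        # defaulting to however many cores Python reports for this machine.
        n_cores = cores or os.cpu_count() or 1
        args += ["-p", str(n_cores)]
    return args
```

For example, `phosim_args("instance_catalog", full_fov=False, cores=8)` yields an argument list ending in `-p 8`.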

  4. John Peterson

    Joe-

    Well, the one thing you can try from PhoSim to improve its interaction with Condor is the -u universe flag. There are two options in Condor: standard universe or vanilla universe. Standard universe would be a small set of cores that can see some disk you are on, and is likely to have some advanced options like checkpointing. Vanilla universe will be more cores but with less sophisticated features. It is completely unclear whether one or the other is better, and it depends on how many systems your local experts have connected together. Often we submit jobs with a mixture of both, but it’s a little bit of magic what works best.

    I don’t know about the Xeon Phi GPUs. You’d have to talk to your local experts to see how they’ve configured Condor for you, I think.

    hope this helps,

    john
