New machine thornyflat at WVU
Pull request is here https://bitbucket.org/simfactory/simfactory2/pull-requests/50/new-machine-thornyflat-at-wvu .
Comments (21)
-
-
I mistook the trampoline for an actual cluster. If the machine works with this setup I see no reason not to include it, provided that someone at WVU is ok to test it for the releases and provides updates.
-
-
- changed status to open
-
Erik, I am picking up this ticket to let you know that it is not sorted out yet. I tried to run GW150914.par on 4 nodes (160 cores) and nsnstohmns.par on 1 node (40 cores), both within a PBS batch script and directly from the terminal. My jobs were either cancelled “for the following reason: no feasible locations found to run job”, or I cancelled them after more than a week of sitting in the queue. My priority, as an external user, must be rather low.
Here is the output when starting from the terminal:
1. for GW150914
[mbh0012@trcis001 Cactus]$ simfactory/bin/sim create-submit GW150914_128 --configuration=bns --machine=thornyflat --parfile=par/GW150914.par --cores=128
Warning: Current Working directory does not match Cactus sourcetree, changing to /users/mbh0012/Cactus
Parameter file: /gpfs20/users/mbh0012/Cactus/par/GW150914.par
Skeleton Created
Job directory: "/scratch/mbh0012/simulations/GW150914_128"
Executable: "/users/mbh0012/Cactus/exe/cactus_bns"
Option list: "/scratch/mbh0012/simulations/GW150914_128/SIMFACTORY/cfg/OptionList"
Submit script: "/scratch/mbh0012/simulations/GW150914_128/SIMFACTORY/run/SubmitScript"
Run script: "/scratch/mbh0012/simulations/GW150914_128/SIMFACTORY/run/RunScript"
Parameter file: "/scratch/mbh0012/simulations/GW150914_128/SIMFACTORY/par/GW150914.par"
Assigned restart id: 0
Warning: Total number of threads and number of threads per process are inconsistent: procs=128, num-threads=20 (procs*num-smt must be an integer multiple of num-threads)
Warning: Total number of threads and number of cores per node are inconsistent: procs=128, ppn-used=40 (procs must be an integer multiple of ppn-used)
Executing submit command: qsub /scratch/mbh0012/simulations/GW150914_128/output-0000/SIMFACTORY/SubmitScript
Submit finished, job id is 468083
This is how it sits in the queue:
468083.trcis002.hpc.wv mbh0012 comm_sma GW150914_128-00 -- 4 160 -- 168:00:00 Q --
Note that I asked for 128 cores, and what it received is `procs=128, num-threads=20`. I thought I should run with `OMP_NUM_THREADS=1`. Is there a way to hard-code this in `/scratch/mbh0012/simulations/GW150914_128/output-0000/SIMFACTORY/SubmitScript`?
2. for nsnstohmns
[mbh0012@trcis001 Cactus]$ simfactory/bin/sim create-submit nsnstohmns --configuration=bns --machine=thornyflat --parfile=par/nsnstohmns.par --cores=40
Warning: Current Working directory does not match Cactus sourcetree, changing to /users/mbh0012/Cactus
Parameter file: /gpfs20/users/mbh0012/Cactus/par/nsnstohmns.par
Skeleton Created
Job directory: "/scratch/mbh0012/simulations/nsnstohmns"
Executable: "/users/mbh0012/Cactus/exe/cactus_bns"
Option list: "/scratch/mbh0012/simulations/nsnstohmns/SIMFACTORY/cfg/OptionList"
Submit script: "/scratch/mbh0012/simulations/nsnstohmns/SIMFACTORY/run/SubmitScript"
Run script: "/scratch/mbh0012/simulations/nsnstohmns/SIMFACTORY/run/RunScript"
Parameter file: "/scratch/mbh0012/simulations/nsnstohmns/SIMFACTORY/par/nsnstohmns.par"
Assigned restart id: 0
Executing submit command: qsub /scratch/mbh0012/simulations/nsnstohmns/output-0000/SIMFACTORY/SubmitScript
Submit finished, job id is 468082
468082.trcis002.hpc.wv mbh0012 comm_sma nsnstohmns-0000 -- 1 40 -- 168:00:00 Q --
-
reporter The first job you submitted requested 128 cores, which is not a multiple of 40 cores, the number of cores per node. The queuing system does not seem to support this. I recommend using 160 cores instead of 128 cores.
Your second job seems to be waiting in the queue just fine. If you want it to start earlier, you could try asking for a shorter run time. Usually, jobs asking for a longer run time take longer to start. Start by asking for one hour to see whether it works, then maybe for 24 hours. You could also ask the system administrators whether the job’s parameters are fine.
-erik
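
The core-count arithmetic in this reply can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of simfactory, and the two divisibility conditions are copied from the warning text in the log output quoted earlier in this ticket (with num-smt assumed to be 1).

```python
def round_up_to_full_nodes(cores_requested, cores_per_node):
    """Smallest multiple of cores_per_node that is >= cores_requested."""
    nodes = -(-cores_requested // cores_per_node)  # integer ceiling division
    return nodes * cores_per_node


def layout_warnings(procs, num_threads, ppn_used, num_smt=1):
    """Re-state the two consistency conditions quoted in the warnings.

    Hypothetical helper, not simfactory code: it only mirrors the
    conditions printed in the log (num_smt=1 is an assumption).
    """
    problems = []
    if (procs * num_smt) % num_threads != 0:
        problems.append("procs*num-smt must be an integer multiple of num-threads")
    if procs % ppn_used != 0:
        problems.append("procs must be an integer multiple of ppn-used")
    return problems


if __name__ == "__main__":
    # Values taken from the first job in this ticket.
    print(layout_warnings(procs=128, num_threads=20, ppn_used=40))
    print(round_up_to_full_nodes(128, 40))
    print(layout_warnings(procs=160, num_threads=20, ppn_used=40))
```

With the values from the first job (128 cores requested, 40 cores per node, 20 threads per process), both conditions fail, and rounding up to full nodes gives the 160 cores recommended above, for which both conditions hold.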
-
Any progress in fixing up / testing the machine? There is no big hurry, but if you want it to be included in the list of officially supported machines for the toolkit, the testsuite must run at least once (mostly successfully) with the release candidate, and the files must be in master.
-
Oh, yes! All runs well. Please close this ticket.
-
reporter Maria
Are these Simfactory configuration files working for you? I’ve lost access to Thornyflat in the meantime. If you need me to help you debug this, you would need to apply for a new account for me.
-erik
-
Pull request is here: https://bitbucket.org/simfactory/simfactory2/pull-requests/50
-
@Zach Etienne does thornyflat still exist? If so is the pull request still a good simfactory entry for it?
-
@Roland Haas : Yes, it does. However, I don’t use simfactory so I wouldn’t know if the PR is good. If it works for Maria and doesn’t cause any harm to simfactory etc, I would vote for inclusion.
-
@Maria , can you verify that the files in the pull request work for you and will you volunteer to keep them up to date in the future and (ideally) test them for each ET release (or designate someone who will do so)?
-
Roland,
Yes, of course! I'll have this sorted out. I also proposed to test Bridges II.
Maria
-
Unless there are objections I will apply the pull request after 2022-02-03.
-
OK. I'll test it out this weekend with the Johnson release and let you know how it goes.
-
Roland and Erik,
I tested the thornyflat settings with the Johnson release.
- To pull, I used:
cd repos/simfactory2/
git fetch && git checkout origin/eschnett/thornyflat
The result was:
ls simfactory/mdb//thornyflat
simfactory/mdb/machines/thornyflat.ini
simfactory/mdb/runscripts/thornyflat.run
simfactory/mdb/optionlists/thornyflat.cfg
simfactory/mdb/submitscripts/thornyflat.sub
- To set up, I used the command:
./simfactory/bin/sim setup --optionlist=simfactory/mdb/optionlists/thornyflat.cfg --runscript simfactory/mdb/runscripts/thornyflat.run
The result was:
-
lang/gcc/11.2.0 parallel/openmpi/4.1.2_gcc112 libs/fftw/3.3.10_gcc112_ompi412 libs/hdf5/1.12.1_gcc112_ompi412 libs/openblas/0.3.19_gcc112
-
@Maria I added you as a “developer” to the Simfactory repo, so you should be able to push your required changes yourself into the eschnett/thornyflat branch. You may have to do a full checkout first, i.e.:
git clone git@bitbucket.org:simfactory/simfactory2.git
cd simfactory2
git checkout eschnett/thornyflat
then add the modified files and tell git about them:
git add mdb/machines/thornyflat.ini
git add mdb/runscripts/thornyflat.run
git commit -m 'thornyflat: update after modules have changed'
git push
where you will need one git add per changed file (or use git add -p, which will interactively show each change to add to the commit).
-
Dear Erik and Roland,
Sorry, I cannot commit the Simfactory scripts for ThornyFlat, because there still seems to be a problem. The production run gave me a segmentation fault. I am pasting the error's gobbledygook below; please help me tease out and fix the problem:
-
- changed status to duplicate
Duplicate of #2598.
This is work in progress it seems.