upcxx-run (upcxx_srun) can't be used when specialized cores are requested on Cori (-S option)
upcxx-run fails in that specific case. The issue is that the upcxx_srun script, which is called by the launcher, does not take into account the number of specialized cores requested by the user (salloc -S C -N 1). In this case, the number of processes should be 68-C on Cori KNL. Instead, upcxx_srun uses a hardcoded value of 68, which always requests more cores than are available.
Here is a proposed solution:
@@ -35,7 +35,7 @@
case "$cpu" in
ivb|ivybridge) cores=24; thr=2;;
hsw|haswell) cores=32; thr=2;;
- knl|mic-knl) cores=68; thr=4;;
+ knl|mic-knl) thr=4; cores=$((($SLURM_CPUS_ON_NODE)/$thr));;
*) echo "ERROR: Unknown cpu type '$cpu'" >&2; exit 1;;
esac
##
Comments (6)
-
reporter -
@mjacquelin Is there any reason the same logic should not apply to Edison (ivb) and Cori-I (hsw)?
I think the desired change is the following. Do you agree?
--- upcxx_srun~ 2018-10-30 11:25:37.216659000 -0700
+++ upcxx_srun 2018-10-30 11:27:33.326565000 -0700
@@ -33,11 +33,12 @@
 fi
 ##
 case "$cpu" in
- ivb|ivybridge) cores=24; thr=2;;
- hsw|haswell) cores=32; thr=2;;
- knl|mic-knl) cores=68; thr=4;;
+ ivb|ivybridge) thr=2;;
+ hsw|haswell) thr=2;;
+ knl|mic-knl) thr=4;;
 *) echo "ERROR: Unknown cpu type '$cpu'" >&2; exit 1;;
 esac
+cores=$(($SLURM_CPUS_ON_NODE/$thr))
 ##
 if test -z "$nnode" || test -z "$nproc"; then
 echo "ERROR: Unable to determine job geometry." >&2
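To make the arithmetic in the patch concrete, here is a minimal standalone sketch of the same computation. The function name `usable_cores` is hypothetical and not part of upcxx_srun. The guard for an unset value is an addition for illustration. The sample inputs assume that Slurm reports 272 hardware threads on a full KNL node and 256 when 4 of the 68 cores are reserved with -S 4.

```shell
#!/bin/sh
# Hypothetical helper (not part of upcxx_srun): derive the usable cores per
# node from the hardware-thread count Slurm reports, not a hardcoded value.
usable_cores() {
  cpu=$1
  cpus_on_node=$2   # inside a job this would be "$SLURM_CPUS_ON_NODE"
  case "$cpu" in
    ivb|ivybridge) thr=2;;
    hsw|haswell)   thr=2;;
    knl|mic-knl)   thr=4;;
    *) echo "ERROR: Unknown cpu type '$cpu'" >&2; return 1;;
  esac
  # Guard added for this sketch: fail clearly if no thread count was given.
  if test -z "$cpus_on_node"; then
    echo "ERROR: SLURM_CPUS_ON_NODE is not set" >&2
    return 1
  fi
  echo $((cpus_on_node / thr))
}

# Illustrative values only (assumed, not measured on Cori):
usable_cores knl 272   # 68 cores x 4 threads, no specialized cores
usable_cores knl 256   # assumes 4 of 68 cores reserved via -S 4
```

With these assumed inputs the two calls print 68 and 64, which matches the expectation in the description that the core count drops by the number of specialized cores.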
-
reporter You are correct, there is no reason it should be any different. Thanks.
-
I have made the change I proposed above to the "live" script.
I have done some basic testing, but I am not sure I covered all the cases. Please test ASAP and let me know if anything you would normally do is now broken.
I doubt that is the case, but I should revert soon if anything did break.
-
reporter Seems to do the job on Cori KNL. Thanks
-
- changed status to resolved