At the moment, http://simfactory.org/info/documentation/userguide/processterminology.html says that if you ask for 6 threads for a job (--procs 6), simfactory will assume that you want to use the whole node and will round this up to the number of cores per node, which is 12 on LoneStar. If you want to use only 6 threads, you need to use --ppn-used 6. I find this confusing, and would prefer to get 6 threads when I ask for 6 threads, even if underneath I have to claim 12 cores from the queuing system. This is how it seems to work on Datura: if I ask for --procs 6, I get 6 threads (i.e. one MPI process). This seems to conflict with the documentation, but I prefer this behaviour.
Proposal: if the number of threads requested (--procs) is less than the number of cores per node (PPN), then don't round the number of threads up to PPN; run exactly the number requested, even if a full node still has to be claimed from the queuing system. Thoughts?
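
To make the difference concrete, here is a minimal Python sketch of the two rounding rules. It is not simfactory's actual code: the function names (round_up_to_multiple, documented_threads, proposed_threads, cores_claimed) are hypothetical and only illustrate the arithmetic, on the assumption that the queuing system always hands out whole nodes.

```python
def round_up_to_multiple(n, multiple):
    """Smallest multiple of `multiple` that is >= n (integer arithmetic)."""
    return (n + multiple - 1) // multiple * multiple


def documented_threads(procs, ppn):
    """Behaviour as described in the user guide: always fill whole nodes."""
    return round_up_to_multiple(procs, ppn)


def proposed_threads(procs, ppn):
    """Proposed behaviour: a request smaller than one node is left alone."""
    if procs < ppn:
        return procs
    return round_up_to_multiple(procs, ppn)


def cores_claimed(procs, ppn):
    """Cores requested from the queuing system (assumed: whole nodes only)."""
    return round_up_to_multiple(procs, ppn)


if __name__ == "__main__":
    ppn = 12  # cores per node on LoneStar, as stated above
    for procs in (6, 12, 13):
        print(procs,
              documented_threads(procs, ppn),
              proposed_threads(procs, ppn),
              cores_claimed(procs, ppn))
```

With --procs 6 and PPN 12, the documented rule gives 12 threads and the proposed rule gives 6, while 12 cores are claimed from the queue in both cases. Requests of a full node or more are rounded the same way under both rules.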