Don't overload machines while building

Issue #536 wontfix
Erik Schnetter created an issue

Certain machines are easily overloaded by building Cactus; building multiple configurations at the same time is not possible there. For example, LONI does not have sufficient compiler licences, and Orca (Sharcnet) does not allow sufficiently many user processes. On LONI, builds become very slow (also for other users); on Orca, building fails with strange error messages.

I suggest a mechanism that automatically serialises building Cactus configurations on certain sets of machines. Ideally, one would extend the "make -j" mechanism for this; in practice, an implementation via locks may be simpler. Note that this has to work for sets of machines, not just single machines.
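
A minimal sketch of what a lock-based approach could look like, assuming every machine in the set can see a common file system on which directory creation is atomic. The function name `build_lock` and the lock path are hypothetical; nothing like this exists in Cactus or simfactory as far as this issue states.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def build_lock(lock_dir, poll_seconds=5.0, timeout=None):
    """Hold an exclusive lock for the duration of one build.

    lock_dir is a hypothetical path on a file system shared by all
    machines in the set. os.mkdir either creates the directory or
    raises FileExistsError, so only one build holds the lock at a time.
    """
    start = time.monotonic()
    while True:
        try:
            os.mkdir(lock_dir)  # succeeds for exactly one caller
            break
        except FileExistsError:
            # Another build holds the lock: give up or wait and retry.
            if timeout is not None and time.monotonic() - start >= timeout:
                raise TimeoutError(f"could not acquire {lock_dir}")
            time.sleep(poll_seconds)
    try:
        yield
    finally:
        os.rmdir(lock_dir)  # release the lock, also when the build fails


# Hypothetical usage:
#     with build_lock("/shared/home/user/.cactus-build.lock"):
#         subprocess.run(["make", "sim"], check=True)
```

If the build process is killed before the `finally` clause runs, the directory stays behind and blocks later builds until it is removed by hand. The sketch also serialises only the builds that use the same lock path.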


Comments (6)

  1. Ian Hinder

    GNU make has a -l option to specify a load threshold above which further tasks will not be spawned. In principle this would solve the problem. In practice I have observed it to severely underutilise the machine, with the load on the machine never approaching what I set it to.

    From man make:

    -l [load], --load-average[=load]
        Specifies that no new jobs (commands) should be started if there are others jobs running and the load average is at least load (a floating-point number). With no argument, removes a previous load limit.
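
The rule quoted from the man page can be paraphrased as a small predicate. This is a sketch of the documented rule only, not GNU make's actual implementation; the function name `may_start_job` is hypothetical.

```python
import os


def may_start_job(running_jobs, load_limit, load_average=None):
    """Return True if another job may be started under a -l style limit."""
    if load_limit is None:
        return True  # no load limit was given
    if running_jobs == 0:
        return True  # the limit only applies while other jobs are running
    if load_average is None:
        # 1-minute load average of the whole system, not of one user
        load_average = os.getloadavg()[0]
    return load_average < load_limit
```

Because `os.getloadavg()` reports a system-wide figure, a predicate of this kind cannot distinguish one user's processes from everyone else's.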

  2. Erik Schnetter reporter

    This works only on a single system; there is no way to ensure that builds on several separate systems are serialised. It also seems to take the whole system load into account, not only the processes started by one user.

  3. Ian Hinder

    Why would you want to ensure that builds on several separate systems are serialised?

    Why would you want to limit the number of processes started by one user if the whole system can accommodate more?

  4. Erik Schnetter reporter

    LONI provides a certain number of compiler licences shared across several systems. Building on several systems simultaneously slows down all builds, and also slows down builds of other users.

    Orca does not allow too many processes per user. If one user tries to have more processes, things will fail with strange symptoms.

  5. Roland Haas

    As of git hash 6576596 ("wheeler: use makejobs") of simfactory2, most (all?) machines in simfactory support a -j switch at the simfactory level, making it easy for users to choose how many parallel make jobs to use.

    This does not automatically pick the correct value, but it makes the value straightforward to control. Since -l does not seem to really work, and implementing locks would only allow per-user limits (without taking the activity of other users into account) and is typically fragile, I will close this as wontfix.
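
For illustration only, one way a -j value might be derived on a machine with a per-user process limit. This is not what simfactory does; the function names and the `procs_per_job` estimate (how many processes one make job spawns) are assumptions made for the sketch.

```python
import os
import resource


def pick_make_jobs(cpu_count, user_process_limit=None, procs_per_job=4):
    """Cap the number of parallel make jobs by a per-user process limit.

    procs_per_job is a hypothetical estimate of how many processes a
    single make job spawns (make, shell, compiler driver, ...).
    """
    jobs = max(1, cpu_count)
    if user_process_limit is not None:
        jobs = min(jobs, max(1, user_process_limit // procs_per_job))
    return jobs


def detect_make_jobs():
    """Apply pick_make_jobs to the limits of the current machine."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    limit = None if soft == resource.RLIM_INFINITY else soft
    return pick_make_jobs(os.cpu_count() or 1, limit)
```

The process limit counts all of a user's processes, not only those of the build, so a value chosen this way is an upper bound rather than a safe setting.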
