Remote execution refactoring

Issue #9 closed
Kristian Ovaska created an issue

Here is a proposed simplified command line interface for anduril.jar remote execution support. The only flag usually needed is --remote, which specifies the prefix script and works as before. If --remote is not given, local mode is used. Annotations are passed in environment variables prefixed with ANDURIL_REMOTE_ instead of ANDURIL_PREFIX_, for simpler terminology.

For cases when the mount points can differ between nodes, --remote-path-mapper can do path mapping.

For general pre- and post-processing (for example, when there is no shared file system and files must be copied with rsync), --remote-pre and --remote-post are provided.

All of these are shell scripts.

Look good?

anduril.jar run-workflow
  --remote REMOTE-SCRIPT.sh
  [--remote-path-mapper PATH-MAPPER.sh]
  [--remote-pre PREPROCESS.sh]
  [--remote-post POSTPROCESS.sh]

PATH-MAPPER.sh HOST < local paths one per line > remote paths one per line
"remote" is the default HOST if the _host annotation is not used

PREPROCESS.sh COMMAND-FILE.properties

REMOTE-SCRIPT.sh LAUNCHER-ARGS
defined: $ANDURIL_REMOTE_{CPU,HOST,MEMORY,USERDEFINED}

POSTPROCESS.sh COMMAND-FILE.properties
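
The PATH-MAPPER.sh contract above (HOST as the first argument, local paths on stdin, remote paths on stdout) could be sketched roughly as follows. This is only an illustration: the function name map_paths and the mount points /mnt/data and /data are invented, and paths outside the known mount point are passed through unchanged.

```shell
#!/bin/sh
# Hypothetical PATH-MAPPER.sh sketch. HOST is $1, local paths arrive on
# stdin and remote paths are written to stdout, one per line.
# The mount points below are invented for illustration.

map_paths() {
    host=${1:-remote}   # "remote" is the default HOST in the proposal
    case "$host" in
        remote) local_root=/mnt/data; remote_root=/data ;;
        *)      local_root=; remote_root= ;;   # unknown host: no mapping
    esac
    while IFS= read -r path; do
        if [ -n "$local_root" ]; then
            case "$path" in
                # Swap the local mount prefix for the remote one.
                "$local_root"/*) path=$remote_root${path#"$local_root"} ;;
            esac
        fi
        printf '%s\n' "$path"
    done
}

# As a standalone script, the last line would be: map_paths "$@"
```

A call such as `printf '%s\n' /mnt/data/a.csv | map_paths remote` would then print the mapped path for that host.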

Comments (11)

  1. Kristian Ovaska reporter

    After thinking a bit more, I would replace the above with just run-workflow --wrapper WRAPPER.sh. Those who need path mapping can do it with symlinks, and pre/post-processing can be done using //$PRE and //$POST.

    --wrapper is a better term than --remote because some users might also do wrapping locally (e.g., Docker).

  2. Kristian Ovaska reporter

    I committed an initial version that has --wrapper. The engine test cases pass, but this was a major refactoring, so I would appreciate testing in practice.

    The environment variables are $ANDURIL_WRAPPER_{CPU,HOST,MEMORY,USERDEFINED}.
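
A wrapper script that reads these variables might look roughly like the sketch below. It assumes the wrapper receives the component launcher command as its arguments (as REMOTE-SCRIPT.sh did in the original proposal), that an empty $ANDURIL_WRAPPER_HOST means local execution, and that ssh is set up without passwords on a shared file system. The function name run_wrapped and the WRAPPER_SSH override are hypothetical and exist only to make the sketch easy to try out.

```shell
#!/bin/sh
# Hypothetical WRAPPER.sh sketch. Assumes the launcher command is passed
# as the wrapper's arguments; this is not confirmed for --wrapper.

run_wrapped() {
    if [ -n "${ANDURIL_WRAPPER_HOST:-}" ]; then
        # Assumes passwordless ssh and identical paths on the remote host.
        # Note that ssh joins its arguments with spaces for the remote
        # shell, so arguments containing spaces need extra quoting.
        # WRAPPER_SSH is an invented override, e.g. WRAPPER_SSH=echo.
        ${WRAPPER_SSH:-ssh} "$ANDURIL_WRAPPER_HOST" "$@"
    else
        # No host annotation: run the launcher locally.
        "$@"
    fi
}

# As a standalone script, the last line would be: run_wrapped "$@"
```

Setting WRAPPER_SSH=echo prints the command that would be sent to the host instead of running it, which is handy for a dry run.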

    In the near future, I plan to replace the annotations _cpu, _host, _memory and _userdefined with _extra, which is a map from strings to strings or ints. Then you can have mycomponent._extra("docker") = "anduril/image" and it will appear as $ANDURIL_WRAPPER_DOCKER. You can also have _extra("memory") = "5G", so Anduril doesn't force the value to be an integer.

    _userdefined becomes unnecessary, but you can emulate it using _extra("userdefined") if needed.
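
As an illustration of how a wrapper could use such an entry, the sketch below runs the launcher inside a container when $ANDURIL_WRAPPER_DOCKER is set. It assumes the launcher command arrives as the wrapper's arguments and that mounting the current directory at the same path is enough for the component to find its files. The function name run_in_container and the DOCKER_BIN override are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper sketch for _extra("docker") = "anduril/image".
# Assumes the launcher command is passed as the wrapper's arguments.

run_in_container() {
    image=${ANDURIL_WRAPPER_DOCKER:-}
    if [ -n "$image" ]; then
        # Mount the working directory at the same path inside the
        # container; real setups may need more volumes than this.
        # DOCKER_BIN is an invented override, e.g. DOCKER_BIN=echo.
        ${DOCKER_BIN:-docker} run --rm -v "$PWD:$PWD" -w "$PWD" "$image" "$@"
    else
        # No docker entry: run the launcher directly.
        "$@"
    fi
}

# As a standalone script, the last line would be: run_in_container "$@"
```

Components without the docker entry are unaffected, so the same wrapper can serve a mixed workflow.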

    Would you still prefer to keep _host or some other existing annotation as a shortcut, or is _extra OK?

  3. Kristian Ovaska reporter

    I committed the _extra annotation. _cpu, _host, _memory and _userDefined are still available, but they are marked for deprecation. Suggested usage is: _extra("host") = "somehost".

  4. Ville Rantanen

    Why not just ._annotation, if all the annotations are to be in a structure like that? _extra doesn't sound like a planned feature.

    Or maybe ._resource(), because all of these are intended for resource management.

  5. Kristian Ovaska reporter

    There are some annotations that are only used by the Anduril engine, like _priority and _execute. I want to separate these from user-defined annotations.

    _extra can be anything. They are passed to wrapper scripts and also components. They can be resource management, but in principle something else as well.

    Maybe _custom is a better name?

  6. Ville Rantanen

    Sounds better. Are these still written in the _command file? That's a great way to convey them to wrappers and components.

  7. Ville Rantanen

    I suppose _custom can hold any entry, and it gets written in the _command file? That way you don't actually need userDefined anymore, since you could write just anything.

    _cpu could be upgraded a bit, in reference to an older ticket: _cpu should also reduce the --threads count. That is, if you run locally with --threads 4, a component with _cpu=4 would run alone, with no other components in parallel.

  8. Kristian Ovaska reporter

    Yes, _custom can have anything, and it is written to the command file, and exposed to wrapper scripts.

    The interaction of _cpu and --threads has never been present in the core, and would need separate testing to make sure no concurrency bugs are introduced. You also need to think about cases like --threads = 4 and _cpu = 5: should the engine give an error or set _cpu = 4? I hesitate to implement this in the core, because it can get complicated and you can do it in wrapper scripts anyhow. Even if the core supported CPU resource management, you would often still need memory resource management.
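
One way a wrapper script could settle the "--threads = 4 and _cpu = 5" question for itself is to clamp the request, as in the sketch below. It only covers the clamping decision, not scheduling of parallel components. MAX_THREADS is an invented variable that the user would set by hand, since nothing in this thread says that Anduril exposes --threads to wrappers, and the function name effective_cpu is likewise hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for a wrapper script. MAX_THREADS is an invented,
# user-set limit; Anduril is not stated to export the --threads value.

effective_cpu() {
    requested=${ANDURIL_WRAPPER_CPU:-1}
    limit=${MAX_THREADS:-4}
    if [ "$requested" -gt "$limit" ]; then
        # Clamp instead of failing; an "exit 1" here would implement
        # the "give error" alternative.
        echo "warning: _cpu=$requested exceeds limit $limit, clamping" >&2
        requested=$limit
    fi
    printf '%s\n' "$requested"
}
```

The wrapper can then pass the result on to whatever resource manager it uses, together with a memory request handled in the same way.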
