Encapsulate all end-user commands into single executable 'anduril'

Issue #7 closed
Kristian Ovaska created an issue

There should be one executable (anduril) that contains most or all functionality needed by end users. This would be implemented using Bash. Below is a sketch of the syntax. It probably misses some stuff. Comments?

anduril
Print help

anduril run workflow.scala -s helper.scala
anduril run workflow.jar
Run workflow. Should support most args that anduril.jar supports?

anduril compile [-f] [-u]
Corresponds to anduril-recompile's -a -f -u flags

anduril install-bundle
anduril install-reqs [--sudo] (-s = sudo)

anduril test-accept
anduril test-component (like anduril.jar test)
anduril test-workflow (like anduril.jar test-networks)

anduril build-doc -d DEST BUNDLE1 [BUNDLE2...]
Much like current Java version but with more convenient syntax

Comments (21)

  1. Ville Rantanen

    The current anduril script actually already has this command detector: three different commands are parsed and their arguments are passed to separate scripts. Just run "anduril" and look at the end for ## External tools

    this structure could be further developed, like you sketch...

    Also, anduril-runner could be included with some keyword, like "anduril executable"

    Actually parsing the switches and arguments for the commands is a pain in the @$$ in Bash; I'd hope to just detect the first argument and then pass the remaining args to the script in question.
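
A minimal sketch of the "detect the first argument, forward the rest" idea, written in Python rather than Bash purely for illustration. The mapping is an assumption: only anduril-recompile is named in this thread, and the other helper script names are hypothetical placeholders, not the actual layout of the anduril script.

```python
# Hypothetical mapping from subcommand to helper script. Only
# anduril-recompile appears in this thread; the rest are placeholders.
SUBCOMMANDS = {
    "compile": "anduril-recompile",
    "run": "anduril-run-helper",
    "build-doc": "anduril-build-doc-helper",
}


def dispatch(argv):
    """Return the command line to execute for the given arguments.

    Only the first argument is inspected. Everything after it is
    forwarded untouched, so each helper parses its own switches.
    Returns None when the first argument is missing or unknown.
    """
    if not argv or argv[0] not in SUBCOMMANDS:
        return None
    return [SUBCOMMANDS[argv[0]]] + list(argv[1:])


# Example: "anduril compile -f -u" is forwarded to the helper as-is.
print(dispatch(["compile", "-f", "-u"]))
```

A real wrapper would execute the returned list (for example with subprocess) and print the help text when dispatch returns None.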

  2. Kristian Ovaska reporter

    For long-term maintainability, at this point Anduril might follow the TeX/LaTeX model, in which core functions (TeX / Anduril engine) are more or less fixed, and higher levels (LaTeX / user-friendly Anduril-invoking scripts) can evolve. Git has a two-layer model divided between low-level "plumbing" and high-level "porcelain" functions. This would also be useful for Anduril.

    In this model, anduril.jar would contain only low-level functions and is invoked from the more user-friendly Bash/Python scripts. End users would never directly call anduril.jar.

    The new "anduril" command could be implemented in Python, since it has convenient command line parsing and anduril-recompile already depends on Python. Of course the Python script can invoke Bash scripts.

    As for division of labor, I would implement or modify any missing low-level functionality in anduril.jar that would be useful for high-level scripts, and the lab would maintain the high-level scripts. For example, anduril.jar could have a command to return the set of components or component instances that are used in a workflow, without running the workflow.

    build-doc could be moved out of the core, if anduril.jar provided a command to return a parsed component repository (including docs, author indexes, etc.) in JSON format. This way it could also evolve, and even use an alternative or dynamic HTML generator.

    Would this architecture be useful for developing the high-level functions further?

  3. Kristian Ovaska reporter

    Here is a specification for anduril.jar low-level commands. Is anything missing, extraneous, or in need of improvement?

    anduril.jar run-workflow WORKFLOW.jar -d EXECUTION-DIRECTORY
    Run a previously compiled workflow.
    
    anduril.jar compile-bundle (BUNDLE-DIR | BUNDLE-NAME)+
    Generate and compile code for a bundle.
    
    anduril.jar compile-workflow (WORKFLOW.scala | build.sbt) [-s EXTRA-SOURCE.scala]*
    Compile a workflow. If .sbt file is provided, uses that for compilation. Otherwise, -s can be used to include extra source files.
    
    anduril.jar parse-bundle (BUNDLE-DIR | BUNDLE-NAME)+
    Return bundle components in JSON format, including documentation. This command allows moving 'build-doc' out of the core.
    
    anduril.jar parse-workflow WORKFLOW.jar -d EXECUTION-DIRECTORY [--obsolete]
    Return the set of component instances used in the workflow in JSON format, without running. Includes metadata such as component name, current/enabled status, etc.
    For dynamic workflows, uses the _state file (if it exists) to avoid executing.
    With --obsolete, also returns entries in EXECUTION-DIRECTORY that can be pruned. This command allows moving 'clean' out of the core.
    
    anduril.jar run-test-component -b (BUNDLE-DIR | BUNDLE-NAME)+ -c (COMPONENT-NAME)+ -t (TEST-CASE-NAME)+
    Run component tests.
    
    anduril.jar run-test-workflow -b (BUNDLE-DIR | BUNDLE-NAME)+ -w (TEST-WORKFLOW-NAME)+
    Run workflow tests using previously compiled test workflows. Compilation on demand is encapsulated in higher-level script.
    
    anduril.jar generate-sbt [-b (BUNDLE-DIR | BUNDLE-NAME)] [-s SOURCE.scala]*
    Generate a template SBT that puts anduril.jar and all (or selected) bundle JARs to classpath. Can be used to bootstrap a project.
    
  4. Kristian Ovaska reporter

    parse-bundle and parse-workflow can also return tabulated format as an alternative to JSON, for easier Bash processing.
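
A rough sketch of how the same records could be served in both formats. The JSON layout used here (a list of objects with "name" and "bundle" fields) is purely hypothetical, since the thread does not define the output schema of parse-bundle or parse-workflow.

```python
import json


def components_to_tsv(json_text, columns):
    """Convert a JSON array of component records into TSV text.

    The first output line is a header. Missing fields become empty cells.
    """
    records = json.loads(json_text)
    lines = ["\t".join(columns)]
    for record in records:
        lines.append("\t".join(str(record.get(col, "")) for col in columns))
    return "\n".join(lines)


# Hypothetical parse-bundle output; the real schema may differ.
sample = '[{"name": "CSVFilter", "bundle": "tools"}, {"name": "CSVJoin", "bundle": "tools"}]'
print(components_to_tsv(sample, ["name", "bundle"]))
```

One line per record with tab-separated fields is easy to consume from Bash with tools such as cut or awk, which is the motivation given above for the tabulated alternative.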

  5. Ville Rantanen

    Should I implement the current anduril command in Python, and let's see where it takes us?

    otherwise, the architecture/structure looks good

  6. Kristian Ovaska reporter

    You can start working on the Python side, and I can start with the Java side. Python calls the anduril.jar low-level functions, so in the beginning you might need some kind of stub anduril.jar implementation for the above commands. We need to make an hg branch since this changes anduril.jar.

    Currently Python can also parse component.xml files, and _state file parsing is also done outside core. Would the above low-level functions remove the need to duplicate this functionality? The idea is that parse-* would produce output that is easy to handle in Python (JSON) and Bash (TSV).

    You mentioned that anduril-component-doc is implemented in Python because Java has performance (startup) overhead. Would it be a problem to implement it using "anduril.jar parse-bundle"? This command could optimize parsing if only one component is needed.

  7. Ville Rantanen

    Originally, the Python libraries implemented parsing of component.xml because it lets the Python API know the datatypes of parameters in the component code. From there on, it was a straightforward printing/reformatting of the component.xml in a more readable format.

    And yes, this approach is very fast. In addition, Anduril2 doesn't have "anduril run-component", which was how docs were printed on the console in Anduril1.

    I'll make the anduril script in Python, but I'll just call anduril.jar like before: "java -jar [..]"
    Go ahead and branch, I won't commit the Python version in the anduril2 branch

  8. Kristian Ovaska reporter

    OK, there is a new branch anduril2-new-cli. When writing the Python stuff, keep in mind that all current commands in anduril.jar will change.

  9. Ville Rantanen

    Just realized: the [bundle]/lib/rc thing we just implemented is quite hard if anduril is written in Python. Or perhaps the subprocess that runs anduril.jar will consist of multiple commands, like:

    source /path/to/bundle1/lib/rc
    source /path/to/bundle2/lib/rc
    java -jar anduril.jar @@args@@
    

    If you "source" the rc in Python, the environment will not carry over to the next subprocess. One option is of course to change it to a Python file that gets run...
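
A minimal sketch of the multiple-command approach from Python: all the "source" lines and the java call are joined into one shell command, so they run in the same bash process and the java process sees the environment set by the rc files. The function name is hypothetical, and chaining with "&&" (stop if an rc file fails) is a choice made for this sketch, not something decided in the thread.

```python
import shlex


def build_shell_command(rc_files, jar_path, args):
    """Build one bash command line that sources each rc file and then
    runs the jar, with every path and argument shell-quoted."""
    parts = ["source %s" % shlex.quote(rc) for rc in rc_files]
    java = ["java", "-jar", jar_path] + list(args)
    parts.append("exec " + " ".join(shlex.quote(a) for a in java))
    return " && ".join(parts)


command = build_shell_command(
    ["/path/to/bundle1/lib/rc", "/path/to/bundle2/lib/rc"],
    "anduril.jar",
    ["run-workflow", "workflow.jar"],
)
print(command)
# Executing it would look like: subprocess.call(["bash", "-c", command])
```

With this shape the rc files stay plain shell files, and the Python side only assembles the command string.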

  10. Kristian Ovaska reporter

    I think having multiple commands is the best solution. Let's not make rc a Python file: the language in which 'anduril' happens to be implemented is only an implementation detail.

  11. Ville Rantanen

    yep.

    anduril-recompile now calls "anduril compile -b bundlename" to compile the bundle.
    How should this be implemented in the single-executable model? By directly calling anduril.jar compile in the script?

  12. Ville Rantanen

    Do you have something to replace build-doc with? Most of the lab is still using the HTML-based documentation, and a lot of work will have to be done to get similar functionality with the search functions etc.

    I don't think we should deprecate it before we actually have something tangible to replace it with.

  13. Ville Rantanen

    It feels a bit cumbersome to replicate all the switches of the run-workflow (.jar) command, but I guess that is the only command with a huge number of options.

    The nice thing is that I've managed to make the hashbang work directly with the anduril executable:

    #!/usr/bin/env anduril
    ....
    
  14. Kristian Ovaska reporter

    I don't have anything to replace build-doc. The notice is an advance warning perhaps. Can be removed if there is no need to work on the doc generator outside core.

    I reduced the switches to run-workflow a bit (various --exec flags).

    I committed parse-bundle. It contains most information about components, including stuff that is not in component.xml (test cases). I also optimized the case of single components (-c COMPONENT-NAME). It runs in 0.5 s on my laptop. Should be fast enough?

  15. Kristian Ovaska reporter

    A better name for install-reqs would actually be install-deps (for dependencies). This is how apt-get etc. call them.

  16. Ville Rantanen

    parse-bundle is a nice feature, I think it will be used. It is of course much, much slower than the Python approach. This command:

    time anduril-component-doc -k java
    

    loads all bundles (anima, builtin, microarray, sequencing, tools; anything in ANDURIL_BUNDLES), searches for the keyword "java" in component names and in the component.xml <doc> strings, then prints the results. It takes less than 0.5 seconds on my laptop.

    Basically the doc printer always has to load all the component names, since it makes a heuristic search for typos and suggests real component names if argument doesn't match 100%.

    These parsers will come in handy anyway. For the moment, I'd like to keep the Python object model, because the pretty-printing functions etc. are implemented in Python.
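
To illustrate why the doc printer needs the full list of component names, here is a small sketch of typo-tolerant lookup. The actual heuristic in anduril-component-doc is not shown in this thread; Python's standard difflib is swapped in as a stand-in, and the function name and the component list are illustrative assumptions.

```python
import difflib


def suggest_components(query, component_names, limit=3):
    """Return the exact match if there is one, otherwise up to `limit`
    known component names that resemble `query`."""
    if query in component_names:
        return [query]
    return difflib.get_close_matches(query, component_names, n=limit, cutoff=0.6)


# Illustrative component names only.
names = ["CSVFilter", "CSVJoin", "TableQuery", "FolderCombiner"]
print(suggest_components("CSVFiltre", names))
```

Because the query is compared against every known name, suggestions are only possible when all names are loaded, whichever parser supplies them.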

  17. Ville Rantanen

    Will we still be supporting --exec-mode remote and the --hosts config execution? I don't think anyone is using that, and the prefix script basically allows you to do the same thing (although with more manual scripting involved).

  18. Kristian Ovaska reporter

    It's actually on my todo-list to refactor remote execution. Basically, the only exec modes would be local and prefix, and --hosts would be dropped. I might add a couple switches for pre/post-processing scripts. For example, if NFS mounts are not the same on the master node and worker nodes, mapping of file paths is needed. This is now handled by the core, and will be removed, but there might be a --path-mapper SCRIPT.sh flag that allows this when needed.

    For publication and user guide, we would need a few basic prefix scripts, so we can say we support slurm out of the box. These wouldn't have anything fancy like load balancing. For our own needs, we can then use the current solutions.
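
As an illustration of the path mapping mentioned above (different NFS mount points on the master and worker nodes), here is a sketch of prefix-based rewriting. The proposed --path-mapper flag would take a shell script; this Python function only shows the mapping logic, and its name, signature, and the example mount points are assumptions.

```python
def map_path(path, mappings):
    """Rewrite `path` using the first matching (master_prefix,
    worker_prefix) pair. Prefixes match whole path components only,
    and unmatched paths are returned unchanged."""
    for master_prefix, worker_prefix in mappings:
        prefix = master_prefix.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return worker_prefix.rstrip("/") + path[len(prefix):]
    return path


# Hypothetical mount points on the master and worker nodes.
mappings = [("/mnt/master", "/mnt/worker")]
print(map_path("/mnt/master/data/x.csv", mappings))
```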

  19. Kristian Ovaska reporter

    Regarding timing of parse-bundle, it would be easy to support regexp search in addition to plain component names (current -c flag). If the current core plumbing functions are not optimal, you can make tickets that request improvements. I would rather do it like that than do hacks outside the core.
