CactusTutorial.ipynb contains code only suitable for the tutorial server

Issue #2577 new
Roland Haas created an issue

The tutorial notebook is intended for use inside the tutorial server (and Docker image), on users' own Jupyter notebook servers in their accounts, and finally as a read-only, offline version.

The current notebook contains code that will fail or do dangerous things on a user’s laptop, namely:

import scrolldown

will fail unless a Python module named scrolldown is available. This import should be wrapped in a try: ... except ModuleNotFoundError: block.
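A minimal sketch of such a guard, assuming scrolldown only adds cosmetic behaviour so the notebook can safely continue without it (`HAVE_SCROLLDOWN` is a hypothetical flag name, not something the notebook defines today):

```python
# scrolldown is only installed on the tutorial server (and Docker image);
# on a laptop or a private Jupyter server the import fails, so skip it quietly.
try:
    import scrolldown
    HAVE_SCROLLDOWN = True
except ModuleNotFoundError:
    HAVE_SCROLLDOWN = False
```

Later cells could check `HAVE_SCROLLDOWN` if they ever need to know whether the module was loaded.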

And

cd ~/
rm -fr ~/Cactus
tar xzf ~etuser/Cactus.tar.gz

which is very dangerous, since it removes any Cactus directory in the user’s $HOME. That is a bad thing to do for offline use, where users may want to use a different directory (e.g. GW150914 or so). The tarball also only exists on the notebook server, so tar will fail on private Jupyter notebook servers and in offline use. This can confuse potential new users who expect to be able to enter each command in the notebook and thus be able to run Cactus.

This cell should be protected by bash-level if statements that check whether it is being run on the notebook server. A comment stating that the cell is only meant for the notebook server should also be added to the cell itself as a shell comment (so that it cannot be lost).
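The issue asks for bash-level checks; purely as an illustration, the same guard is sketched below in Python, the notebook's kernel language. It assumes that the presence of the tarball ~etuser/Cactus.tar.gz is a good enough signal that the cell runs on the notebook server (a hypothetical heuristic), and the helper names `on_tutorial_server` and `setup_cactus` are made up:

```python
import os
import shutil
import tarfile

# NOTE: this step is only meant for the tutorial server; elsewhere it does nothing.
# Path taken from the original cell; it only exists on the tutorial server.
TARBALL = os.path.expanduser("~etuser/Cactus.tar.gz")


def on_tutorial_server(tarball: str = TARBALL) -> bool:
    """Heuristic (assumption): we are on the server iff the tarball exists."""
    return os.path.isfile(tarball)


def setup_cactus(tarball: str = TARBALL, home: str = None) -> bool:
    """Replace ~/Cactus with the prebuilt tree, but only on the tutorial server.

    Returns True if the tarball was unpacked, False if the step was skipped.
    """
    if not on_tutorial_server(tarball):
        print(f"{tarball} not found: not on the tutorial server, skipping setup.")
        return False
    home = home or os.path.expanduser("~")
    target = os.path.join(home, "Cactus")
    # Only reached on the server, so other machines never lose a Cactus directory.
    if os.path.exists(target):
        shutil.rmtree(target)
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(path=home)
    return True
```

Calling `setup_cactus()` on a laptop just prints the skip message instead of deleting anything or failing in tar.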

Comments (4)

  1. Roland Haas reporter

    Right now, the second output that users who download the notebook to their laptop see is an error message:

    and nothing indicates that it is harmless (and I would not expect any new user to read such a note; instead they will just press “CTRL-Enter” all the time).

  2. Gabriele Bozzola

    Having worked on it a little to add kuibit, I think that the tutorial should not try to target both the tutorial server and users' machines. Supporting both means the notebook carries plenty of code that is distracting and makes things more complicated than they need to be. What we could do instead is keep the ipynb for the tutorial server, and export a variation of it as HTML for general use, in which we clarify that the various paths are examples and can be customized (as in most “traditional” tutorials).

  3. Roland Haas reporter

    Long post below, sorry.

    I understand the difficulty of having the same file present instructions for different environments. This, unfortunately, has always been tricky. On the other hand, maintaining two different versions is, I suspect, not going to be doable (based on what happened in the past).
    The issue with maintaining two almost identical sets of instructions is the personpower needed to keep them in sync (e.g., this is why the wiki-based tutorial at https://docs.einsteintoolkit.org/et-docs/Tutorial_for_New_Users is no longer maintained). With one notebook, at least, there is only one file to keep up to date. Exporting to HTML is not the tricky bit, I suspect; to me the trickiest parts were always getting things to work both in the notebook and when copying and pasting commands into a terminal window.

    Having a mechanical (no human intervention) way to export a read-only, laptop-suitable version of the notebook would be very nice. I just do not see an easy way to do it.
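    One possible mechanical route, sketched under assumptions: tag every server-only cell in Jupyter with a cell tag (the name tutorial-server-only below is hypothetical), strip those cells from the .ipynb JSON, and hand the result to jupyter nbconvert --to html. The helpers only rely on the .ipynb layout of a top-level "cells" list whose entries carry their tags under "metadata" -> "tags":

```python
import copy
import json

SERVER_ONLY_TAG = "tutorial-server-only"  # hypothetical tag name


def strip_tagged_cells(nb: dict, tag: str = SERVER_ONLY_TAG) -> dict:
    """Return a copy of a parsed .ipynb without the cells that carry `tag`."""
    out = copy.deepcopy(nb)  # leave the caller's notebook untouched
    out["cells"] = [
        cell for cell in out.get("cells", [])
        if tag not in cell.get("metadata", {}).get("tags", [])
    ]
    return out


def strip_notebook_file(src: str, dst: str, tag: str = SERVER_ONLY_TAG) -> None:
    """Read src, drop the tagged cells, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        nb = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(strip_tagged_cells(nb, tag), f, indent=1)
```

    For example, a CI job could run strip_notebook_file("CactusTutorial.ipynb", "CactusTutorial-offline.ipynb") and then jupyter nbconvert --to html CactusTutorial-offline.ipynb, so no human edits the exported copy.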

    The “no human intervention” part is very important. So far, each time we had a process requiring human action, there has always been a need to double-check (or at least review) the results. A case in point is our release announcements, e.g., the current one on hyperspace (this is just the last time I noticed something, not to single out this particular instance; I can also point to instances where the typo is mine) at https://hyperspace.uni-frankfurt.de/2021/12/09/new-einstein-toolkit-release-johnson/, which reads:

    The highlights of this release include:
    * The inclusion of a new code in the Toolkit release, Kuibit
    * The inclusion of a new code in the Toolkit release, RePrimAn

    That is, “RePrimAnd” is missing a “d”. This “d” is present in the template source file at https://www.einsteintoolkit.org/about/releases/ET_2021_11_announcement.md but was removed when the file was edited for length for the hyperspace mailing.

    Ideally, everything that is specific to the tutorial server should be in the cell labelled “Notebook setup”, and everything else should be written so that it works:

    • on the tutorial server
    • on the server used for ET new user tutorials at workshops
    • in a Docker container on people’s laptops
    • on people’s laptops using jupyter-notebook
    • as a read-only document from which one can copy and paste commands into a terminal
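    As a sketch of keeping the environment-specific part inside the “Notebook setup” cell, that cell could compute the working directory once, and later cells could use it instead of hard-coding ~/Cactus. The CACTUS_DIR environment variable and the cactus_dir helper are a hypothetical convention, not something the notebook currently uses:

```python
import os


def cactus_dir(default: str = "~/Cactus") -> str:
    """Directory the rest of the notebook works in.

    Hypothetical convention: outside the tutorial server, users can point the
    notebook at another checkout (e.g. ~/GW150914) by setting CACTUS_DIR.
    """
    return os.path.abspath(os.path.expanduser(os.environ.get("CACTUS_DIR", default)))
```

    On the tutorial server nothing needs to be set and the default ~/Cactus is used; on a laptop, exporting CACTUS_DIR before starting Jupyter is enough.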

    The last point is somewhat important (to me at least), since real-world use of the ET is not (yet) via Jupyter notebooks but via the command line.

    When we asked how people use the tutorial, there were a couple of groups that ask new students to run the tutorial notebook instructions on their laptops / clusters to get started with the ET (e.g., this is how we obtained the quick-running TOV parfile). In that sense, our tutorial is more than a way for new users to try things out without having to install anything; it is also a “getting started” document for new ET users in groups that use the ET.

    Before deciding to spend time changing the target audience of the notebook, it may be good to survey the user community again to see how people use the notebook and whether there are any issues with it that prevent them from using it more often. Looking back at the mailing list archive, the last such survey may have been back in 2015 (http://lists.einsteintoolkit.org/pipermail/users/2015-April/004070.html), so getting an update would be good.

  4. Gabriele Bozzola

    Thank you very much for taking the time to write out the reasoning behind what is being done. It all makes sense.

    My comment comes from putting myself in the shoes of a new user, so the use I had in mind for the notebook is “a document describing the first steps to someone who wants to try out the code”. For this use case, I think that the notebook (primarily the visualization section) obscures some points by adding complexity. For instance, when I tell people how to visualize a timeseries with kuibit, I tell them “cd into the directory of your data and run plot_timeseries.py --variable "rho" --reduction "maximum"”, but this is what we have (in the PR at least):

    %%bash
    # See comment before
    export PYTHONUSERBASE="$HOME/Cactus"
    
    # Here we define a bash variable that contains the path where
    # the simulation data lives. We do so to keep this tutorial general.
    # Normally you would just input the path or you would run the
    # script directly from the data directory.
    datadir=$(dirname $(./simfactory/bin/sim get-output-dir tov_ET))
    
    # Plot a timeseries with the maximum of the density
    ./utils/Analysis/kuibit/examples/bins/plot_timeseries.py --datadir $datadir \
    --variable "rho" --reduction "maximum" --outdir $datadir
    
    
    import os
    from IPython.display import Image
    
    # datadir is the top-level directory that contains that data for
    # a given simulation. In the first cell of this notebook we define
    # get_datadir() as a helper function to find the path.
    datadir = get_datadir()
    Image(filename=os.path.join(datadir, "rho_maximum.png"))
    

    It may be possible to clean this up so that most of the complexity is hidden somewhere else, but I still think that these cells do not describe how I would really use this code. For example, I would never want to mention PYTHONUSERBASE, and the more comments we add to explain why it is unnecessary in practice, the more we bury the important information.
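    A sketch of what hiding that complexity could look like, assuming a helper defined once in the setup cell (the names kuibit_command and run_kuibit are hypothetical) that sets PYTHONUSERBASE and runs a kuibit example script from the Cactus directory, so the visible cell only names the script and its options:

```python
import os
import subprocess

# Location of the kuibit example scripts inside the Cactus tree, as in the cell above.
KUIBIT_BINS = os.path.join(".", "utils", "Analysis", "kuibit", "examples", "bins")


def kuibit_command(script: str, datadir: str, *options: str) -> list:
    """Build the command line for one kuibit example script."""
    return [os.path.join(KUIBIT_BINS, script), "--datadir", datadir,
            *options, "--outdir", datadir]


def run_kuibit(script: str, datadir: str, *options: str,
               cactus_dir: str = "~/Cactus") -> None:
    """Run a kuibit example script from the Cactus directory.

    PYTHONUSERBASE is set here, once, instead of in every visible cell.
    """
    cactus_dir = os.path.expanduser(cactus_dir)
    env = dict(os.environ, PYTHONUSERBASE=cactus_dir)
    subprocess.run(kuibit_command(script, datadir, *options),
                   cwd=cactus_dir, env=env, check=True)
```

    The visible cell would then shrink to run_kuibit("plot_timeseries.py", get_datadir(), "--variable", "rho", "--reduction", "maximum"), which is close to the command one would type by hand.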

    That said, once again, the main problem is with the visualization section, so maybe we can find a better way to handle that?
