Simfactory does not find the right 'path' for symlinked cactus directories

Create issue
Issue #605 closed
Frank Löffler created an issue

Simfactory currently does not detect the 'right' path to a Cactus sourcetree if this is from within a symlink, e.g. Cactus -> /mnt/data/Cactus. This is because while pwd returns '/home/user/Cactus', simfactory uses a wrapper to the C-library getcwd(), which dereferences symlinks. Simfactory then gets '/mnt/data/Cactus' and tries to use this path on remote machines when syncing - which of course does not work.

The attached patch fixes this by using os.environ.get("PWD") and, if this is not defined, using os.getcwd() as workaround.

Keyword:

Comments (28)

  1. Erik Schnetter
    • removed comment

    Instead of an ad-hoc solution at this one point in the source code, it may be better to introduce a new function to Simfactory that returns the current directory, and which we then use everywhere instead of Python's getcwd. This will ensure that paths are handled consistently everywhere (basedir comes to my mind). libutil.py may be a good place for this.

  2. Frank Löffler reporter
    • removed comment

    I thought about that. Unfortunately, the PWD solution has the problem that PWD isn't updated when chdir() is called within simfactory. This is usually fine for the Cactus source dir, since that is asked for at the very start of commands. Using PWD in general would require to write a wrapper for chdir() and to consistently use it.

    I didn't find a better way to implement this, despite some people having the same problem on the web. The system call pwd() will dereference symlinks no matter what - that is the standard. The other way to get to the symlink-path is using the environment, but python doesn't update that when it changes directories, so os.environ.get("PWD") might actually not point to the current directory.

  3. Erik Schnetter
    • removed comment

    Simfactory doesn't really use "chdir". There are a few chdir commands surrounding calls to external programs (e.g. before calling the runscript), and these should probably be replaced by explicit "cd" commands in the executed commands.

    That is, constructs such as

    orig = os.getcwd() chdir (somewhere) system "do-something" chdir (orig) should be replaced by

    system "cd somewhere && do-somethig" which are shorter, and where there is no chdir involved.

    (Using chdir is usually not a good idea, because the current directory is a global state variable, and using chdir is thus equivalent to using a global variable.)

    It should thus be safe to use PWD; if not, I would consider this a low-priority bug in Simfactory.

  4. Barry Wardell
    • removed comment

    Replying to [comment:4 eschnett]:

    Simfactory doesn't really use "chdir". There are a few chdir commands surrounding calls to external programs (e.g. before calling the runscript), and these should probably be replaced by explicit "cd" commands in the executed commands.

    That is, constructs such as ` orig = os.getcwd() chdir (somewhere) system "do-something" chdir (orig) ` should be replaced by ` system "cd somewhere && do-somethig" ` which are shorter, and where there is no chdir involved.

    In these cases, would it not be better to just use the cwd option to popen()?

  5. Ian Hinder
    • removed comment

    One could imagine that in future, SimFactory might want to execute multiple commands in parallel. For example, it might be faster to purge simulations simultaneously. In that case, calling chdir from SimFactory would not be thread-safe. I agree with Erik: the CWD of a process should never be changed.

  6. Ian Hinder
    • removed comment

    Should this patch be applied? For this release? Presumably it fixes at least one instance of the problem: the one that Frank found.

  7. Barry Wardell
    • removed comment

    It seems that Frank's patch is appropriate for fixing this problem so I think it should be committed. The fact that SimFactory uses chdir is a separate issue which I agree should be fixed at some stage, but doesn't affect the present issue so it's probably not such a high priority.

  8. Frank Löffler reporter
    • removed milestone
    • removed comment

    After all the comments I actually think that my patch should not be applied for this release. It only fixes part of the problem, and likely hides parts it doesn't fix. For the release we'll have to document that some machines do require manual configuration at the moment, and live with it. After that we should come up with a better way to support it - something we don't have time anymore right now to implement and test.

  9. Frank Löffler reporter
    • changed status to open
    • removed comment

    Actually - I would like to have a review of the original patch again - to include it in the current release. Yes, it is not the ideal patch, but it would be better than no doing anything.

  10. Erik Schnetter
    • removed comment

    I would rather not apply this patch so shortly before the release. It would not receive widespread testing. Given that we use Simfactory also for remote access, for remote login, that there are trampolines, that we use Simfactory is used for running the test suite, that it is called from batch scripts to run simulations etc. all add to the complexity of getting paths correct.

    It seems there is an easy work-around, namely modifying sourcebasedir in defs.local.ini.

    Are there other instances of os.getcwd in the code that also need changing?

    With sufficient testing I would change my mind.

  11. Frank Löffler reporter
    • removed comment

    Can please someone review this, before we get into the same issue with the release again?

  12. Frank Löffler reporter
    • removed comment

    For sure my laptop, quite likely also my workstation (Debian and Redhat). I don't remember others, but that might only be my memory. But then, I never needed it someplace else.

  13. Erik Schnetter
    • removed comment

    Do you want to test it on other systems as well? For example, Kraken/Hopper/Blue Waters/Mike would be important systems.

    If I suggest a patch, I usually keep it in my working tree, so that I test it on every system I use.

  14. Frank Löffler reporter
    • removed comment

    I cannot login right now to Kraken, Hopper or BW, but I tested in on Mike in the sense that it didn't break simfactory execution. However, I don't ever sync from remote machines, and also expect no one to do this, so I didn't test that. I usually don't even forward credentials/agent information for security purposes, which means I couldn't even do this without at least agent forwarding.

  15. Erik Schnetter
    • removed comment

    I do sync remotely at times, because this can be faster than syncing from my laptop.

    (I also use agent/credential forwarding, and I do not see any security implications worse than logging in directly.)

    Since you only seem to be needing this functionality on a few local systems: can you update your sourcebasedir for these systems instead of using this patch?

  16. Ian Hinder
    • removed comment

    If you have a private key on machine A, and log in to machine B with agent-forwarding, so that you can then log in to machine C from B, then you give machine B the ability to authenticate using your private key to any other system. If machine B is compromised, this is a security vulnerability which is worse than logging into machine C directly. Note that your key itself is not compromised, as the key never leaves machine A. But the agent on machine A can be instructed by machine B to authenticate data using your private key on machine A for the duration of your session on A. This is why agent-forwarding to machines where you do not 100% trust root (or that root has been compromised) has a higher risk. I do it anyway, but only when I need to; i.e. I don't have agent-forwarding on by default, I use "-A' when I am going to need it.

  17. Frank Löffler reporter
    • removed comment

    I know about agent forwarding. It's better than other methods, but still not the preferred way as it is also not 100% secure by design.

  18. Frank Löffler reporter
    • changed status to open
    • removed comment

    Anyway - I will test this on other systems and come back once I can report that it works (or not).

  19. Log in to comment