- changed status to open
- removed comment
Simfactory does not find the right 'path' for symlinked cactus directories
Simfactory currently does not detect the 'right' path to a Cactus sourcetree if this is from within a symlink, e.g. Cactus -> /mnt/data/Cactus. This is because while pwd
returns '/home/user/Cactus', simfactory uses a wrapper to the C-library getcwd(), which dereferences symlinks. Simfactory then gets '/mnt/data/Cactus' and tries to use this path on remote machines when syncing - which of course does not work.
The attached patch fixes this by using os.environ.get("PWD") and, if this is not defined, using os.getcwd() as workaround.
Keyword:
Comments (27)
-
reporter -
- removed comment
Instead of an ad-hoc solution at this one point in the source code, it may be better to introduce a new function to Simfactory that returns the current directory, and which we then use everywhere instead of Python's getcwd. This will ensure that paths are handled consistently everywhere (basedir comes to my mind). libutil.py may be a good place for this.
-
reporter - removed comment
I thought about that. Unfortunately, the PWD solution has the problem that PWD isn't updated when chdir() is called within simfactory. This is usually fine for the Cactus source dir, since that is asked for at the very start of commands. Using PWD in general would require to write a wrapper for chdir() and to consistently use it.
I didn't find a better way to implement this, despite some people having the same problem on the web. The system call pwd() will dereference symlinks no matter what - that is the standard. The other way to get to the symlink-path is using the environment, but python doesn't update that when it changes directories, so os.environ.get("PWD") might actually not point to the current directory.
-
- removed comment
Simfactory doesn't really use "chdir". There are a few chdir commands surrounding calls to external programs (e.g. before calling the runscript), and these should probably be replaced by explicit "cd" commands in the executed commands.
That is, constructs such as
orig = os.getcwd() chdir (somewhere) system "do-something" chdir (orig) should be replaced by
system "cd somewhere && do-somethig" which are shorter, and where there is no chdir involved.
(Using chdir is usually not a good idea, because the current directory is a global state variable, and using chdir is thus equivalent to using a global variable.)
It should thus be safe to use PWD; if not, I would consider this a low-priority bug in Simfactory.
-
- removed comment
Replying to [comment:4 eschnett]:
Simfactory doesn't really use "chdir". There are a few chdir commands surrounding calls to external programs (e.g. before calling the runscript), and these should probably be replaced by explicit "cd" commands in the executed commands.
That is, constructs such as ` orig = os.getcwd() chdir (somewhere) system "do-something" chdir (orig) ` should be replaced by ` system "cd somewhere && do-somethig" ` which are shorter, and where there is no chdir involved.
In these cases, would it not be better to just use the cwd option to popen()?
-
- removed comment
Sure, that would also work (if it works in Python 2.3).
-
- removed comment
One could imagine that in future, SimFactory might want to execute multiple commands in parallel. For example, it might be faster to purge simulations simultaneously. In that case, calling chdir from SimFactory would not be thread-safe. I agree with Erik: the CWD of a process should never be changed.
-
- removed comment
Should this patch be applied? For this release? Presumably it fixes at least one instance of the problem: the one that Frank found.
-
- removed comment
It seems that Frank's patch is appropriate for fixing this problem so I think it should be committed. The fact that SimFactory uses chdir is a separate issue which I agree should be fixed at some stage, but doesn't affect the present issue so it's probably not such a high priority.
-
reporter - removed milestone
- removed comment
After all the comments I actually think that my patch should not be applied for this release. It only fixes part of the problem, and likely hides parts it doesn't fix. For the release we'll have to document that some machines do require manual configuration at the moment, and live with it. After that we should come up with a better way to support it - something we don't have time anymore right now to implement and test.
-
reporter - changed status to open
- changed milestone to ET_2012_05
- removed comment
-
- changed milestone to ET_2012_11
- removed comment
-
reporter - changed milestone to ET_2013_05
- removed comment
-
- removed comment
I think this should not be a release goal.
-
reporter - changed status to open
- removed comment
Actually - I would like to have a review of the original patch again - to include it in the current release. Yes, it is not the ideal patch, but it would be better than no doing anything.
-
- removed comment
I would rather not apply this patch so shortly before the release. It would not receive widespread testing. Given that we use Simfactory also for remote access, for remote login, that there are trampolines, that we use Simfactory is used for running the test suite, that it is called from batch scripts to run simulations etc. all add to the complexity of getting paths correct.
It seems there is an easy work-around, namely modifying sourcebasedir in defs.local.ini.
Are there other instances of os.getcwd in the code that also need changing?
With sufficient testing I would change my mind.
-
reporter - removed milestone
- removed comment
-
reporter - removed comment
Can please someone review this, before we get into the same issue with the release again?
-
- removed comment
Where (on what machines and architectures) was this patch tested?
-
reporter - removed comment
For sure my laptop, quite likely also my workstation (Debian and Redhat). I don't remember others, but that might only be my memory. But then, I never needed it someplace else.
-
- removed comment
Do you want to test it on other systems as well? For example, Kraken/Hopper/Blue Waters/Mike would be important systems.
If I suggest a patch, I usually keep it in my working tree, so that I test it on every system I use.
-
reporter - removed comment
I cannot login right now to Kraken, Hopper or BW, but I tested in on Mike in the sense that it didn't break simfactory execution. However, I don't ever sync from remote machines, and also expect no one to do this, so I didn't test that. I usually don't even forward credentials/agent information for security purposes, which means I couldn't even do this without at least agent forwarding.
-
- removed comment
I do sync remotely at times, because this can be faster than syncing from my laptop.
(I also use agent/credential forwarding, and I do not see any security implications worse than logging in directly.)
Since you only seem to be needing this functionality on a few local systems: can you update your sourcebasedir for these systems instead of using this patch?
-
- removed comment
If you have a private key on machine A, and log in to machine B with agent-forwarding, so that you can then log in to machine C from B, then you give machine B the ability to authenticate using your private key to any other system. If machine B is compromised, this is a security vulnerability which is worse than logging into machine C directly. Note that your key itself is not compromised, as the key never leaves machine A. But the agent on machine A can be instructed by machine B to authenticate data using your private key on machine A for the duration of your session on A. This is why agent-forwarding to machines where you do not 100% trust root (or that root has been compromised) has a higher risk. I do it anyway, but only when I need to; i.e. I don't have agent-forwarding on by default, I use "-A' when I am going to need it.
-
reporter - removed comment
I know about agent forwarding. It's better than other methods, but still not the preferred way as it is also not 100% secure by design.
-
reporter - changed status to open
- removed comment
Anyway - I will test this on other systems and come back once I can report that it works (or not).
-
- changed status to resolved
- removed comment
Tested on several systems.
Applied.
- Log in to comment