GetComponents has trouble using svn in parallel with recent versions of svn
I get svn errors when I use
./GetComponents --parallel https://svn.einsteintoolkit.org/manifest/branches/ET_2013_05/einsteintoolkit.th
for the initial checkout, as described in the tutorial. Everything works without the --parallel option.
-
-
reporter - removed comment
After a few seconds, I see this message for a fresh checkout on my laptop:
Checking out module: . from repository: http://svn.cactuscode.org/Utilities/branches/ET_2013_05 into: Cactus as: utils
Warning: Could not checkout module .
svn: E155037: Previous operation has not finished; run 'cleanup' if it was interrupted
Ben Bernard also sees errors about git repositories.
-
- removed comment
We should find out what causes these problems. It might not be GetComponents after all. I use --parallel all the time and have never had these problems. I just tried on a low-latency machine and on a high-latency machine, and neither had any problem. If you see these problems, please make sure there is no previous checkout, and use the following commands to produce a log file for GetComponents (bash):
./GetComponents --verbose --parallel https://svn.einsteintoolkit.org/manifest/branches/ET_2013_05/einsteintoolkit.th > checkout.log 2>&1
svn --version >> checkout.log 2>&1
This logfile will be quite big (1.5MB). Make sure the error occurred, and if so, please compress it ('gzip -9' gets it down to 88kB) and attach it to the ticket.
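To check whether the error actually occurred before compressing, something like the following bash sketch could be used. The function name report_checkout_log is hypothetical, and the patterns it searches for (GetComponents' "Could not checkout" warning and svn's "svn: E" error prefix, both taken from the message quoted in this ticket) are assumptions about what a failed run looks like.

```bash
# Hypothetical helper: compress a GetComponents log only if it shows a failure.
# Assumes failures show up as GetComponents' "Could not checkout" warning or as
# an svn error line starting with "svn: E" (e.g. "svn: E155037: ...").
report_checkout_log() {
  local log=${1:-checkout.log}
  if [ ! -f "$log" ]; then
    echo "no log file: $log" >&2
    return 2
  fi
  if grep -q -e 'Could not checkout' -e '^svn: E' "$log"; then
    gzip -9 -f "$log"            # writes "$log.gz" and removes "$log"
    echo "error found; please attach $log.gz to the ticket"
  else
    echo "no checkout error found in $log; nothing to attach"
    return 1
  fi
}
```

Usage: run the two commands above, then `report_checkout_log checkout.log`.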
-
reporter - removed comment
At the moment, I cannot reproduce this error any more.
-
reporter - removed comment
Apparently I can reproduce the error at work (but not at home). Log file attached.
-
- removed comment
Is this all on your laptop, so that it seems to depend on the network it is attached to?
-
reporter - removed comment
Yes, this is all on my laptop.
-
- removed comment
Do you have access to other machines at work where you could try this? Maybe something on this network either tries to limit traffic or cannot cope with parallel connections? Does anyone else have this problem?
-
reporter - removed comment
This is the network at Perimeter; it is generally pretty good.
Yes, a similar error was reported on the mailing list, which prompted me to investigate and then confirm this problem.
-
- removed comment
Wasn't that for git repositories? I would expect these to be independent (although we don't know the real problem yet, so we cannot be sure).
Roland said he couldn't reproduce the problem with git on queenbee. However, I just could. I found the problem (git's ever-changing user interface), and a patch is currently being tested.
-
- removed comment
Actually, while working on this with Frank, I realized that I had compiled my own version of git on QueenBee, which is newer than the system-provided version (though I did that for other reasons than being unable to check out the ET). That would explain why it worked for me. Sorry for the confusion.
-
- removed comment
The attached patch works around a change in the git interface. Version 1.6.1.3 (on queenbee) does not yet understand the shorthand
git checkout NAME
for
git checkout -b NAME origin/NAME
So I changed GetComponents to use the full command, which also works with newer versions of git. Tested and works on queenbee. Please review for backport to ET_2013_05 (and, obviously, master as well).
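The idea behind the patch, sketched in bash (this is not the patch itself): spell out the full `git checkout -b NAME origin/NAME` form instead of relying on the newer shorthand. The function name switch_branch is hypothetical, and the check for an already existing local branch is an assumption added here, since `-b` refuses to create a branch that already exists.

```bash
# Hypothetical helper: switch to branch NAME without relying on the
# "git checkout NAME" shorthand that old git (e.g. 1.6.1.3) lacks.
switch_branch() {
  local name=$1
  if git show-ref --verify --quiet "refs/heads/$name"; then
    git checkout "$name"                      # local branch already exists
  else
    git checkout -b "$name" "origin/$name"    # explicit form, old and new git
  fi
}
```

Usage, inside an existing clone: `switch_branch ET_2013_05`.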
-
- changed version to ET_2013_05
- changed title to GetComponents does not know how to switch branches for old git clients
- changed milestone to ET_2013_05
- removed comment
-
- changed status to open
- removed comment
-
- changed status to open
- removed comment
Patch looks good. Please apply (and backport to Gauss).
-
- changed version to development version
- changed title to GetComponents has trouble using svn in parallel at the Perimeter institute
- changed status to open
- marked as
- removed milestone
- removed comment
OK, repurposing this ticket for the problems Erik has. Ideally we would have a separate ticket, but the problems are already described here and the log is attached, so we keep them here. Setting this to 'minor' unless we also see it somewhere else. Also, please speak up if you have any idea what the problem might be. It's pretty hard to work on this if you don't see the problem yourself.
-
reporter - changed status to open
- removed comment
Here's a solution: Remove "--parallel" from the tutorial.
-
- removed comment
Either --parallel works or it doesn't. I use it all the time (and would like to keep doing so) and haven't seen problems lately. Plus, so far we have seen this failure only on the PI network, and the tutorial talks about Queenbee, where it does work. So at the moment I really don't see a need to remove --parallel, neither from the tutorial nor from GetComponents.
However, I would be interested to find out what goes on at PI. Can you check whether several independent 'svn co' commands work in parallel outside of GetComponents? As I see it, we need more information on this issue. Something makes Subversion abort, and it would be nice to find the cause.
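One way to run that experiment, sketched in bash under the assumption that each URL goes into its own fresh, separate directory, so no working copy is nested inside another. The function name parallel_svn_checkouts and the co0, co1, ... directory layout are hypothetical. If these independent checkouts also fail on the PI network, the network or server becomes a likely suspect; if they succeed, nesting is the more likely cause.

```bash
# Hypothetical experiment: check out several URLs in parallel, each into its
# own directory under BASE, and report which checkouts failed.
parallel_svn_checkouts() {
  local base=$1; shift
  local pids=() i=0 url failed=0
  mkdir -p "$base"
  for url in "$@"; do
    svn checkout --non-interactive "$url" "$base/co$i" > "$base/co$i.log" 2>&1 &
    pids+=("$!")
    i=$((i + 1))
  done
  for i in "${!pids[@]}"; do
    if ! wait "${pids[$i]}"; then        # wait returns the job's exit status
      echo "checkout $i failed, see $base/co$i.log" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
```

Usage: `parallel_svn_checkouts /tmp/svntest https://svn.cactuscode.org/flesh/trunk http://svn.cactuscode.org/Utilities/branches/ET_2013_05`, run in a directory that is not itself inside a working copy.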
-
reporter How to reproduce this:
- svn checkout --non-interactive https://svn.cactuscode.org/flesh/trunk Cactus
- press control-Z while this is running
- cd Cactus
- svn checkout --non-interactive http://svn.cactuscode.org/Utilities/branches/ET_2013_05 utils
Repeat a few times. The warnings and errors differ from run to run; sometimes the error is
svn: E155037: Previous operation has not finished; run 'cleanup' if it was interrupted
-
- removed comment
Works without a problem for me, every time (svn version 1.6.17). Maybe you should report this to the Subversion project?
-
- removed comment
This problem may have been introduced because svn switched from a .svn subdirectory in every directory of a checkout to a single .svn at the top of the checkout. When you request a checkout into a target directory, svn very likely checks whether that directory is already part of a checkout. With the old layout it only had to look for an existing .svn in that directory, and in our case there wasn't one. Now, however, it has to 'go up' the directory tree, and it might find a .svn that is 'in use' at that moment, hence the error. If this is indeed how the problem was introduced, it is not likely to go away in future versions.
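To make the hypothesis concrete: with a single top-level .svn, finding out whether a target directory already belongs to a working copy means walking up the directory tree. The bash sketch below only illustrates that lookup; it is not Subversion's actual code, and the function name find_enclosing_wc is hypothetical. In the failing case from this ticket, a lookup starting at Cactus/utils would find Cactus/.svn, which the still-running flesh checkout is busy with.

```bash
# Illustration only: print the nearest directory at or above DIR that
# contains a .svn directory (the enclosing working copy root under the
# single-.svn layout). DIR itself need not exist yet.
find_enclosing_wc() {
  local dir=$1
  while :; do
    if [ -d "$dir/.svn" ]; then
      echo "$dir"
      return 0
    fi
    case "$dir" in
      /|.) return 1 ;;            # reached the top without finding one
    esac
    dir=$(dirname "$dir")
  done
}
```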
GetComponents has to be able to deal with that. We could either teach it that some versions of svn apparently have this problem and disable parallel checkouts for them (not my favorite, especially since this might mean all versions >=1.7), or teach GetComponents that some repositories live 'within' other repositories, and that the 'parents' have to be checked out first and completely before the 'children' can proceed. Right now the only such repository is the flesh.
We could add a new keyword to CRL for this. One possibility would be a PRIORITY keyword for a section, specifying a number between 0 and 100. The default would be unspecified, and thus 0. Nothing would change in the ET thornlist, except that the flesh section would get one additional line
PRIORITY = 100
Only components with an identical number would be checked out in parallel.
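A bash sketch of the proposed scheduling rule, assuming input lines of the form "PRIORITY NAME" (components without PRIORITY given as 0) and a hypothetical checkout_one function that checks out a single component. Neither the input format nor the function names exist in GetComponents or CRL. Groups run from highest to lowest priority; components within a group run in parallel, and each group finishes before the next one starts.

```bash
# Sketch: read "PRIORITY NAME" lines on stdin and check out components
# group by group, highest PRIORITY first. checkout_one is hypothetical.
checkout_by_priority() {
  sort -k1,1nr | {
    # Runs in a pipeline subshell, so these variables do not leak out.
    current=""
    while read -r prio name; do
      if [ -n "$current" ] && [ "$prio" != "$current" ]; then
        wait                                  # finish the previous group
      fi
      current=$prio
      checkout_one "$name" < /dev/null &      # same priority: in parallel
    done
    wait                                      # finish the last group
  }
}
```

With the flesh at 100 and all thorns at the default 0, this reduces to "flesh first, then everything else in parallel".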
-
- removed comment
Of course it would be interesting to know why you see this only on the PI network. One possible reason is that other networks aren't as fast and don't trigger the race. One reason why I don't see this anywhere could be that I don't use version 1.7 on any system yet.
-
reporter - removed comment
CVS seems to have the same problem, according to a comment in GetComponents' source code. We could apply the same solution, which seems to be checking out into a temporary directory.
-
- removed comment
An interesting idea. But how would you handle the case where one of the thorns is checked out before the flesh, and the flesh then finishes and needs to be moved to a parent directory of that thorn? You cannot move the whole flesh checkout, since the target already exists (and contains the thorn). You could only copy everything within the flesh checkout (including .svn), but that doesn't sound very clean to me either.
-
- removed comment
Plus, depending on the location of the temporary checkout relative to the target, the OS might need to copy the contents of .svn file by file instead of just updating the directory inode, resulting in a similar race again.
-
- removed comment
Use git.
-
reporter - removed comment
Disable --parallel, since it's broken.
-
- removed comment
For the moment I added a note to the tutorial. With this taken care of, we can get serious again and, instead of removing a feature when it breaks, fix it.
-
- removed comment
Question for Ian: didn't you also experience some issues with svn and very parallel checkouts, where you tried to check out each folder in a repository individually, i.e. IOUtil/src, IOUtil/doc, ...? Speaking of fixes, I don't think that the priority option Frank suggested is a good solution. While it may work and is simple to implement, it seems quite hard to set up and requires a lot of user intervention (as in us having to come up with correct priorities). Instead, GetComponents should deduce priorities itself if at all possible.
-
- removed comment
I would also prefer an automatic version of all this. It might not work for generic cases, but it would for all the cases we are interested in. The one and only case we really care about is the flesh, which is checked out as '.' into $ROOT. GetComponents would have to build the final path of each component from TARGET, CHECKOUT and possibly NAME, and look for parent/child relationships among those paths. Parents then have to be checked out first, and finish before their children start. Again, the only such relation I see in our case is between the flesh and everything else.
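A bash sketch of how such parent/child relations could be deduced. It assumes, as described above, that a component's final path is TARGET joined with NAME if given and with CHECKOUT otherwise, and that a CHECKOUT of '.' without a NAME (the flesh) lands directly in TARGET. The function names are hypothetical, and this is not how GetComponents currently computes paths.

```bash
# Hypothetical helper: final on-disk path of a component, assumed to be
# TARGET/NAME if NAME is given and TARGET/CHECKOUT otherwise; a CHECKOUT of
# "." with no NAME lands directly in TARGET.
final_path() {
  local target=${1%/} checkout=$2 name=${3:-}
  local leaf=${name:-$checkout}
  if [ "$leaf" = "." ]; then
    echo "$target"
  else
    echo "$target/$leaf"
  fi
}

# Read final paths (one per line) on stdin and print every path that
# contains at least one other path. Such parents would have to finish
# checking out before any of their children start.
containing_paths() {
  local p q
  local -a paths=()
  while IFS= read -r p; do
    paths+=("${p%/}")
  done
  for p in "${paths[@]}"; do
    for q in "${paths[@]}"; do
      if [[ $q == "$p"/* ]]; then
        echo "$p"
        break
      fi
    done
  done
}
```

The trailing "/" in the prefix test keeps a sibling such as CactusExtra from being mistaken for a child of Cactus.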
-
reporter - changed title to GetComponents has trouble using svn in parallel with recent versions of svn
- removed comment
-
- removed comment
Replying to [comment:29 rhaas]:
Question for Ian: did you not also experience some issues with svn and very parallel checkouts where you tried to checkout each folder in a repository individually? Ie. IOUtil/src IOUtil/doc ... ?
SVN changed its on-disk representation of working copies (see, e.g., http://stackoverflow.com/questions/1364618/how-do-i-determine-svn-working-copy-layout-version) several times. In one of these changes, they switched from having a .svn directory in each subdirectory to having one at the top level (like git). As far as I know, since this change you can no longer update subdirectories of an SVN checkout in parallel (also like git, unfortunately). Additionally, since each checkout opens a new http(s) connection, you might run into the web server's configured connection limit if you attempt very parallel checkouts.
-
- removed comment
Btw: after the parallel checkout issue was reported upstream to the Subversion project last week, a fix was committed today (http://svn.apache.org/viewvc?view=revision&revision=r1501338). Hopefully this means the fix will be part of a future release (>1.8.0).
-
- removed comment
So for now we just check the svn version and disable parallel checkouts if it is 1.7.x? Objections?
-
- removed comment
We could, but versions 1.7 (and 1.8.0) are likely to be around for a while. Instead, I propose the attached patch. It doesn't change anything in the CRL syntax. Rather, it sorts the modules by target path, shortest first (so that the flesh comes first), and checks out/updates the very first element serially. The argument is that the very first element very likely contains others, so to avoid the problems with parallel checkouts, it is not processed in parallel.
This isn't an ideal patch. Ideally we would check out everything in another location and move everything 'together' once it is checked out. However, 'update' would have the same problem, and the same strategy wouldn't work there. We also cannot use the same hack as for git, checking out in a completely different place and just symlinking into the checkout, because the symlink would end up inside the containing checkout and might cause confusion there.
Instead, this patch tries to be as short as possible and to change as little as possible. The only user-visible changes are the order in which things are done (which isn't guaranteed anyway with parallel checkouts) and, of course, the intentional serial checkout of the very first module, i.e. the one with the shortest target path TARGET.(NAME||CHECKOUT), for each TARGET (with these also sorted by length).
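A simplified bash sketch of the strategy described above, assuming the final paths are already computed (one per line) and that a hypothetical checkout_one function checks out a single component. It sorts all paths by length, checks out the shortest one on its own, and then runs the rest in parallel; the patch's per-TARGET ordering is not reproduced here, and this is not the patch's actual code.

```bash
# Sketch: read final paths on stdin, check out the shortest one first and on
# its own (most likely the container, e.g. the flesh), then all remaining
# paths in parallel. checkout_one is hypothetical.
checkout_shortest_first() {
  awk '{ print length($0) "\t" $0 }' | sort -n -k1,1 | cut -f2- | {
    # This brace group runs in its own subshell, so its variables stay local.
    if IFS= read -r first; then
      checkout_one "$first" < /dev/null          # serial
      while IFS= read -r rest; do
        checkout_one "$rest" < /dev/null &       # parallel
      done
      wait
    fi
  }
}
```

Redirecting each checkout's stdin from /dev/null keeps it from consuming the remaining list of paths.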
-
- removed comment
We could tie this to specific svn versions as well. I didn't do this here to keep things simple, but I wouldn't object if someone would like to make that distinction.
-
- removed comment
Anyone?
-
- changed status to open
- removed comment
Patch looks OK to me. It's a stopgap measure but might well do the trick. I would suggest adding a routine that checks for the affected versions and outputs a warning in that case.
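A bash sketch of such a check. It treats 1.7.x and 1.8.0 as affected, following the versions named in this discussion; which release actually shipped the upstream fix is not confirmed here. It relies on `svn --version --quiet`, which prints just the version number. The function names are hypothetical.

```bash
# Sketch: succeed if the given svn version (default: the installed one) is
# one this ticket considers affected by the nested parallel checkout race.
svn_parallel_affected() {
  local version=${1:-$(svn --version --quiet)}
  case "$version" in
    1.7.*|1.8.0) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a warning (without changing behaviour) for affected versions.
warn_if_svn_affected() {
  local version=${1:-$(svn --version --quiet)}
  if svn_parallel_affected "$version"; then
    echo "Warning: svn $version may fail on parallel checkouts of nested working copies" >&2
  fi
}
```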
-
- changed milestone to ET_2013_11
- removed comment
-
- changed status to open
- removed comment
-
- changed status to open
- removed comment
Could someone review both patches again?
-
- assigned issue to
- removed comment
-
- removed comment
Only the second one.
-
- changed status to open
- removed comment
Following the discussion on the ET phone call today: please apply the first patch. The second patch (version check) is to be dropped.
-
- changed status to resolved
- removed comment
Re-applied in 6dc98e842bad513ddbc5724ebb4f521823574020
-
- edited description
- changed status to closed
It currently (as of 5 min ago) works for me, both on queenbee (qb3) and on my workstation. The svn version is 1.1.4 (r13838) on qb3 and 1.6.17 (r1128011) on my workstation.
Can you provide the text of the error message?