revs can be skipped in replay mode, leading to history divergence

Issue #348 new
Bryan O'Sullivan created an issue

I've observed this happen several times while cloning a repo. (Pulling would also be affected, since it's the same code.)

An example scenario:

  • Trunk-only clone.
  • In the svn repo, trunk has been accidentally deleted, then restored via {{{svn cp}}}.
  • A directory that existed before trunk was deleted and restored is deleted long after the restore, and this delete is the only event in the rev.
  • The rev is silently skipped while pulling from svn into hg.

I believe the root cause is that hgsubversion is not using replay mode correctly; it's calling {{{ra.get_log}}} to retrieve a list of revs, then {{{ra.replay}}} to replay each.

Alas, when called on a directory below the repository root, {{{ra.get_log}}} silently omits certain events affecting files or directories that existed before that directory was deleted and restored. Since {{{ra.get_log}}} never reports the revs in question, we never call {{{ra.replay}}} on them, and they never show up in the hg history.
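The failure mode above can be modelled without Subversion at all. The sketch below is a hypothetical in-memory stand-in: {{{Rev}}}, {{{lossy_log}}}, {{{replay}}} and {{{log_driven_convert}}} are illustrative names, not hgsubversion or Subversion APIs, and the {{{omit}}} set simply assumes the subdirectory log drops one revision, as this issue reports. Whatever the log leaves out is never handed to the replay step, so nothing flags the gap.

```python
from dataclasses import dataclass, field


@dataclass
class Rev:
    # Hypothetical revision record: number plus the paths it changed.
    number: int
    changed: dict = field(default_factory=dict)  # path -> action ("A", "D", "M")


def lossy_log(revs, prefix, omit):
    """Stand-in for a subdirectory log that silently omits some revisions.

    `omit` models the events this issue says ra.get_log fails to report;
    it is an assumption of the sketch, not real Subversion behaviour.
    """
    return [r.number for r in revs
            if r.number not in omit
            and any(p.startswith(prefix) for p in r.changed)]


def replay(revs, number, prefix):
    # Stand-in for replaying one revision, restricted to the subtree.
    rev = next(r for r in revs if r.number == number)
    return {p: a for p, a in rev.changed.items() if p.startswith(prefix)}


def log_driven_convert(revs, prefix, omit):
    # The pattern described above: list revisions first, then replay each one.
    return {n: replay(revs, n, prefix) for n in lossy_log(revs, prefix, omit)}


history = [
    Rev(1, {"/trunk/": "A", "/trunk/dir/": "A"}),
    Rev(2, {"/trunk/": "D"}),          # trunk accidentally deleted
    Rev(3, {"/trunk/": "A"}),          # restored via svn cp
    Rev(4, {"/trunk/README": "M"}),
    Rev(5, {"/trunk/dir/": "D"}),      # the lone delete that goes missing
]

converted = log_driven_convert(history, "/trunk/", omit={5})
print(sorted(converted))  # [1, 2, 3, 4]; revision 5 never reaches replay()
```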

Interestingly, {{{svnsync}}} does not suffer from this problem, and it uses a slightly different replay mechanism: the entire replay is driven by a single call to {{{ra.replay_range}}}.
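For contrast, here is a hypothetical sketch of a range-driven loop in the spirit of {{{ra.replay_range}}}: every revision number in the range is visited in order, and filtering to the subtree happens during replay rather than by consulting a revision list up front. {{{Rev}}} and {{{range_driven_convert}}} are illustrative names, not real APIs; the real call delivers each revision through per-revision start/finish callbacks, which this model collapses into a plain loop.

```python
from dataclasses import dataclass, field


@dataclass
class Rev:
    # Hypothetical revision record: number plus the paths it changed.
    number: int
    changed: dict = field(default_factory=dict)  # path -> action


def range_driven_convert(revs, prefix, start, end):
    """Visit every revision in [start, end] in order, as a range replay does.

    No revision list is consulted first, so a gap in some log cannot cause a
    revision to be skipped. Changes outside `prefix` are dropped per revision.
    """
    by_number = {r.number: r for r in revs}
    converted = {}
    for n in range(start, end + 1):
        rev = by_number.get(n)
        if rev is None:
            continue
        subtree = {p: a for p, a in rev.changed.items() if p.startswith(prefix)}
        if subtree:  # only revisions that touch the subtree become commits
            converted[n] = subtree
    return converted


history = [
    Rev(1, {"/trunk/": "A", "/trunk/dir/": "A"}),
    Rev(2, {"/trunk/": "D"}),
    Rev(3, {"/trunk/": "A"}),
    Rev(4, {"/branches/x/": "A"}),     # outside a trunk-only clone
    Rev(5, {"/trunk/dir/": "D"}),
]

print(sorted(range_driven_convert(history, "/trunk/", 1, 5)))  # [1, 2, 3, 5]
```

In this model the lone delete in revision 5 survives because nothing upstream decided which revisions to look at; how much data a real server sends for revisions outside the subtree depends on the actual replay implementation.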

Comments (6)

  1. Augie Fackler repo owner


    On May 19, 2012, at 7:41 PM, Bryan O'Sullivan wrote:

    I wonder if we could do well enough by calling ra.get_log over the entire repo rather than our subdir. It'd be annoying to call replay on every rev, as that'd result in transmitting a ton of data we don't need.
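    A minimal sketch of that suggestion, assuming the root-level log returns each revision's changed paths (as get_log can when asked to discover changed paths). LogEntry and revs_touching are hypothetical names. Matching uses the exact path or the path plus "/", so "/trunk" does not also match "/trunk-old".

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    # Hypothetical root-level log entry: revision number and changed paths.
    revision: int
    paths: tuple


def revs_touching(entries, subdir):
    """Return revisions with at least one changed path at or below `subdir`."""
    return [e.revision for e in entries
            if any(p == subdir or p.startswith(subdir + "/") for p in e.paths)]


root_log = [
    LogEntry(1, ("/trunk", "/trunk/dir")),
    LogEntry(2, ("/trunk",)),             # trunk deleted
    LogEntry(3, ("/trunk",)),             # trunk restored
    LogEntry(4, ("/branches/x",)),
    LogEntry(5, ("/trunk/dir",)),         # the delete a subdir log may miss
    LogEntry(6, ("/trunk-old/notes",)),   # similar prefix, different directory
]

wanted = revs_touching(root_log, "/trunk")
print(wanted)                                                  # [1, 2, 3, 5]
print(f"fetched {len(root_log)} entries, kept {len(wanted)}")  # fetched 6 entries, kept 4
```

    The gap between fetched and kept entries is the overhead at issue in the next comment: in a large shared repository, presumably most root-level revisions touch other projects.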

  2. Dan Villiom Podlaski Christiansen

    I'm not sure calling get_log() on the entire repository is a good idea, as it would likely cause serious performance regressions when cloning a project from Apache, KDE or similar repositories.

  3. Bryan O'Sullivan reporter

    AFAIK, calling ra.replay_range on a subtree will feed us only the stuff we care about (modulo the usual crop of svn bugs); I believe this is what svnsync does. I think Dan's correct that ra.get_log on the tree as a whole will lead to terrible performance.

  4. domruf

    I'd like to vote for this bug. If there is a way I could help fix it faster, please let me know. I'm quite familiar with SVN, but not with its internals.
