Issue #107 new

Connects once per changeset for ssh clones

Anonymous created an issue

Today I cloned a Subversion repository with hgsubversion at work. Afterwards, one of the admins contacted me asking why I was connecting once a second.

It seems that hgsubversion opens one SSH connection per SVN revision. IMHO that is suboptimal, and it badly hampers speed.

Comments (6)

  1. Shun-ichi Goto

    With the latest hgsubversion, I found a bug related to this issue.

    Because SubversionRepo.init_ra_and_client() is called for every action, ra.open2() is called many times. The RA session should be opened once and reused. This patch is an easy, temporary one, but it fixes the issue.

    --- a/hgsubversion/svnwrap/svn_swig_wrapper.py	Thu Mar 25 14:47:16 2010 +0900
    +++ b/hgsubversion/svnwrap/svn_swig_wrapper.py	Thu Mar 25 19:40:20 2010 +0900
    @@ -196,6 +196,7 @@
             self.auth_baton = _create_auth_baton(self.auth_baton_pool)
             # self.init_ra_and_client() assumes that a pool already exists
             self.pool = core.Pool()
    +        self.ra = None
             self.uuid = ra.get_uuid(self.ra, self.pool)
    @@ -211,6 +212,8 @@
             unified diffs runs the remote server out of open files.
             # while we're in here we'll recreate our pool
    +        if self.ra:
    +            return
             self.pool = core.Pool()
             if self.username:
    @@ -237,7 +240,7 @@
                 v = ra.version()
                 self.svn_version = (v.major, v.minor, v.patch)
             except core.SubversionException, e:
    -            raise hgutil.Abort(e.args[0])
    +            raise hgutil.Abort(e.args[0] or 'SubversionException:%d' % e.args[1])

    Without this patch, cloning a big repository (like the FreeBSD repo) stops with an SVN_ERR_RA_CONNECTION_CLOSE error after some hundreds of revisions. With this patch, I got past 8000 revs (and it is still running). (hg clone svn:svn.freebsd.org/base/head freebsd-head-hg)

    BTW, the 3rd hunk makes the error message use the error number instead of None.
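
    The guard-clause reuse that the patch adds can be sketched in isolation as follows. This is a minimal illustration, not hgsubversion code: the class and attribute names are hypothetical (the real code caches the session on `self.ra` and opens it with `ra.open2()`).

    ```python
    class SessionCache:
        """Sketch of the lazy-open reuse pattern from the patch above.

        Names are hypothetical; in hgsubversion the cached object is
        ``self.ra`` and ``opener`` corresponds to ``ra.open2``.
        """

        def __init__(self, url, opener):
            self.url = url
            self._opener = opener
            self._session = None
            self.opens = 0  # for demonstration: count real connections

        def session(self):
            # Guard clause: reuse the already-open session instead of
            # reconnecting (otherwise: one ssh handshake per revision).
            if self._session is None:
                self._session = self._opener(self.url)
                self.opens += 1
            return self._session
    ```

    With this shape, repeated lookups hand back the same session object, so only the first access pays the connection cost.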

  2. Augie Fackler repo owner

    Just responded to this via the list, but I'll repeat my comment here:

    This code needs to be significantly more clever, I'm afraid. If the server doesn't support replay, then we need to allow the connection to be reopened, as the diffing method leaks file descriptors. I've actually run whole servers out of file descriptors before adding the connection reopening. Maybe the code can check whether the server supports replay (there's an attribute for that) and whether it's over ssh (check the URL scheme?), and only re-init the connection if it's an older server over ssh. Also, what does this do to our memory use? It doesn't explode higher? Blowing away the pool *should* be controlling at least some of the leaks...
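
    One possible reading of the policy sketched above, as a standalone helper. The function name and the exact rule are assumptions for illustration; the real decision would live in init_ra_and_client() and query the server's replay capability.

    ```python
    from urllib.parse import urlparse

    def can_reuse_session(url, server_supports_replay):
        """Hypothetical policy: keep one persistent RA session only when
        the server supports replay (so the fd-leaking diff fallback is
        never used) and the transport is ssh (where reconnecting per
        revision is expensive). Otherwise, reopen per revision as before.
        """
        over_ssh = urlparse(url).scheme.endswith('ssh')  # e.g. 'svn+ssh'
        return server_supports_replay and over_ssh
    ```

    Under this rule, an older server (no replay) keeps the per-revision reopen that protects its file descriptors, while a modern server over ssh avoids the once-a-second reconnects from the original report.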
  3. Anonymous

    A simpler workaround I've been using is the -r switch on clone, to get as far into the conversion as possible, and then pulling after that.

    One clone plus one pull worked for the work repo that was having this issue; very large repos might take a few incremental pulls, I guess.
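
    The workaround above might look like this on the command line. The URL and revision numbers are illustrative only; `hg clone -r` and `hg pull -r` are standard Mercurial options that hgsubversion honors for bounding the converted revision range.

    ```shell
    # Convert only up to revision 1000 first, then pull the rest in chunks.
    hg clone -r 1000 svn+ssh://host/repo repo-hg
    cd repo-hg
    hg pull -r 2000
    hg pull        # repeat (or pull without -r) until fully caught up
    ```

    If a pull dies with a connection error partway through, re-running it resumes from the last converted revision, so nothing already fetched is lost.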
