Issue #108 resolved

Cannot clone Github HTTPS repositories

kankri
created an issue

hg-git has stopped working with Github:

>hg clone git+https://github.com/jayKayEss/Flapper
destination directory: Flapper
** unknown exception encountered, please report by visiting
** http://mercurial.selenic.com/wiki/BugTracker
** Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
** Mercurial Distributed SCM (version 2.8.1)
** Extensions loaded: hggit
Traceback (most recent call last):
  File "hg", line 42, in <module>
  File "mercurial\dispatch.pyo", line 28, in run
  File "mercurial\dispatch.pyo", line 69, in dispatch
  File "mercurial\dispatch.pyo", line 133, in _runcatch
  File "mercurial\dispatch.pyo", line 806, in _dispatch
  File "mercurial\dispatch.pyo", line 585, in runcommand
  File "mercurial\dispatch.pyo", line 897, in _runcommand
  File "mercurial\dispatch.pyo", line 868, in checkargs
  File "mercurial\dispatch.pyo", line 803, in <lambda>
  File "mercurial\util.pyo", line 512, in check
  File "mercurial\commands.pyo", line 1286, in clone
  File "mercurial\hg.pyo", line 372, in clone
  File "mercurial\localrepo.pyo", line 2431, in clone
  File "hg-git\hggit\hgrepo.py", line 14, in pull
    return self.githandler.fetch(remote.path, heads)
  File "hggit\git_handler.py", line 202, in fetch
    refs = self.fetch_pack(remote, heads)
  File "hggit\git_handler.py", line 1019, in fetch_pack
    ret = client.fetch_pack(path, determine_wants, graphwalker, f.write, progress.progress)
  File "dulwich\client.pyo", line 1001, in fetch_pack
  File "dulwich\client.pyo", line 924, in _discover_references
  File "dulwich\client.pyo", line 902, in _http_request
dulwich.errors.NotGitRepository

Unfortunately, it is not easy to log the requests that hg-git (or dulwich) makes. However, it is very easy to see what Git is doing:

>set GIT_CURL_VERBOSE=1
>git clone https://github.com/jayKayEss/Flapper
...
> GET /jayKayEss/Flapper/info/refs?service=git-upload-pack HTTP/1.1
User-Agent: git/1.7.8.msysgit.0
Host: github.com
Accept: */*
Pragma: no-cache

< HTTP/1.1 200 OK
< Server: GitHub Babel 2.0
< Content-Type: application/x-git-upload-pack-advertisement
< Transfer-Encoding: chunked
< Expires: Fri, 01 Jan 1980 00:00:00 GMT
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< Vary: Accept-Encoding
...

Trying to get the same URL with curl:

>curl --head "https://github.com/jayKayEss/Flapper/info/refs?service=git-upload-pack"
HTTP/1.1 404 Not Found
Server: GitHub.com
Date: Wed, 16 Apr 2014 20:50:21 GMT
Content-Type: text/html; charset=utf-8
Status: 404 Not Found
Cache-Control: no-cache
X-XSS-Protection: 1; mode=block
X-Frame-Options: deny
Content-Length: 226672
X-GitHub-Request-Id: 5870E8CF:2231:EA2F13:534EED0D
Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff

After some head scratching, I tried simulating the Git request more closely:

>curl --head -A "git/1.7.8.msysgit.0" "https://github.com/jayKayEss/Flapper/info/refs?service=git-upload-pack"
HTTP/1.1 200 OK
Server: GitHub Babel 2.0
Content-Type: application/x-git-upload-pack-advertisement
Transfer-Encoding: chunked
Expires: Fri, 01 Jan 1980 00:00:00 GMT
Pragma: no-cache
Cache-Control: no-cache, max-age=0, must-revalidate
Vary: Accept-Encoding

So the User-Agent header seems to make a difference!
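
The curl comparison above can also be scripted. This is only a minimal sketch, written for Python 3's `urllib.request` rather than the `urllib2` that hg-git and dulwich used at the time. The helper names `refs_request` and `probe` are made up for illustration, and whatever status codes come back depend on what the server does today:

```python
import urllib.error
import urllib.request


def refs_request(repo_url, user_agent):
    """Build a HEAD request for the smart-HTTP ref advertisement of repo_url."""
    url = repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="HEAD"
    )


def probe(repo_url, user_agent):
    """Send the request and return the HTTP status code (needs network access)."""
    try:
        with urllib.request.urlopen(refs_request(repo_url, user_agent)) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return err.code
```

Calling `probe("https://github.com/jayKayEss/Flapper", agent)` once per User-Agent string would then put the status codes side by side.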

Looking at the dulwich sources and simulating its request seems to confirm the theory:

>curl --head -A "dulwich/0.9.6" "https://github.com/jayKayEss/Flapper/info/refs?service=git-upload-pack"
HTTP/1.1 404 Not Found
Server: GitHub.com
Date: Wed, 16 Apr 2014 20:57:32 GMT
Content-Type: text/html; charset=utf-8
Status: 404 Not Found
Cache-Control: no-cache
X-XSS-Protection: 1; mode=block
X-Frame-Options: deny
Content-Length: 226672
X-GitHub-Request-Id: 5870E8CF:0AB7:10EE616:534EEEBB
Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff

This seems a bit brain-dead to me, but I guess this problem needs to be fixed in dulwich by setting a better User-Agent header...

However:

  1. It would be nice to have something like GIT_CURL_VERBOSE for hg-git (and dulwich) to debug issues like this.

    urllib2 supports logging requests with:

    import urllib2
    # debuglevel=1 prints each request and the response headers to stdout
    http_logger = urllib2.HTTPHandler(debuglevel=1)
    opener = urllib2.build_opener(http_logger)
    
  2. It would be nice to be able to control settings like User-Agent from a configuration file.
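
Both wishes could probably be covered by one small helper. The following is a rough sketch, written against Python 3's `urllib.request` rather than `urllib2`. `make_opener` and its parameters are hypothetical names, not existing hg-git or dulwich API, and the User-Agent value is hard-coded here where a real version would read it from a configuration file:

```python
import urllib.request


def make_opener(user_agent=None, debug=False):
    """Return an opener with optional wire logging and a custom User-Agent."""
    level = 1 if debug else 0
    opener = urllib.request.build_opener(
        # With debuglevel=1, http.client prints requests and response headers.
        urllib.request.HTTPHandler(debuglevel=level),
        urllib.request.HTTPSHandler(debuglevel=level),
    )
    if user_agent is not None:
        # Replaces the default "Python-urllib/x.y" header.
        opener.addheaders = [("User-agent", user_agent)]
    return opener


# Arbitrary example value; nothing is sent until opener.open() is called.
opener = make_opener(user_agent="hg-git/0", debug=True)
```

Passing both an `HTTPHandler` and an `HTTPSHandler` means the logging applies regardless of the URL scheme.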

Comments (10)

  1. kankri reporter

    Looking closer, it turns out dulwich already provides an API that makes it easy to override the User-Agent. The following change to hg-git made it work for me with Github:

    diff --git a/hggit/git_handler.py b/hggit/git_handler.py
    --- a/hggit/git_handler.py
    +++ b/hggit/git_handler.py
    @@ -1402,6 +1402,7 @@
                 else:
                     auth_handler = urllib2.HTTPBasicAuthHandler(AuthManager(self.ui))
                     opener = urllib2.build_opener(auth_handler)
    +                opener.addheaders = [('User-agent', 'git/0')]
                     try:
                         return client.HttpGitClient(uri, opener=opener, thin_packs=False), uri
                     except TypeError as e:
    
  2. kankri reporter

    To enable request/response logging for HTTPS you can do this:

    diff --git a/hggit/git_handler.py b/hggit/git_handler.py
    --- a/hggit/git_handler.py
    +++ b/hggit/git_handler.py
    @@ -1401,7 +1401,8 @@
                     raise RepoError('git via HTTP requires dulwich 0.8.1 or later')
                 else:
                     auth_handler = urllib2.HTTPBasicAuthHandler(AuthManager(self.ui))
    -                opener = urllib2.build_opener(auth_handler)
    +                http_logger = urllib2.HTTPSHandler(debuglevel=1)
    +                opener = urllib2.build_opener(auth_handler, http_logger)
                     opener.addheaders = [('User-agent', 'git/0')]
                     try:
                         return client.HttpGitClient(uri, opener=opener, thin_packs=False), uri
    
  3. Augie Fackler repo owner

    There's no real bug here: the foo.git URL form works as it always has on github.

    Claiming we're git in the user-agent string seems Bad and Wrong. I'm open to other proposals if anyone has one, but likely the solution is we need to hassle github to do something smarter that's not based on user-agents.

  4. Kent F

    I'm using the latest TortoiseHg and latest hggit version from the hggit repo, and I'm getting this very same error when cloning Git repositories here on Bitbucket.

    From the command line:

    C:\bucket> hg clone git+https://bitbucket.org/cmsc433_spring2012/project0.git project0
    ** Unknown exception encountered with possibly-broken third-party extension hggit
    ** which supports versions 2.8.1 of Mercurial.
    ** Please disable hggit and try your action again.
    ** If that fixes the bug please report it to https://bitbucket.org/durin42/hg-git/issues
    ** Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)]
    ** Mercurial Distributed SCM (version 2.9.2)
    ** Extensions loaded: strip, mq, convert, hgk, purge, hggit
    Traceback (most recent call last):
      File "hg", line 42, in <module>
      File "mercurial\dispatch.pyo", line 28, in run
      File "mercurial\dispatch.pyo", line 69, in dispatch
      File "mercurial\dispatch.pyo", line 138, in _runcatch
      File "mercurial\dispatch.pyo", line 810, in _dispatch
      File "mercurial\dispatch.pyo", line 590, in runcommand
      File "mercurial\dispatch.pyo", line 901, in _runcommand
      File "mercurial\dispatch.pyo", line 872, in checkargs
      File "mercurial\dispatch.pyo", line 807, in <lambda>
      File "mercurial\util.pyo", line 511, in check
      File "mercurial\commands.pyo", line 1314, in clone
      File "mercurial\hg.pyo", line 382, in clone
      File "mercurial\localrepo.pyo", line 2402, in clone
      File "E:\codes2\hggit\hggit\hgrepo.py", line 14, in pull
        return self.githandler.fetch(remote.path, heads)
      File "E:\codes2\hggit\hggit\git_handler.py", line 202, in fetch
        refs = self.fetch_pack(remote, heads)
      File "E:\codes2\hggit\hggit\git_handler.py", line 1019, in fetch_pack
        ret = client.fetch_pack(path, determine_wants, graphwalker, f.write, progress.progress)
      File "dulwich\client.pyo", line 1001, in fetch_pack
      File "dulwich\client.pyo", line 924, in _discover_references
      File "dulwich\client.pyo", line 902, in _http_request
    dulwich.errors.NotGitRepository
    

    Clones from Github work perfectly:

    C:\bucket> hg clone git+https://github.com/bulletphysics/bullet3.git bullet3
    importing git objects into hg
    updating to branch default
    1537 files updated, 0 files merged, 0 files removed, 0 files unresolved
    

    Is it just me, or do others have these problems?

  5. kankri reporter

    Augie Fackler wrote: "There's no real bug here: the foo.git URL form works as it always has on github."

    It took me a while to realize that my problem stemmed from using the Subversion URL:

    https://github.com/jayKayEss/Flapper
    

    instead of the Git URL:

    https://github.com/jayKayEss/Flapper.git
    

    I have been accustomed to just copying the clone URL from a project's Github page, and hadn't noticed that at some point Github started showing the Subversion flavor of the URL by default. Since the text box is so narrow, you don't really see the difference.

    To make things worse, it seems Github has some User-Agent based logic to respond with the Git protocol even at the Subversion URL if the client looks like git. Because git works, but hg doesn't, it's not easy to notice you are using the wrong URL. I know several people who had been using hg-git but started using plain git with Github based on the observation that hg-git "didn't work" any more.

    This is of course because of Github UI stupidity, but it's sad to see people stop using hg-git because of that. I would be OK with faking the User-Agent string in the same way browsers claim to be Mozilla, Gecko, KHTML and Safari at the same time, ugly as it is. I wonder if an alternative would be to try to append ".git" to a URL if the original one results in an error?
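
The ".git" fallback idea could look roughly like the sketch below. It is not how hg-git behaves. `candidate_urls` and `fetch_with_fallback` are hypothetical names, and `fetch` stands in for whatever call actually talks to the server:

```python
def candidate_urls(url):
    """Yield url as given, then with ".git" appended if it lacks the suffix."""
    yield url
    stripped = url.rstrip("/")
    if not stripped.endswith(".git"):
        yield stripped + ".git"


def fetch_with_fallback(url, fetch):
    """Call fetch(candidate) for each candidate URL and return the first success.

    fetch is any callable that raises an exception on failure; it stands in
    for the real clone/pull call, which is not shown here.
    """
    last_error = None
    for candidate in candidate_urls(url):
        try:
            return candidate, fetch(candidate)
        except Exception as err:  # a real version would catch a narrower type
            last_error = err
    raise last_error
```

The retry costs one extra failed request per clone of a suffix-less URL, and it only helps when the server accepts the ".git" form.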

  6. kankri reporter

    Kent F: I get the same error. Using my logging patch I see this:

    >hg clone git+https://bitbucket.org/cmsc433_spring2012/project0.git
    destination directory: project0
    send: 'GET /cmsc433_spring2012/project0.git/info/refs?service=git-upload-pack HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: bitbucket.org\r\nContent-Type: application/x-git-upload-pack-request\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
    reply: 'HTTP/1.1 404 NOT FOUND\r\n'
    header: Server: nginx/1.5.10
    header: Date: Wed, 21 May 2014 20:21:42 GMT
    header: Content-Type: text/html; charset=utf-8
    header: Content-Length: 28032
    header: Connection: close
    header: X-Served-By: app16
    header: X-Render-Time: 0.254558801651
    header: Content-Language: en
    header: X-Static-Version: dad0b32d9c85
    header: Vary: Accept-Language, Cookie
    header: X-Version: dad0b32d9c85
    header: ETag: "70050d240bced4e83db3d54e957b4129"
    header: X-Request-Count: 339
    header: X-Frame-Options: SAMEORIGIN
    ...
      File "dulwich\client.pyo", line 1014, in fetch_pack
      File "dulwich\client.pyo", line 937, in _discover_references
      File "dulwich\client.pyo", line 925, in _http_request
    dulwich.errors.NotGitRepository
    

    Adding the User-Agent patch the command succeeds:

    >hg clone git+https://bitbucket.org/cmsc433_spring2012/project0.git
    destination directory: project0
    send: 'GET /cmsc433_spring2012/project0.git/info/refs?service=git-upload-pack HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: bitbucket.org\r\nContent-Type: application/x-git-upload-pack-request\r\nConnection: close\r\nUser-Agent: git/0\r\n\r\n'
    reply: 'HTTP/1.1 200 OK\r\n'
    header: Server: nginx/1.5.10
    header: Date: Wed, 21 May 2014 20:30:11 GMT
    header: Content-Type: application/x-git-upload-pack-advertisement
    header: Transfer-Encoding: chunked
    header: Connection: close
    send: 'POST /cmsc433_spring2012/project0.git/git-upload-pack HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 116\r\nHost: bitbucket.org\r\nContent-Type: application/x-git-upload-pack-request\r\nConnection: close\r\nUser-Agent: git/0\r\n\r\n0067want ade124358d3aacde8a6dbea0aa57e0e4a0135ff3 multi_ack side-band-64k multi_ack_detailed ofs-delta\n00000009done\n'
    reply: 'HTTP/1.1 200 OK\r\n'
    header: Server: nginx/1.5.10
    header: Date: Wed, 21 May 2014 20:30:12 GMT
    header: Content-Type: application/x-git-upload-pack-result
    header: Transfer-Encoding: chunked
    header: Connection: close
    importing git objects into hg
    updating to branch default
    7 files updated, 0 files merged, 0 files removed, 0 files unresolved
    

    So here even the ".git" suffix doesn't help, although the URL should be correct.

    Update: Setting User-Agent: hg-git/0.6.0 makes cloning succeed with Bitbucket as well and to me feels even better than reporting Python-urllib/2.7.

  7. kankri reporter

    I made pull request #30 to change the User-Agent to hg-git/0.6.0. I think it's good practice to have a descriptive user agent string, and it doesn't hurt that it also helps with the problems listed in this issue.

  8. Augie Fackler repo owner

    I've mailed the list a link to that PR to attempt to foster discussion. It looks reasonable to me, but I feel like there's something surprising lurking in this area...
