pep381run refetches too much
I'm a fairly new user of pep381client, running an in-house mirror, so maybe I missed something.
I've noticed that the client seems to re-fetch all versions of a given distribution each time a new version is released. Of course this is quite undesirable: downloading every version of a project with a long history can take hours, and it puts a lot of load on PyPI.
I don't know whether this is systematic, but it seems to happen because PyPI currently does not provide ETags, while the caching logic in maybe_copy_file() apparently relies on them.
(extract from maybe_copy_file())
```python
etag = self.storage.etag(path)
if etag:
    h.putheader("If-none-match", etag)
h.endheaders()
r = h.getresponse()
if r.status == 304:
    # not modified, discard data
    r.read()
    return
```
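To make the extract's behaviour concrete, here is a minimal, self-contained sketch of the same decision logic, separated from the HTTP connection. The function names are hypothetical (not part of pep381client), and the simulated server is an assumed stand-in for PyPI: it can only answer 304 if it sends ETags and the client echoes one back.

```python
from typing import Dict, Optional


def conditional_headers(stored_etag: Optional[str]) -> Dict[str, str]:
    """Headers for the GET: If-None-Match is sent only if an ETag was stored."""
    headers: Dict[str, str] = {}
    if stored_etag:
        headers["If-None-Match"] = stored_etag
    return headers


def simulated_server_status(request_headers: Dict[str, str],
                            server_etag: Optional[str]) -> int:
    """Hypothetical server: 304 only if it has an ETag and the client sent it back."""
    if server_etag is not None and request_headers.get("If-None-Match") == server_etag:
        return 304
    return 200  # full body is sent, so the client downloads the file again


# PyPI today: no ETag in responses, so none is ever stored locally
print(simulated_server_status(conditional_headers(None), server_etag=None))
# A server that does send ETags, with a previously stored value
print(simulated_server_status(conditional_headers('"v1"'), server_etag='"v1"'))
```

Because PyPI's responses carry no ETag, nothing is ever stored locally, no If-None-Match header is sent, and every request ends in a full 200 download.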
Here's a curl session showing that no ETag header is returned:
```
$ curl -vO http://pypi.python.org/packages/source/G/GeoBases/GeoBases-4.23.0.zip
* About to connect() to pypi.python.org port 80 (#0)
*   Trying 22.214.171.124...
* Connected to pypi.python.org (126.96.36.199) port 80 (#0)
> GET /packages/source/G/GeoBases/GeoBases-4.23.0.zip HTTP/1.1
> User-Agent: curl/7.26.0
> Host: pypi.python.org
> Accept: */*
>
* additional stuff not fine transfer.c:1037: 0 0
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
< Server: nginx/1.1.19
< Date: Mon, 22 Apr 2013 15:43:47 GMT
< Content-Type: application/zip
< Content-Length: 10396883
< Last-Modified: Fri, 08 Feb 2013 16:33:37 GMT
< Accept-Ranges: bytes
<
```
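The response above does carry a Last-Modified header, so one possible workaround (not something pep381client is known to do) would be to fall back on If-Modified-Since when no ETag is stored. This is a minimal sketch under two assumptions I haven't verified: that the mirror sets each local file's mtime to the server's Last-Modified value after downloading it (e.g. with os.utime), and that PyPI honours If-Modified-Since. The helper names are hypothetical.

```python
import email.utils
from typing import Dict, Optional


def last_modified_timestamp(header_value: str) -> float:
    """Parse an HTTP date such as a Last-Modified header into a POSIX timestamp."""
    return email.utils.parsedate_to_datetime(header_value).timestamp()


def if_modified_since_headers(local_mtime: Optional[float]) -> Dict[str, str]:
    """Build a fallback conditional header from the local file's mtime.

    Hypothetical helper: assumes local_mtime was set from the server's
    Last-Modified value when the file was first downloaded.
    """
    if local_mtime is None:
        return {}  # no local copy yet: plain, unconditional GET
    # usegmt=True produces the "Fri, 08 Feb 2013 16:33:37 GMT" form HTTP uses
    return {"If-Modified-Since": email.utils.formatdate(local_mtime, usegmt=True)}


# Example using the Last-Modified value from the curl session above
mtime = last_modified_timestamp("Fri, 08 Feb 2013 16:33:37 GMT")
headers = if_modified_since_headers(mtime)
print(headers)
```

If the server honours the header, an unchanged file would then come back as 304 and the existing `r.status == 304` branch would skip the download.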
Also, there's a potential bug if an ETag is ever provided: the etag is read from the local "files" DB before the leading '/' is stripped from the path, but written after it is stripped, so the lookup key and the stored key differ. I couldn't check whether this bug actually triggers, because my local "files" DB doesn't contain any ETag.
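A minimal sketch of that key mismatch, using a hypothetical dict-backed stand-in for the local "files" DB (DictStorage, set_etag and normalize are illustrative names, not pep381client's API). It suggests normalising the path once, before both the lookup and the write:

```python
from typing import Dict, Optional


class DictStorage:
    """Hypothetical in-memory stand-in for the local "files" DB (path -> ETag)."""

    def __init__(self) -> None:
        self._etags: Dict[str, str] = {}

    def etag(self, path: str) -> Optional[str]:
        return self._etags.get(path)

    def set_etag(self, path: str, etag: str) -> None:
        self._etags[path] = etag


def normalize(path: str) -> str:
    """Strip a single leading '/' so every read and write uses the same key."""
    return path[1:] if path.startswith("/") else path


store = DictStorage()
path = "/packages/source/G/GeoBases/GeoBases-4.23.0.zip"
# Hypothetical ETag value; PyPI currently sends none.
store.set_etag(normalize(path), '"abc123"')  # written after stripping, as described

print(store.etag(path))             # current order: lookup with the unstripped path misses
print(store.etag(normalize(path)))  # normalising first: the stored ETag is found
```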