Issue #524 resolved

[intersphinx] corrupt objects.inv due to zlib issue on windows

Anonymous created an issue

I am getting the following error when trying to use a generated objects.inv with intersphinx:

build succeeded, 1 warning.

WARNING: intersphinx inventory 'D:\\mrec3/sdspec/builtsrc/inmsx/html/objects.inv' not readable due to error: Error -3 while decompressing: invalid distance too far back

This error appears to be coming from zlib.

I would like to attach the offending file, and/or the project which generates it, but it is documentation for our internal proprietary C API for one of our core technologies at work.

I cannot find an option to save the inventory in cleartext, which would be an acceptable workaround for me.

Comments (10)

  1. Anonymous

    After doing some more debugging, the issue appears to be with the chunk size being used. The inventory file I have is larger than the 16K (16*1024) chunk size used for reading in the compressed data. The decompressor should be managing its internal buffer properly, so the chunk size should not be an issue, but it is (python2.6). If I manually read the entire file into memory (after the text header), I can decompress everything in one go. But if it is chunked, it crashes.
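    For reference, a minimal sketch of why chunking by itself should be harmless: a zlib decompression object fed intact bytes in 16K pieces yields the same result as a one-shot decompress. The helper name `decompress_in_chunks` and the random payload are assumptions made for illustration (written for Python 3), not code from Sphinx.

```python
import random
import zlib

CHUNK = 16 * 1024  # the chunk size mentioned in the comment above


def decompress_in_chunks(stream, chunk_size=CHUNK):
    """Decompress a zlib byte string by feeding it in fixed-size chunks."""
    decomp = zlib.decompressobj()
    parts = []
    for start in range(0, len(stream), chunk_size):
        parts.append(decomp.decompress(stream[start:start + chunk_size]))
    parts.append(decomp.flush())  # emit anything still buffered
    return b"".join(parts)


# Hypothetical payload: pseudo-random bytes barely compress, so the
# compressed stream is guaranteed to span several 16K chunks.
rng = random.Random(0)
payload = bytes(rng.randrange(256) for _ in range(100000))
stream = zlib.compress(payload)

chunked = decompress_in_chunks(stream)
one_shot = zlib.decompress(stream)
```

    If the chunked and one-shot results differ, or the chunked path raises, the bytes handed to the decompressor were probably already damaged before zlib saw them.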

    I have implemented a temporary workaround by writing a version 2 to 1 inventory file converter, and converting all our objects.inv files back to version 1 as part of our build system. This is... well, it sucks, but it does work.

    I do not see any bugs about zlib having this issue on the python roundup tracker.

  2. Doug Napoleone

    Found the bug!!!

    The problem is the inventory file is being opened in text mode, not binary mode:

    def fetch_inventory(app, uri, inv):
        """Fetch, parse and return an intersphinx inventory file."""
        # both *uri* (base URI of the links to generate) and *inv* (actual
        # location of the inventory file) can be local or remote URIs
        localuri = uri.find('://') == -1
        join = localuri and path.join or posixpath.join
        if inv.find('://') != -1:
            f = urllib2.urlopen(inv)
        else:
            # local file: opened in text mode here, which is the bug
            f = open(path.join(app.srcdir, inv))

    That final 'open' should be "f = open(path.join(app.srcdir, inv), 'rb')". Because the file is being opened in text mode, and the binary data happens to contain an EOF marker, the read() later on truncates at that marker. (Windows in their infinite wisdom decided to have a character which can represent this so you can have data in a file after the EOF... yea windows!!!)
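    A rough simulation of the failure, runnable on any platform. It assumes the usual Windows text-mode behaviour (reading stops at the first Ctrl-Z byte, 0x1A, and CRLF is translated to LF); `simulate_windows_text_read` is a hypothetical helper, and compression level 0 is used only so the Ctrl-Z byte is guaranteed to show up in the stream. In a real inventory such bytes appear by chance in the compressed data.

```python
import zlib

CTRL_Z = b"\x1a"  # the byte Windows text mode treats as end-of-file


def simulate_windows_text_read(raw):
    """Approximate what a Windows text-mode read does to binary data.

    This is a simulation for illustration, not the real C runtime.
    """
    head = raw.split(CTRL_Z, 1)[0]        # stop at the first Ctrl-Z
    return head.replace(b"\r\n", b"\n")   # CRLF -> LF translation


# Level 0 stores the payload uncompressed, so the Ctrl-Z below is
# present verbatim in the zlib stream.
payload = b"name domain:role 1 uri -\n" + CTRL_Z + b"more entries\n"
stream = zlib.compress(payload, 0)

damaged = simulate_windows_text_read(stream)
try:
    zlib.decompress(damaged)
    outcome = "ok"
except zlib.error:
    outcome = "zlib.error"

intact = zlib.decompress(stream)  # the untouched bytes decompress fine
```

    The exact zlib message depends on how the bytes were mangled: truncation tends to give an incomplete-stream error, while CRLF translation can corrupt back-references inside the stream.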

    The catch is that the header IS text and will have a different EOL marker on windows than on linux/mac. If you read in binary mode, you need to deal with the potential for '\r' characters. I think this will also be a problem for URL-based inventory files.
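    One way to handle this, sketched under the assumption that a version 2 inventory starts with four text header lines followed by zlib data: read the header with readline() on a binary file object and strip both '\n' and '\r\n' before decoding. `read_inventory_v2` and `make_sample` are hypothetical names used only for this illustration (Python 3), not Sphinx functions.

```python
import io
import zlib


def read_inventory_v2(f):
    """Split a version 2 inventory into header lines and decompressed body.

    *f* must be a file object opened in binary mode.
    """
    # Assumed layout: four text header lines, then the zlib stream.
    # rstrip(b"\r\n") removes either line ending.
    header = [f.readline().rstrip(b"\r\n").decode("utf-8") for _ in range(4)]
    body = zlib.decompress(f.read())
    return header, body


def make_sample(eol):
    """Build a tiny in-memory inventory using the given line ending."""
    head = eol.join([
        b"# Sphinx inventory version 2",
        b"# Project: demo",
        b"# Version: 1.0",
        b"# The remainder of this file is compressed using zlib.",
    ]) + eol
    return head + zlib.compress(b"func c:function 1 api.html#$ -\n")


unix_header, unix_body = read_inventory_v2(io.BytesIO(make_sample(b"\n")))
dos_header, dos_body = read_inventory_v2(io.BytesIO(make_sample(b"\r\n")))
```

    Both samples should parse to the same header and body, so a reader written this way does not care which platform wrote the header.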

    Note how the inventory file is written in the builders/

        def dump_inventory(self):
            self.info(bold('dumping object inventory... '), nonl=True)
            f = open(path.join(self.outdir, INVENTORY_FILENAME), 'wb')

  3. Doug Napoleone

    After looking things over, just adding the 'rb' should do the trick, and the urlopened files should be fine. I can test the urlopen issue, but I might not get to it today. I will try. Thanks for the fast response!

  4. Doug Napoleone

    Sorry it took so long to do this simple test. I tested the objects.inv which was causing me problems on windows, serving it from apache servers running on both windows and REL, and fetching it from remote machines running win64, OSX, and REL. There were no problems decompressing the data, so urlopen behaves like a binary-mode read.
