Unable to save valid Excel workbook using Django admin

Issue #1077 closed
Alex
created an issue

I'm having an issue generating a valid workbook when executing a Django admin command. I've isolated the issue to be specifically when using a command; the issue does not seem to appear if I generate a workbook as a downloadable page in Django.

The issue occurs both when is_write_only is set to True or False, and different behavior occurs upon each. I'm not sure if this is a bug that has occurred in the past and I'm just not setting something properly, however I did try searching across django issues and didn't really find anything. Let me know if I should crosspost this to the Django issues as well.

The issue that occurs:

is_write_only=True Workbook is generated; when opening the workbook Excel generates an error message "We found a problem with some content in 'django_issue.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.". Clicking yes will recover all data that was contained, however any formulas used (not in the example) will be stripped out. Clicking no will not recover any data.

is_write_only=False Workbook is generated; when opening the workbook Excel generates an error message "We found a problem with some content in 'django_issue.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.". Clicking yes will recover all data that was contained, however any formulas used (not in the example) will be stripped out. In addition, an additional sheet titled 'Sheet' will appear as the first tab; I think that is expected behavior however as that sheet will only be not generated when using is_write_only=True. Clicking no will not recover any data.

I've copied the minimum code required below to replicate the issue, and if required I can paste the full command that I have if required.

In addition I've attached the workbooks that I generated with this code, but modified the names to give them context. If required, I can also upload a workbook generated from the context of Django when running the server rather than a command (although this workaround is really not efficient for the purposes that I am using it for).

import openpyxl
from django.core.management.base import BaseCommand, CommandError

class Command(BaseCommand):

    def add_arguments(self, parser):
        pass

    def handle(self, *args, **options):
        is_write_only = True # or False

        wb = openpyxl.Workbook(write_only=is_write_only)
        ws = wb.create_sheet('index')
        for i in range(200):
            ws.append([i])
        wb.save(os.path.join(os.getcwd(), 'django_issue.xlsx'))
        wb.close()

Comments (7)

  1. Alex reporter

    So I decided to go ahead and paste the code and upload the file for when running via the server. I'm not going to post all of the code required to generate the django app, but I will paste my urls.py file here that works correctly.

    def _test_export(request):
        from django.http import HttpResponse
        from openpyxl import Workbook
        from openpyxl.writer.excel import save_virtual_workbook
    
        is_write_only = True
        wb = Workbook(write_only=is_write_only)
        ws = wb.create_sheet('index')
        for i in range(200):
            ws.append([i])
        response = HttpResponse()
        response.content = save_virtual_workbook(wb)
        response['Content-Type'] = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
        response['Content-Disposition'] = 'attachment; filename=django_issue.xlsx'
        return response
    
    urlpatterns = [
        url(r'^export/', _test_export, name='export'),
    ]
    
  2. Alex reporter

    I'm also going to go ahead and do a diff on the the django_issue.server.xlsx and django_issue.write_only_true.xlsx. I'll update this post if I can glean any useful information from it.

    Edit: So, upon reviewing the diff, it looks like all but one file is the same. That'd be ./[Content_Types].xml. In django_issue.write_only_true.xlsx, an additional line is being added. <Default ContentType="text/xml" Extension="xml"/> I'm not sure why this is being added. I'm going to update my handle function in the django command to utilize save_virtual_workbook, then save the contents of that to a file and see if that fixes the issue.

    Edit 2 electric boogaloo: So the update didn't work. I'm going to see if instead I can create an in memory zip file, and manipulate the file directly before saving to strip out the conflicting line.

  3. Alex reporter

    Final Update:

    I have a workaround in place to be able to output a xlsx file without the offending line, however I'd still like to see if anyone wants to investigate this issue to fix it. I might take a stab at the internals at some point to see if I can figure it out, but if it works it works right? By the way, if anyone does ever figure this out, please ping me so I can remove my workaround. I haven't fleshed it out into a function yet, so you'll have to forgive my ugly code.

    import openpyxl
    from io import BytesIO
    from zipfile import ZipFile
    from openpyxl.writer.excel import save_virtual_workbook
    from django.core.management.base import BaseCommand, CommandError
    
    class Command(BaseCommand):
    
        def add_arguments(self, parser):
            pass
    
        def handle(self, *args, **options):
            is_write_only = True
    
            wb = openpyxl.Workbook(write_only=is_write_only)
            ws = wb.create_sheet('index')
            for i in range(200):
                ws.append([i])
            b = save_virtual_workbook(wb)
            B_in = BytesIO(b)
            B_out = BytesIO(b'')
            with ZipFile(B_in, mode='r') as zin:
                with ZipFile(B_out, mode='w') as zout:
                    zout.comment = zin.comment
                    for item in zin.infolist():
                        t = zin.read(item.filename).decode('utf-8')
                        if item.filename == '[Content_Types].xml':
                            t = t.replace('<Default ContentType="text/xml" Extension="xml"/>', '')
                        t = bytes(t, 'utf-8')
                        zout.writestr(item, t)
    
            B_out.seek(0)
            with open(os.path.join(os.getcwd(), 'django_issue.xlsx'), mode='wb') as f:
                f.write(B_out.read())
    
            B_in.close()
            B_out.close()
    
  4. CharlieC

    Thanks for the report and the files. The problem is probably related to the configuation of mime-types on the server which openpyxl relies on when creating the package where xml has been given the mime type of text/xml whereas openpyxl explicitly sets application/xml, which is correct as far as I know.

    See if you can fix this as I don't really see it is the responsibility for openpyxl to check and fix this. Otherwise you should be able to replicate the issue and find out where the line is being added by running the test suite on a system with the same configuation. packaging/manifest.py handles all the relevant code.

  5. Alex reporter

    Got it, I'll go ahead and post over in the Django repo to see if they have any ideas as to what could be causing mime types to be different when running it in server mode vs running admin commands. It's a strange issue, to say the least, but I'm glad I at least have a workaround to get it working properly. I'll go ahead and close out this ticket since there's no actual dev work needing to be done on it.

  6. Log in to comment