Pivot table changed to plain text after saving file

Issue #295 resolved
Bartev Vartanian
created an issue

I have an excel file (.xlsx) with 2 tabs, one with a pivot table, the other with data in a named range. I'd like to be able to programatically change the data in the data tab so that when the file is opened, the pivot table can be updated. However, when I save a file using openpyxl, the pivot table is destroyed.

Here is the code I'm running

#!/usr/bin/env python

import openpyxl as px
from openpyxl import load_workbook

# Open excel file
filename = 'wbk1.xlsx'
wb = load_workbook(filename=filename)
wb.save(filename)

Comments (35)

  1. Bartev Vartanian reporter

    So if the spreadsheet has a pivot table, it's supposed to be converted to regular text when saving the file? I'm not trying to create one, just updating the data it's based on.

  2. CharlieC

    Well, actually we just ignore the definitions and keep only the data in the individual worksheets themselves. If you have a look at the Excel file (it's a zip file so simply rename it and open it) you'll see.

    There are two possible approaches:

    • implement support for Pivot Tables - from a brief look at the code this doesn't look like too much work but more than I expect to have time for this year.

    • treat pivot tables like Macros and at least preserve them. This might be doable for version 2 or even 1.9 if a pull request is submitted.

  3. Daniel Hallman

    I definitely need this feature. In fact, I'm looking at other solutions in order to preserve our existing Pivot Tables as its a show stopper. We just want to edit one sheet and preserve the others. Otherwise, openpyxl has been absolutely awesome and easy to use.

  4. Bartev Vartanian reporter

    My solution was to have a template spreadsheet which has a macro that creates the pivot tables. My python code creates a copy of the template spreadsheet and adds the data to a new worksheet, then saves the workbook. When you open the new workbook, you press a button that calls the macro that creates the pivot tables.

  5. Arun Marathe

    Hi Bartev,

    We are trying to do something similar. Could you tell whether your template spreadsheet with a macro had extension "xltm" or "xlsm" (or something else?)

    Thanks.

  6. Arun Marathe

    Thanks!

    Currently, I am just copying the template file "a.xlsm" to another file "b.xlsm", opening "b.xlsm" using load_workbook, and then saving the workbook using 'save'. The resulting file "b.xlsm" gets corrupted in that it cannot be opened in Excel 2010. It can be opened using libreoffice on Linux, however.

    Any ideas?

    Please note that eventually, I would like to write some real data to "b.xlsm", but because simply loading and saving makes it corrupt, I am not sure how to proceed. The file "a.xlsm" has a simple pivot table (defined using a macro) on one worksheet that is defined in terms of 3 rows of data on a second worksheet. Am I doing something wrong?

    Thanks.

  7. Arun Marathe

    Hi Charlie, Thanks for the tip. Now the file-corrupt problem is resolved. I used keep_vba=True in load_workbook as suggested.

    Now named-range creation is causing problems. My creation code looks like:

    nr = w_book.create_named_range('RAW_DATA', details_sheet, details_sheet['A3': 'F894'])

    Is that how to specify a named range?

    The exception is raised from workbook save. Message is: 'generator' object has no attribute 'upper'

    Thanks.

  8. CharlieC

    The syntax is wrong.

    So you have the scope wrong. details_sheet['A3':'F894'] is not a reference but returns a range of cells as a generator. In this case you probably need:

    nr = w_book.create_named_range('RAW_DATA', details_sheet, 'A3:F894')

    This is from the test suite.

    def test_add_named_range():
        wb = Workbook()
        new_sheet = wb.create_sheet()
        named_range = NamedRange('test_nr', [(new_sheet, 'A1')])
        wb.add_named_range(named_range)
        named_ranges_list = wb.get_named_ranges()
        assert named_range in named_ranges_list
    
  9. Arun Marathe

    Thanks; that works!

    Is there a way to determine the number of last row written? I am keeping track of that using a counter for now, but may consider using a built-in function if it exists.

  10. Alexander Witte

    Hey Charlie!

    Just want to confirm that as of now, saving an xlsx file that has pivot tables in it through openpyxl will result in an corrupted file? (that's what im experiencing anyway).

    Thanks!

    Alex

  11. CharlieC

    @alexsambacanada I wouldn't expect that to be related to pivot tables. It's probably something else. The sample file on the issue doesn't have any problems.

    People occasionally express an interest in support but it's going to be a lot of work and will need sponsoring.

  12. stefanzhelev

    Hi Charlie, support for pivot tables is a show stopper. Can you provide a budget for including it in the next openpyxl release (time and money)? Thanks for the excellent work so far!

  13. WillM

    I would be willing to pitch in on money as well for the Pivot Table addition. Let me know what I can do to help out <username>insights@gmail.com

  14. stefanzhelev

    Below is the code for a workaround that I am using. It works on Windows 7 with Python 3.5.2. It uses the win32 package in combination with VBA, to open a predefined xlsx template 'Dashboard_template.xlsx' with pivot tables (configured to refresh upon file update), insert the data to a sheet 'Sheet1' and save/close. Note that you might have to adapt the code for your needs.

    import win32com.client as win32
    import pandas
    from pandas import DataFrame
    
    
    # obtain data from csv
    bmt_customer_data = pd.read_csv('../data/example.csv')
    
    # start excel application
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    excel.Visible = True
    
    # open excel template file
    wb = excel.Workbooks.Open("path/Dashboard_template.xlsx")
    
    # import
    ws_bmt = wb.Worksheets("Sheet1")
    # set the import range to text formatting in order to address data type issues
    ws_bmt.Range(ws_bmt.Cells(1+1, 1), ws_bmt.Cells(1+len(bmt_data), len(bmt_data.columns))).NumberFormat = "@"
    ws_bmt.Range(ws_bmt.Cells(1+1, 1), ws_bmt.Cells(1+len(bmt_data), len(bmt_data.columns))).Value = bmt_customer_data.values.tolist()
    
    # allow automatic overwriting in excel
    excel.DisplayAlerts = False
    
    # save output
    if int(float(excel.Version)) >= 12:
        wb.SaveAs(customer_dashboard_file, win32.constants.xlOpenXMLWorkbook)
    else:
        wb.SaveAs(customer_dashboard_file)
    excel.DisplayAlerts = True
    
    # close excel template, discarding changes
    wb.Close(SaveChanges=False)
    
    # quit excel application
    excel.Application.Quit()
    

    In case however enough people decide to contribute for the development of the pivot table (pivot chart) support for openpyxl, I would like to contribute money-wise up to a certain budget.

    stefan.zhelev at gmail.com

  15. CharlieC

    @Marc Breton I'm not sure what you mean: if you have a file with a pivot table in it then you should be able to open and work on it without losing the pivot table. If you mean how to create a pivot table from scratch then, no.

  16. Marc Breton

    Hi,

    For now I'm trying to do something very basic. I have a file named "template.xlsx" where I have 2 tabs named "pivot" and "raw". I'm simply trying to load the template and save it to a different name to make sure the pivot table is still OK. For now my template contains some raw data but eventually it will only contain the header and I will fill it with the rest of my python script.

    wb = load_workbook('template.xlsx')
    wb.save('result.xlsx')
    

    The issue I have is that when I open "result.xlsx". It seems to open correctly but when I click few times within the pivot table, Excel (2106) stops working and does not give me any error message. If I open the template, and click few times as I do for "result.xlsx" there is no issue.

    But I just tried saving the file using the same name (template.xlsx) and there is no issue when it's using the same name. Worst case I will use "shutil" to create a copy that I will then modify with openpyxl.

    I thought I had to do something with pivot module, that's why I asked my question. If you have a better idea or if I'm doing something wrong, please let me know.

    Thanks, Marc

  17. CharlieC

    This question should really be directed to the mailing list, or a separate bug report. It doesn't sound like there is anything wrong with the file that openpyxl has created but without the file there's not a lot I can say.

  18. Sassafras_wot

    Very appreciative of this library. just testing it out to help automate a friend's daily spreadsheet cutting and pasting. problem is pivot tables. i have an xlsx with pivots, i simply open the file, and save it again to a test file and the pivots are gone. :(

    edit: i saved my output_template as an xlsm, then i edited openpyxl\workbook.py::def mime_type(self) , commented it all out and returned XLSM hoping that would allow excel to open the file (which it did_ but the pivots were still gone.

    versions of stuff:

    09/11/2017 23:31:37 INFO:['python', '3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]']

    09/11/2017 23:31:37 INFO:['xlrd', '1.1.0']

    09/11/2017 23:31:37 INFO:['openpyxl', '2.4.8']

  19. Raul_Xaxis

    Currently I'm using latest openpyxl 2.5 and python 3.6. Still, when I use openpyxl to save a dataframe to an existing worksheet to an existing excel file, I lose the pivot table on the other worksheet. Is there a work around on this?

    import openpyxl

    source = 'source.xlsx' wb = openpyxl.load_workbook(source)
    writer = pd.ExcelWriter(source, engine='openpyxl') writer.book = wb writer.sheets = dict((ws.title, ws) for ws in wb.worksheets) df.to_excel(writer, 'Data Sheet', startrow=4, startcol=0, header=False, index=False) wb.save('Results.xlsx')

    Thanks.

  20. Adam Kulas

    I am having the same issue as Raul.

    I have an existing excel document with two sheets. The first is a sheet called 'pivot' containing only a pivot table, the second is 'raw' containing raw data from a dataframe. When the data is written to the 'raw' sheet the pivot table is converted to plain text

    openpyxl version 2.5.0

    Source:

    from openpyxl import load_workbook
    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)), columns=['a', 'b', 'c', 'd', 'e'])
    
    book = load_workbook('example.xlsx')
    writer = pd.ExcelWriter('example.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    df.to_excel(writer,'raw', index=False)
    writer.save()
    

    I have attached the excel document 'example.xlsx'. When the above code is run, the pivot sheet is converted to plain text.

  21. CharlieC

    Pandas knows nothing about Pivot Tables so will never write them and in the supplied examples this means overwriting the original files. Use the utilities provided by openpyxl or lose your Pivot Tables.

  22. Log in to comment