1. openpyxl
  2. openpyxl
  3. openpyxl
  4. Issues

Issues

Issue #171 resolved

copy worksheet function

Jared Thompson
created an issue

I thought it might be useful to have a copy worksheet function. This functionality would be similar to the functionality in excel where you can right-click on a worksheet and select copy, then select the worksheet you want to copy.

My current workaround is to create a workbook that is pre-populated with lots of worksheets (of the same type (as in this usecase scenario I am just populating similar sheets with different sales rep data) then I just remove the ones I don't need/are left over.

I am using this workaround as my attempt to copy the sheets in using deepcopy and openpyxl resulting in all the memory being consumed on my computer.

Here is the code I used(made generic to illustrate the memory consumption):

This is the original code I submitted to the mailing list:

import openpyxl
import shutil
import copy

distinct_stores = ['willy']



for each in range(len(distinct_stores)):
    workbook_iter_name = str(distinct_stores[each])+'.xlsx'
    shutil.copyfile('basic_trial.xlsx',workbook_iter_name)
    primary = openpyxl.load_workbook(workbook_iter_name)
    reps_per_store = ['one','two','three','four','five','six','seven','eight','nine','ten','eleven','twelve','thirteen']
    for eacho in range(len(reps_per_store)): 
        ws = primary.get_sheet_by_name('rep')
        primary.add_sheet(copy.deepcopy(ws),eacho+1)
        wss=primary.worksheets[eacho+1]
        wss.title=str(reps_per_store[eacho])
    primary.save(workbook_iter_name)

That would just consume all the ram.

Charlie Clark on the mailing list suggested to use enumerate and some other cleanups. After having cleaned up the code per his excellent suggestions I have this:

import openpyxl
import shutil
import copy

distinct_stores = ['willy']



for idx,store in enumerate(distinct_stores):
    workbook_iter_name = store+'.xlsx'
    #using a blank, single worksheet xlsx file for illustration purposes
    shutil.copyfile('blank.xlsx',workbook_iter_name)
    primary = openpyxl.load_workbook(workbook_iter_name)
    reps_per_store = ['one','two','three','four','five','six','seven','eight','nine','ten','eleven','twelve','thirteen']
    for ido,reps in enumerate(reps_per_store): 
        ws = primary.get_sheet_by_name('rep')
        primary.add_sheet(copy.deepcopy(ws),ido+1)
        wss=primary.worksheets[ido+1]
        wss.title = reps
    primary.save(workbook_iter_name)

This block of code no longer consumes all the ram, but it is very cpu intensive.

As a comparison when i pre-populate the workbook with 55 tabs (generally use 30-45), then I can generate 5 workbooks each with 30-45 tabs while populating them with their own rep data - in under 15 seconds.

So my workaround works but I thought it might be nice to have that copy ability in openpyxl instead of having to pre-populate the workbooks.

Thanks for the awesome openpyxl software, I hope this idea may be useful.

Comments (23)

  1. Charlie Clark

    At the moment I think deepcopy is the only way to go because of the number of properties associated with worksheets - charts, hyperlinks, etc. It might be possible to improve the interface for doing this and also control - explicitly decide what should be copied. The cells have to be copied which can take a while if there are a lot of them.

  2. Eric Amorde

    I know this is an old issue, but wanted to voice my interest in this feature. Common use case is to use the first worksheet as a template to generate several worksheets.

    I took the same approach as Jared and duplicated the worksheet many times, but that makes it difficult to make changes to your "template."

    It might be easier if you knew you wanted to do this when you call load_workbook, because you could theoretically re-read the same XML source into a different worksheet

  3. Charlie Clark

    Eric Amorde can you describe the use case in greater detail?

    I'm still not sure of the best way to approach this other than perhaps to wrap the solution up into a function. In Excel Workbooks are the reserve of all kinds of global variables like styles so there is a lot of nesting. This presumably causes a lot of work when copying the same sheet over and over again which might be avoidable within a workbook. Copying between workbooks would require some kind of handling of possibly conflicting resources such as style ids so that's probably out of scope. wb.copy(ws, title) might be possible. Copying could be delegated to suitable (private?) methods in Worksheets and Cells, Charts, Images, etc. with relevant safeguards and exceptions. An initial implementation might want to limit what can be copied.

    I don't see how anything could work within load_workbook because of the way relevant data is spread around the .xlsx archive.

  4. Eric Amorde

    In my case, I was taking a spreadsheet with employee schedules and generating a report for each employee, on their own sheet. So I needed to copy the first sheet for each person.

    I just tested this out and it worked, but it requires passing info into load_workbook

    # in _load_workbook of reader.excel
    worksheets = list(detect_worksheets(archive))
    for i in range(copy_first):
        ws_copy = deepcopy(worksheets[0])
        ws_copy['title'] += '_' + str(i)
        worksheets.append(ws_copy)
    for sheet in worksheets:
        sheet_name = sheet['title']
        worksheet_path = sheet['path']
        .... # rest of code in _load_workbook
    

    I don't think this would work for sheet with complicated relationships, but for copying the styling and cell values it works

  5. Charlie Clark

    The challenge with providing such a function is to manage expectations. The code you're currently looks manageable and from what Jared said performance is acceptable. There's a temptation to let sleeping dogs lie and make that the recommended approach. But for reporting purposes templates are understandable.

    Providing official methods as I suggested above might make sense if performance becomes an issue and or there are issues with sheets with images, charts, ranges, etc.

    Might be worth benchmarking what Excel does in a similar situation, especially with a large sheet with other objects in it. Looks like a good chance for a pull request! :-D

  6. Keats .

    Hi, the deepcopy technique doesn't work. Maybe because of python 3 ?

    File "/repos/elypsys/elypsys/app/admin/user_commission.py", line 14, in export_commissions
        ws_copy = deepcopy(wb.get_sheet_by_name(first_sheet))
      File "/repos/_ENVS/elypsys/lib/python3.4/copy.py", line 182, in deepcopy
        y = _reconstruct(x, rv, 1, memo)
      File "/repos/_ENVS/elypsys/lib/python3.4/copy.py", line 300, in _reconstruct
        state = deepcopy(state, memo)
      File "/repos/_ENVS/elypsys/lib/python3.4/copy.py", line 155, in deepcopy
        y = copier(x, memo)
      File "/repos/_ENVS/elypsys/lib/python3.4/copy.py", line 246, in _deepcopy_dict
        y[deepcopy(key, memo)] = deepcopy(value, memo)
      File "/repos/_ENVS/elypsys/lib/python3.4/copy.py", line 182, in deepcopy
        y = _reconstruct(x, rv, 1, memo)
      File "/repos/_ENVS/elypsys/lib/python3.4/copy.py", line 295, in _reconstruct
        y = callable(*args)
    TypeError: __init__() missing 2 required positional arguments: 'worksheet' and 'direction'
    

    Is there another way to use the first sheet as template ?

  7. Charlie Clark

    Just some more thinking about this: we might be able to do something about this directly with the archive. Styles could be a problem as the worksheet file contains hard-coded pointers to a workbook's style definition. The same goes for any other references such as comments or charts. This wouldn't be much of a problem for copying individual worksheets into new workbooks.

    In addition Chris Withers suggested we take a look at the filter functions in xlutils. While the file format is different, XLSX files have inherited much from the older binary format.

    The packaging machinery is coming together that would help manage this if anybody is interested

  8. Donald Freeman

    Charlie Clark this seems like an interesting feature. I will give this a try and submit a pull request using the pattern you suggested. My thoughts are create a utility function to shallow copy attributes on an object, and implant a copy function on each object that calls the utility function. Where nested objects occur then the copy function would call the utility function as well as call the copy function of the nested objects,. This would allow a lot of control over What is copied. What are your thoughts?

  9. Charlie Clark

    I'd have to see some code but it doesn't sound right. It's important to write something that separates the API from the internals because these will be changing.

    For small ranges it's probably fine to copy cells and their styles one-by-one but this would be dog slow for larger workbooks. This could be sped up by making sure all the relevant styles from the source workbook exist in the target workbook, creating a mapping of the pointers where relevant (ie. what wb1._fonts[4] refers to in wb2._fonts. This would allow cells to be copied with much fewer lookups.

  10. Charlie Clark

    Don't be so terse with the variable names.

    copy_worksheet() will need to append(cell.value for cell in row) And separately copy over the StyleArray.

    I don't think copy_util and copy_obj are really necessary: apart from cells pretty much everything in a worksheet can be copied using copy. You can probably test using an empty worksheet.

  11. Donald Freeman

    After appending the rows and copying the styles, Is your suggestion to use deep copy from the worksheet level or to delegate a shallow copy to each nested child object?

  12. Charlie Clark

    You can't use deepcopy on the worksheet because of the cells. But you should be able to use copy / deepcopy on the main attributes. Images and charts will be a problem, I think but the rest should be okay.

  13. Donald Freeman

    Charlie Clark quick question about this. I was working through it and noticed the worksheet images, charts, and drawings are private. Does it make sense to copy these and if so what would your suggestion be about an approach to do so. Currently I am implementing most of the logic on the workbook object, so my only option from there would be to access the _images, etc or to create a public method to access them. Also, does it make sense to copy these sense it appears that openpyxl doesn't read these from an existing file, so we would only be copying the images, charts, etc that were in memory??

    I may be a little off base here but wanted to get some feedback. All the other functionality is moving along, copying cells, comments, styles, etc. But I got a little hung up on the private images, drawings and so forth since there wasn't a public method to access them.

  14. Charlie Clark

    Don't bother about images and charts: they currently only exist in memory and, like cells only more so, are tightly coupled to the worksheet. It's likely that the implementation will change if/as read support is added.

    You need to be clear on the fact that openpyxl is primarily a library that works with the file format and not in any way an Excel-like application. This means that some things that people expect in the GUI will never be available, or only in a restricted manner.

  15. Log in to comment