Adding or deleting rows or columns does not update cell ranges

Issue #1273 new
David Wheeler created an issue

When using any of the following worksheet methods, the coordinates for cell ranges that are impacted by the change do not get updated:

  • insert_columns()
  • insert_rows()
  • delete_columns()
  • delete_rows()
  • _move_cells()

So far I can confirm the following are not updated:

  • named ranges in wb.defined_names
  • merged cell ranges in ws.merged_ cells
  • Print_Area (ws._print_area)

I would suspect the following to also be affected, although less likely to cause an issue:

  • ws._print_rows
  • ws._print_cols

In addition, ColumnDimensions and RowDimensions are not following the columns and rows as the cells are moved.

There may be other things that would be affected that I have not identified yet. I have a prototype of a fix for the merged cell ranges but given the wider scope than that, more than one kind of fix will presumably be needed and it may be desirable to have an architectural fix that would work for any cell range including ones that have not been implemented yet.

Comments (7)

  1. CharlieC

    As you note the list could be pretty long and would effectively introduce application-like functionality to a library which is focussed primarily on the file format. There are currently no plans to do this. The best thing is probably to improve the documentation so that client code can manage the necessary changes.

  2. CharlieC

    When using any of the following worksheet methods, the coordinates for cell ranges that are impacted by the change do not get updated:

    • insert_columns()
    • insert_rows()
    • delete_columns()
    • delete_rows()
    • _move_cells()

    So far I can confirm the following are not updated:

    • named ranges in wb.defined_names
    • merged cell ranges in ws.merged_ cells
    • Print_Area (ws._print_area)

    I would suspect the following to also be affected, although less likely to cause an issue:

    • ws._print_rows
    • ws._print_cols

    In addition, ColumnDimensions and RowDimensions are not following the columns and rows as the cells are moved.

    There may be other things that would be affected that I have not identified yet. I have a prototype of a fix for the merged cell ranges but given the wider scope than that, more than one kind of fix will presumably be needed and it may be desirable to have an architectural fix that would work for any cell range including ones that have not been implemented yet.

  3. David Wheeler reporter

    Since finding this last week I have been looking at what it would take in the way of coding to deal with it and it is quite tedious - and therefore imho unreasonable to simply document the errant behaviour and ask users to deal with it on their own. This is not something that (for example) Excel can tidy up automatically when a workbook is opened in Excel. Unlike our philosophical discussion around styles associated with merged cell ranges, the methods mentioned above do real harm to the relationships between the data elements and the mis-alignments get written into the saved file.

    This …

    … becomes this …

    … if two rows and two columns are added. You can see that all of the following are offset from where they should now be after the insertions:

    • named range (have to look in the top left hand corner where Excel displays the name of the highlighted range)

    • page break

    • column dimensions

    • row dimensions

    • merged cell range

    • print range

    In fact the only things that seem to have moved are the cell values and styles.

    If the library is inserting (or deleting) the rows and columns, my view would be that the library should attend to keeping any other aspects of the worksheet aligned with the changes. We could debate about whether the print range should be adjusted (I think it should) but the user shouldn't have to worry about the others.

    As I said before I'm willing to have a shot at fixing it, within the constraints of the existing API. I have an architecture in mind which I'd be happy to put forward for review.

  4. David Wheeler reporter

    You can see the work in progress at https://bitbucket.org/WheeleD/openpyxl/branch/cell_anchors

    The basic architectural change is now done and all tests are passing. From here it is easier to make cells and cell ranges track changes on the row and column axes. The Row and Column objects can be extended to provide homes for things like column dimensions and row dimensions.

    I will pause for breath at this point. All of the new objects are in the one module “worksheet.cell_anchors”. If you think there is merit in the approach, then I can set up a pull request and look into the extensions that will fix the issues and the associated documentation.

    This is a more elegant and general solution (imho) than the PR (336) I did last week which was a patch that only dealt with merged cell ranges within the limitations of the existing row and column architecture.

  5. CharlieC

    But you’re introducing exactly the kind of application-like functionality that we don’t intend to provide. The real problem with this is, that users might reasonably expect it to track and manage all relevant changes in a worksheet, whereas the current approach clearly makes this the user’s responsbility.

    I've only skimmed the code but I think the changes to Cell would not be acceptable due to future changes that are plannned – Cell objects will be removed in a future release. And the tests are too ambitious.

    I think that you can probably write something using CellRange to check which areas are likely to be changed and use that as a basis.

  6. David Wheeler reporter

    I’m not sure where you’re drawing the line, then, between application-like functionality and the tools to be provided in a library. Seems to me adding rows and columns could also be considered application-like functionality - and it was the introduction of those methods that gave rise to the problems noted above.

    What do you mean by the tests being too ambitious? They were intended to (a) provide full coverage of the new code, and (b) illustrate how the new code behaves. Those objectives are based on my reading of the rules for submissions…

  7. CharlieC

    I basically draw the line at anything that has a dependency graph and involves multiple objects or evaluations. To do this properly requires a lot of code, and not to do it properly will inevitably frustrate users. Instead of extending openpyxl to do this, I think it’s better to look at how to handle individual cases such as merged cells, defined names and to provide the tools for client code to manage them as the user seems fit.

    The tests are ambitious in that they contain multiple asserts per test. This should only be the case in exceptions, such as when you want to verify a setup. The problem is that tests stop after the first failure so multiple asserts can mask multiple failures.

  8. Log in to comment