Worksheet.merge_cells don't care of already merged cells

Issue #708 resolved
Laurent LAPORTE created an issue

The Worksheet.merge_cells store wrong cells ranges when we extend already-merged cells.

For Instance, I have the following worksheet:

ws = Worksheet(Workbook())
ws.merge_cells(start_row=1, start_column=1, end_row=4, end_column=4)
assert ws._merged_cells == ["A1:D4"]

If I want to extend the merged cells to "A1:E4" (one more column), I do:

ws.merge_cells(start_row=1, start_column=1, end_row=4, end_column=5)
print(ws._merged_cells)

As a result, I have:

['A1:D4', 'A1:E4']

Which generates a corrupted file (according to Excel 2010).

I think, the Worksheet.merge_cells should compare the coordinates of already-merged cells with the ones of the parameters.

We define:

  • m_coord = coordinates of already-merged cells,
  • p_coord= coordinates given by the parameters

Here is the tests we can do on each coordinates of already-merged cells:

  • If m_coord and p_coord are disjoint: ignore -> pass
  • If m_coord in p_coord: extend m_coord
  • else: raise an exception

Comments (22)

  1. CharlieC

    I'm not sure if we can really do this without simply raising an exception. Extending a merge might not be what the user wants so raising an exception would allow them to decide form themselves However, I don't like the idea of the potentially nested loops that would be require for any checks. A proper implementation would involve some kind of geometric structure that could more easily check for overlaps.

    It's worth noting that the generated file is perfectly valid according to the specification.

  2. Laurent LAPORTE reporter

    Here is a possible implementation:

    def merge_cells(self, range_string=None, start_row=None, start_column=None, end_row=None, end_column=None):
        """ Set merge on a cell range.  Range is a cell range (e.g. A1:E1) """
        if not range_string and not all((start_row, start_column, end_row, end_column)):
            msg = "You have to provide a value either for 'coordinate' or for " \
                  "'start_row', 'start_column', 'end_row' *and* 'end_column'"
            raise ValueError(msg)
        elif not range_string:
            range_string = '{0}{1}:{2}{3}'.format(get_column_letter(start_column),
                                                  start_row,
                                                  get_column_letter(end_column),
                                                  end_row)
        elif ":" not in range_string:
            if COORD_RE.match(range_string):
                return  # Single cell, do nothing
            raise ValueError("Range must be a cell range (e.g. A1:E1)")
        else:
            range_string = range_string.replace('$', '')
    
        min_col, min_row, max_col, max_row = range_boundaries(range_string)
        for merge_range in self._merged_cells[:]:
            merge_min_col, merge_min_row, merge_max_col, merge_max_row = range_boundaries(merge_range)
            issubset = ((min_row <= merge_min_row <= merge_max_row <= max_row) and
                        (min_col <= merge_min_col <= merge_max_col <= max_col))
            if issubset:
                # extend the existing range
                self._merged_cells.remove(merge_range)
                self._merged_cells.append(range_string)
                break
            issuperset = ((merge_min_row <= min_row <= max_row <= merge_max_row) and
                          (merge_min_col <= min_col <= max_col <= merge_max_col))
            if issuperset:
                # ignore the range (already extended)
                break
            intercept = (((merge_min_row <= min_row <= merge_max_row) or
                          (min_row <= merge_max_row <= max_row)) and
                         ((merge_min_col <= min_col <= merge_max_col) or
                          (min_col <= merge_max_col <= max_col)))
            if intercept:
                fmt = "The range {0} intercept the merged cells: {1}"
                raise ValueError(fmt.format(range_string, merge_range))
        else:
            # merge cells
            self._merged_cells.append(range_string)
    
        rows = range(min_row, max_row + 1)
        cols = range(min_col, max_col + 1)
        cells = product(rows, cols)
        # all but the top-left cell are removed
        for c in islice(cells, 1, None):
            if c in self._cells:
                del self._cells[c]
    
  3. Laurent LAPORTE reporter

    I agree with you:

    It's worth noting that the generated file is perfectly valid according to the specification.

    But, in production, I need that my customers could open the generated file with Excel.

  4. Laurent LAPORTE reporter

    About your comment:

    A proper implementation would involve some kind of geometric structure that could more easily check for overlaps.

    I have implemented a SheetRange with do all the geometric computation and more. I can add this as an attachment.

  5. Laurent LAPORTE reporter

    Represents a range in a sheet: title and coordinates.

    This object is used to perform calculations on ranges, like:

    • shifting to a direction,
    • union/intersection with another sheet range,
    • collapsing to a 1 x 1 range,
    • expanding to a given size.

    We can check whether a range is:

    • equal or not equal to another,
    • disjoint of another,
    • contained in another.

    We can get:

    • the string representation,
    • the title,
    • the coordinates,
    • the size of a range.
  6. Laurent LAPORTE reporter

    Note: we have the same kind of problem with Worksheet.unmerge_cells: it should unmerge any range which coordinates are included in the coordinates given by parameters. We find this behavior in LibreOffice Calc.

  7. CharlieC

    The proposed solution is far too complex to be practicable. And, again, you compare the behaviour with an a spreadsheet application.

  8. Laurent LAPORTE reporter

    Sorry for my lack of understanding, but which solution do you consider "far too complex to be practicable". Is it my fix of merge_cells (which I find clear and efficient) or a new implementation using SheetRange (which is more complex but more accurate for many circonstances)?

  9. CharlieC

    The proposed patch to merge_cells: too many codepaths.

    I haven't looked at the SheetRange but it might be the way to go for a new implementation of merged cells which could reliably provide the necessary functionality for what you want, and what other people seem to want to be able to do (usually formatting). In any case, in the event of overlap I'd always raise an exception and pass the responsibility to client code to decide what to do.

  10. CharlieC

    I've had time to look at the SheetRange code and it's quite nice. I think it should be refactored to be simpler (there's quite a bit of repetitive code in there and a lot to test). It could possibly be extended to represent a range of merged cells so that formatting can be done with it (fill, edges).

    For merged cells themselves I'd still like a dedicated object whose value is either None or the value of the top-left cell of the range. In either case it's not settable except, of course, for the top-left cell.

    We then need a guard when creating cells so they can't be created in the range.

    Code like this would go in the next version.

  11. CharlieC

    @Laurent LAPORTE I'm looking to add this in 2.5 but I don't understand all the code. What is the purpose of the min_col_p, etc? Is this just to preserve relative/absolute coordinates?

  12. Laurent LAPORTE reporter

    The input of merge_cells() is a string (the range_string parameter). I convert this string to integer coordinates in order to have the coordinates of a rectangle. Then, I search all merged cells. For each merged cell, I also calculate the coordinates of its rectangle. With that in hand, I am able to see if the two rectangles intersect each other or if the first is a subset or a superset of the other. It's only 2D math.

    I prefer using integer coordinates than letter + number. It is more efficient to calculate rectangles intersections…

  13. CharlieC

    The conversion from Excel to numerical coordinates already happens, but it's the extra info you keep which I was asking about which have no function in the class except to preserve $ in the range string for roundtripping. This makes the standard class constructor more verbose and is not required for MergedCells

    I think a more generic CellRange object, with the methods you've written is more useful than just for handling MergedCells. Actually, it allows us to avoid the problems associated with it almost totally.

    I've made an initial commit to the 2.5 branch.

  14. CharlieC

    @Макс Ровкин nothing has changed for merged cells but we do now have a cell range object. At some point in the future we can have a special MergedCellRange and MergedCells for handling the formatting.

  15. Log in to comment