Worksheet.merge_cells don't care of already merged cells
The Worksheet.merge_cells
store wrong cells ranges when we extend alreadymerged cells.
For Instance, I have the following worksheet:
ws = Worksheet(Workbook()) ws.merge_cells(start_row=1, start_column=1, end_row=4, end_column=4) assert ws._merged_cells == ["A1:D4"]
If I want to extend the merged cells to "A1:E4" (one more column), I do:
ws.merge_cells(start_row=1, start_column=1, end_row=4, end_column=5) print(ws._merged_cells)
As a result, I have:
['A1:D4', 'A1:E4']
Which generates a corrupted file (according to Excel 2010).
I think, the Worksheet.merge_cells
should compare the coordinates of alreadymerged cells with the ones of the parameters.
We define:
m_coord
= coordinates of alreadymerged cells,p_coord
= coordinates given by the parameters
Here is the tests we can do on each coordinates of alreadymerged cells:
 If
m_coord
andp_coord
are disjoint: ignore > pass  If
m_coord
inp_coord
: extend m_coord  else: raise an exception
Comments (22)


reporter Here is a possible implementation:
def merge_cells(self, range_string=None, start_row=None, start_column=None, end_row=None, end_column=None): """ Set merge on a cell range. Range is a cell range (e.g. A1:E1) """ if not range_string and not all((start_row, start_column, end_row, end_column)): msg = "You have to provide a value either for 'coordinate' or for " \ "'start_row', 'start_column', 'end_row' *and* 'end_column'" raise ValueError(msg) elif not range_string: range_string = '{0}{1}:{2}{3}'.format(get_column_letter(start_column), start_row, get_column_letter(end_column), end_row) elif ":" not in range_string: if COORD_RE.match(range_string): return # Single cell, do nothing raise ValueError("Range must be a cell range (e.g. A1:E1)") else: range_string = range_string.replace('$', '') min_col, min_row, max_col, max_row = range_boundaries(range_string) for merge_range in self._merged_cells[:]: merge_min_col, merge_min_row, merge_max_col, merge_max_row = range_boundaries(merge_range) issubset = ((min_row <= merge_min_row <= merge_max_row <= max_row) and (min_col <= merge_min_col <= merge_max_col <= max_col)) if issubset: # extend the existing range self._merged_cells.remove(merge_range) self._merged_cells.append(range_string) break issuperset = ((merge_min_row <= min_row <= max_row <= merge_max_row) and (merge_min_col <= min_col <= max_col <= merge_max_col)) if issuperset: # ignore the range (already extended) break intercept = (((merge_min_row <= min_row <= merge_max_row) or (min_row <= merge_max_row <= max_row)) and ((merge_min_col <= min_col <= merge_max_col) or (min_col <= merge_max_col <= max_col))) if intercept: fmt = "The range {0} intercept the merged cells: {1}" raise ValueError(fmt.format(range_string, merge_range)) else: # merge cells self._merged_cells.append(range_string) rows = range(min_row, max_row + 1) cols = range(min_col, max_col + 1) cells = product(rows, cols) # all but the topleft cell are removed for c in islice(cells, 1, None): if c in self._cells: del self._cells[c]

reporter I agree with you:
It's worth noting that the generated file is perfectly valid according to the specification.
But, in production, I need that my customers could open the generated file with Excel.

reporter About your comment:
A proper implementation would involve some kind of geometric structure that could more easily check for overlaps.
I have implemented a
SheetRange
with do all the geometric computation and more. I can add this as an attachment. 
reporter  attached sheet_range.py
Represents a range in a sheet: title and coordinates.
This object is used to perform calculations on ranges, like:
 shifting to a direction,
 union/intersection with another sheet range,
 collapsing to a 1 x 1 range,
 expanding to a given size.
We can check whether a range is:
 equal or not equal to another,
 disjoint of another,
 contained in another.
We can get:
 the string representation,
 the title,
 the coordinates,
 the size of a range.

reporter Note: we have the same kind of problem with
Worksheet.unmerge_cells
: it should unmerge any range which coordinates are included in the coordinates given by parameters. We find this behavior in LibreOffice Calc. 
The proposed solution is far too complex to be practicable. And, again, you compare the behaviour with an a spreadsheet application.

reporter Sorry for my lack of understanding, but which solution do you consider "far too complex to be practicable". Is it my fix of
merge_cells
(which I find clear and efficient) or a new implementation usingSheetRange
(which is more complex but more accurate for many circonstances)? 
The proposed patch to
merge_cells
: too many codepaths.I haven't looked at the
SheetRange
but it might be the way to go for a new implementation of merged cells which could reliably provide the necessary functionality for what you want, and what other people seem to want to be able to do (usually formatting). In any case, in the event of overlap I'd always raise an exception and pass the responsibility to client code to decide what to do. 
I've had time to look at the SheetRange code and it's quite nice. I think it should be refactored to be simpler (there's quite a bit of repetitive code in there and a lot to test). It could possibly be extended to represent a range of merged cells so that formatting can be done with it (fill, edges).
For merged cells themselves I'd still like a dedicated object whose value is either
None
or the value of the topleft cell of the range. In either case it's not settable except, of course, for the topleft cell.We then need a guard when creating cells so they can't be created in the range.
Code like this would go in the next version.

@Tantale I'm looking to add this in 2.5 but I don't understand all the code. What is the purpose of the
min_col_p
, etc? Is this just to preserve relative/absolute coordinates? 
reporter The input of
merge_cells()
is a string (therange_string
parameter). I convert this string to integer coordinates in order to have the coordinates of a rectangle. Then, I search all merged cells. For each merged cell, I also calculate the coordinates of its rectangle. With that in hand, I am able to see if the two rectangles intersect each other or if the first is a subset or a superset of the other. It's only 2D math.I prefer using integer coordinates than letter + number. It is more efficient to calculate rectangles intersections…

The conversion from Excel to numerical coordinates already happens, but it's the extra info you keep which I was asking about which have no function in the class except to preserve
$
in the range string for roundtripping. This makes the standard class constructor more verbose and is not required forMergedCells
I think a more generic
CellRange
object, with the methods you've written is more useful than just for handlingMergedCells
. Actually, it allows us to avoid the problems associated with it almost totally.I've made an initial commit to the 2.5 branch.

Issue
#793was marked as a duplicate of this issue. 
Issue
#799was marked as a duplicate of this issue. 
Issue
#854was marked as a duplicate of this issue. 
 edited description
 changed status to resolved
Resolved in 2.5b1

Hi, @charlie_x
I check version 2.5b1, and styles are still broken.
For example, I just load workbook and save.
wb = load_workbook('original.xlsx') wb.save('saved.xlsx')

@rovkinmax nothing has changed for merged cells but we do now have a cell range object. At some point in the future we can have a special MergedCellRange and MergedCells for handling the formatting.

@charlie_x but why simple saving breaks styles?

Nothing breaks but ambiguity is removed. See related issues for discussions about this.

 removed version
Removing version: 2.4.x (automated comment)
 Log in to comment
I'm not sure if we can really do this without simply raising an exception. Extending a merge might not be what the user wants so raising an exception would allow them to decide form themselves However, I don't like the idea of the potentially nested loops that would be require for any checks. A proper implementation would involve some kind of geometric structure that could more easily check for overlaps.
It's worth noting that the generated file is perfectly valid according to the specification.