Can't determine dimensions when <dimension> element and <row spans=...> attribute are both missing

Issue #269 resolved
Anonymous created an issue

If neither <dimension ref="A1:M324"/> nor <row r="1" spans="1:13"> is available in the sheet.xml file the dimensions cannot be determined.

A patch of the read_dimension() function in reader/worksheet.py is attached (based on 1.8.3)

$ diff worksheet.py-1.8.3 worksheet.py 36c36 < from openpyxl.cell import Cell, coordinate_from_string


from openpyxl.cell import Cell, coordinate_from_string, column_index_from_string, coordinate_from_string 78c78 < if ":" in span:


        if span and ":" in span:

85a86,94

    if el.tag == '{%s}c' % SHEET_MAIN_NS:
        col = column_index_from_string(coordinate_from_string(el.get("r"))[0])
        if min_col is None:
            min_col = max_col = col
        else:
            min_col = min(min_col, col)
            max_col = max(max_col, col)

87c96 < warn("Unsized worksheet")


#warn("Unsized worksheet")

Comments (10)

  1. Max Böhm

    Please see the attached sample input file "Request.xlsx" of Issue #270. This is also a sample for this issue. The file has been generated by a tool. Its first worksheet (xl/worksheets/sheet.xml) contains:

    <x:worksheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
        <x:sheetData>
            <x:row r="1">
                <x:c r="A1" s="1" t="str">
                    <x:v>Request Aries Request #</x:v>
                </x:c>
                <x:c r="B1" s="1" t="str">
                    <x:v>Request Aries Requester/Contact</x:v>
                </x:c>
    

    while in Excel it is:

    <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
        <dimension ref="A1:N2"/>
        <sheetViews>
            <sheetView tabSelected="1" workbookViewId="0"/>
        </sheetViews>
        <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
        <sheetData>
            <row r="1" spans="1:14" x14ac:dyDescent="0.25">
                <c r="A1" s="1" t="s">
                    <v>0</v>
                </c>
                <c r="B1" s="1" t="s">
                    <v>1</v>
                </c>
    

    Note that the first example neither has a <dimensions> element nor a spans attribute in the rows. This causes the read_dimension() function openpyxl to fail.

  2. Max Böhm

    In the initial description of Issue #269 the formatting of the second part with the suggested patch got messed up by accident. Therefore I have put it here again...

    $ diff worksheet.py-1.8.3 worksheet.py
    36c36
    < from openpyxl.cell import Cell, coordinate_from_string
    ---
    > from openpyxl.cell import Cell, coordinate_from_string, column_index_from_string, coordinate_from_string
    78c78
    <             if ":" in span:
    ---
    >             if span and ":" in span:
    85a86,94
    >
    >         if el.tag == '{%s}c' % SHEET_MAIN_NS:
    >             col = column_index_from_string(coordinate_from_string(el.get("r"))[0])
    >             if min_col is None:
    >                 min_col = max_col = col
    >             else:
    >                 min_col = min(min_col, col)
    >                 max_col = max(max_col, col)
    >
    87c96
    <     warn("Unsized worksheet")
    ---
    >     #warn("Unsized worksheet")
    
  3. Charlie Clark

    FWIW we will certainly not go with this kind of implementation because it would mean parsing the whole of the worksheet twice. We are currently working towards using a common backend for standard and "iterable" worksheets. The warning for unsized worksheets will stay and you will only get their dimensions if you explicitly loop through all the cells. It's bad enough going through every row. If we don't do this we'll punish every use on unsized worksheets.

  4. Log in to comment