1. openpyxl
  2. openpyxl
  3. openpyxl
  4. Issues

Issues

Issue #179 resolved

Creating a Chart on a separate tab prevents all worksheets from loading.

Albert Kottke
created an issue

If you run the attached python script, the following is printed: ['Sheet1', 'Chart1', 'Sheet2']

However, the actual number of sheets in the workbook is: ['Sheet1', 'Chart1', 'Sheet2', 'Sheet3']

This happens because of:

sheet_codename = 'sheet%d.xml' % (i + 1)

in reader/excel.py doesn't consider that some codenames are actually considered chartsheets (e.g., 'xl/chartsheets/sheet1.xml' versus 'xl/worksheets/sheet1.xml')

Comments (4)

  1. Albert Kottke reporter

    I have found a solution that uses the information in xl/_rels/workbook.xml.rels. I am not totally satisfied with the changes.

    # from workbook.py
    def read_sheets_titles(xml_source):
        """Read titles for all sheets."""
        root = fromstring(xml_source)
        titles_root = root.find(QName('http://schemas.openxmlformats.org/spreadsheetml/2006/main',
                'sheets').text)
    
        link_key = QName('http://schemas.openxmlformats.org/officeDocument/2006/relationships', 'id').text
    
        return [(sheet.get(link_key), sheet.get('name'))
                for sheet in titles_root.getchildren()]
    
    # from excel.py
    def _load_workbook(wb, archive, filename, use_iterators):
    
        valid_files = archive.namelist()
    
        # get workbook-level information
        try:
            wb.properties = read_properties_core(archive.read(ARC_CORE))
        except KeyError:
            wb.properties = DocumentProperties()
    
        try:
            string_table = read_string_table(archive.read(ARC_SHARED_STRINGS))
        except KeyError:
            string_table = {}
        wb.loaded_theme = archive.read(ARC_THEME)
        style_table = read_style_table(archive.read(ARC_STYLE))
    
        wb.properties.excel_base_date = read_excel_base_date(xml_source=archive.read(ARC_WORKBOOK))
    
    
        # Read the relationships
        root = fromstring(archive.read('xl/_rels/workbook.xml.rels'))
        relationships = {c.get('Id'): c.get('Target') for c in root.getchildren()}
    
        # get worksheets
        wb.worksheets = []  # remove preset worksheet
        sheet_names = read_sheets_titles(archive.read(ARC_WORKBOOK))
    
        i = 0
        for sheet_link, sheet_name in sheet_names:
            worksheet_path = '%s/%s' % ('xl', relationships[sheet_link])
            if 'chartsheet' in worksheet_path:
                continue
    
            if not worksheet_path in valid_files:
                continue
    
            if not use_iterators:
                new_ws = read_worksheet(archive.read(worksheet_path), wb, sheet_name, string_table, style_table)
            else:
                xml_source = unpack_worksheet(archive, worksheet_path)
                new_ws = read_worksheet(xml_source, wb, sheet_name, string_table, style_table, filename, sheet_codename)
            wb.add_sheet(new_ws, index=i)
            i += 1
    
        wb._named_ranges = read_named_ranges(archive.read(ARC_WORKBOOK), wb)
    
  2. Mark Mikofski

    I think the issue actually comes from the openpyxl/reader/workbook.py function read_sheet_titles because it uses the key <sheets> in xl/workbook.xml which has zero info about whether the sheet is a sheet or a chart. I created a patch, which uses [Content_Types].xml that tells everything about the spreadsheet.The details of the patch are summarized in pull request #41

  3. Eric Gazoni

    fixes issue #165, #179 and #209

    • add function read_content_types to openpyxl/reader/workbook.py that reads all of the 'Override' items in [Content_Types].xml
    • import ARC_CONTENT_TYPES and read_content_types into openpyxl/reader/excel.py
    • create constants VALID_CHARTSHEET and VALID_WORKSHEET that have the ContentType tag specified in [Content_Types].xml for chartsheets and worksheets
    • call read_content_types immediately before read_sheet_titles
    • before iterating over sheet_names create new list called worksheet_names that contains only the list of sheets that are actually worksheets, by comparing their type to VALID_WORKSHEET
    • import basic debug logging, use standard message format (from Doug Hellmann) "http://doughellmann.com/2007/05/pymotw-logging.html"
    • add show_debug_log optional arg to load_workbook & pass it on to Lload_workbook
    • add Sphinx autodoc text for show_debug_log
    • if show_debug_log, then output name of each worksheet before it's loaded

    → <<cset 498594f25594>>

  4. Log in to comment