Support row and column attributes for cell when cell is empty in IterableWorksheet

Issue #399 closed
John Sivak
created an issue

I'm looping over columns in a row from an IterableWorksheet.

Cells that are empty do not get their row and column attributes set:

(code from openpyxl/worksheet/iter_worksheet.py):

    def get_squared_range(self, min_col, min_row, max_col, max_row):
        """
        The source worksheet file may have columns or rows missing.
        Missing cells will be created.
        """
        if max_col is not None:
            expected_columns = [get_column_letter(ci) for ci in range(min_col, max_col + 1)]
            empty_row = tuple(EMPTY_CELL for column in expected_columns)
        else:
            expected_columns = []
        row_counter = min_row

        # get cells row by row

        for row, cells in self._get_cells(min_row, min_col, max_row, max_col):
            full_row = []
            if row_counter < row:
                # Rows requested before those in the worksheet
                for gap_row in range(row_counter, row):
                    yield empty_row
                    row_counter = row

            if expected_columns:
                retrieved_columns = dict([(c.column, c) for c in cells])
                for column in expected_columns:
                    if column in retrieved_columns:
                        cell = retrieved_columns[column]
                        full_row.append(cell)
                    else:
                        # create missing cell
                        full_row.append(EMPTY_CELL)
            else:
                full_row = cells

            row_counter = row + 1
            yield tuple(full_row)

Is it possible to update/modify the above code to assign the row and column attributes? (like in this snippet):

            # --- 8< ------
            if expected_columns:
                retrieved_columns = dict([(c.column, c) for c in cells])
                for column in expected_columns:
                    if column in retrieved_columns:
                        cell = retrieved_columns[column]
                        full_row.append(cell)
                    else:
                        # create missing cell
                        full_row.append(ReadOnlyCell(None, row, column, None))
            else:
                full_row = cells

            # --- 8< ------

Comments (9)

  1. CharlieC

    Well, apart from the fact that the implementation in this is changing slightly in 2.2 I'm not in favour of this. It imposes an overhead in the code that is attempting to be as fast as possible and that is easily available to client code using enumerate.

    What is the particular use case?

  2. John Sivak reporter

    I'm treating the excel sheet essentially as a CSV, with the first row containing the column headers and the remaining rows containing the data.

    I loop over the first row and build a mapping of Column Letter to cell.value. Example:

             A     |      B   |   C    |
    Row 1:  first  |  middle  | last   |
    Row 2:  bob    |  james   | smith  | 
    Row 3:  sarah  | <empty>  | connor | 
    

    letter_to_header['A'] = 'first' letter_to_header['B'] = 'middle' letter_to_header['C'] = 'last'

    Now, I can process the rest of the sheet using the "column" names rather than letters, building a dictionary similar to a DictReader from the csv module.

    for cell in row:
       line[letter_to_header[cell.column]] = cell.value
    

    The problem I ran into was when I hit an empty cell and the code above crashed (cell.column was missing). Also, I want to allow the user to be able to "move"/add columns as needed, so I can't assume that the content of column B is always "middle".

  3. CharlieC

    You need the column letter?

    for col_idx, cell from enumerate(row, 1): # or 0 depending on whether you want 1- or 0-indexing
        col_letter = get_column_letter(col_idx) # I don't want this lookup in the library.
        line[col_letter] = cell.value
    

    Moving forward I want col indexes to be numeric anyway (a sparse matrix would be the proper implementation, I think).

    @Eric Gazoni what do you think?

  4. CharlieC

    Glad to know it works for you. A couple more points as to why I don't think this should be in the library:

    • if for any reason your "header" row is shorter than any content rows then the code may fail hard
    • likewise if any of your content rows are shorter than your "header"
    • CSV always has a value, even None for every item in a row, Excel explicitly allows cells to be left out

    See #393 for some background on why this is the case. But I'll leave the ticket open for further discussion.

  5. Eric Gazoni

    @CharlieC, I actually find myself in the same situation as @John Sivak, as I use the row and column properties of the cells a lot, but I agree this is a performance toll, and should be configurable whether you want it or not (and I agree that using enumerate is a good workaround).

  6. Log in to comment