Getting excel table by name

Issue #844 new
asommerh
created an issue

Currently the tables can be obtained by an integer index, like:

wb['sheetOne']._tables[3]

It would be nice to allow to get it by name, something like:

wb['sheetOne']._tables['Project List']

Comments (21)

  1. asommerh reporter

    I use the table concept like others use defined_names. The excel table is much more robust than a defined_names, as the users usually forget to expend the range of defined_names to the new rows they add. To use one sheet for each tabular data is a better workaround, but that will make impossible to use some optimization software that rely on having all the model data in just one sheet.

  2. asommerh reporter

    (... and not to mention that to have one sheet for each tabular data, it will be a pain to administrate if you have a lot of small tables!)

  3. Sean Kane

    I second this issue - it would be nice to access the tables by either their table.name or table.displayName. For example:

    wb['sheet']._table['table_name']
    

    I usually know what I have called a table (I set the name in the worksheet when I create it) but the index in the ._table collection is another matter. Currently I have to iterate through all items in the ._table collection until I find the table which matches the name I'm looking for.

    This may require a better way to access the tables rather than a protected member (._tables) - I didn't even know ._tables existed until I found this issue. I'm too new to Python and openpyxl to know more about how best to achieve this at the moment.

  4. CharlieC

    There are a group of related pull requests for tables that just need consolidating.

    When it comes to working with tables: the support was added initially to allow tables in existing files to be preserved when they were edited by openpyxl. Because I don't work with tables myself I'm open to suggestions, preferably, via pull requests which would expose tables by name.

  5. Joel Dedrick

    I third this issue. I also used named ranges but changed to tables when I found out named ranges had been deprecated.

    The use case here is that you want to identify a collection of cells in a way that is independent of their location in the workbook and their size so that everything doesn't break when a row is added e.g. I've always used named ranges for this, but I realize their flexibility (can be applied to disjoint regions, etc.) makes them hard to parse. Tables would be fine, but again they need to be referenceable by their name, not their ID, because in general the Excel user doesn't know or care what the ID is. Also (I'm pretty sure that) if a table/range is deleted and recreated with the same name, the ID changes, so that interface is brittle.

    PS: I regret that I'm too much of a Python noob to be of any use in helping implement this change. Thanks to Charlie and whoever else contributed effort to creating openpyxl.

  6. Sean Kane

    What would an "openpyxl" API look like for accessing the tables by name (I'd assume it'd be good to keep it similar to say accessing named ranges)?

    If it was something like get_table(table_name_here) it might be possible to code a quick & dirty implementation (just iterate over the _tables[i].name or _tables[i].displayName property until a table matches). If it used the indexing operators (something like worksheet.tables[table_name_here]) it would be cleaner though.

  7. CharlieC

    I think lookup should probably by displayName and/or the cell range covered by the table, which is likely to be unique for any particular worksheet and would be useful when tables are called table1, table2, etc.

    No PR has yet been submitted for this.

    @Joel Dedrick named ranges haven't been deprecated by the API has been changed to be closer to the implementation.

  8. Brandon Barney

    Is there a way within the API that openpyxl is reading the tables that may be pre-existing? I am looking at the source code for the _tables attribute and it appears just to be a list that only gets its members from add_table. I don't know enough about what Excel is exposing to Python to know how to pre-populate this, but I am happy to dig for it.

    For what it is worth, in the .NET languages tables are exposed through the ListObjects collection, each containing a ListObject (table).

  9. CharlieC

    In Excel, or at least OOXML, everything is handled via relationships so the names are not really relevant, they just need to be unique within a global list. In the GUI it's not really an issue as the user selects by clicking. Objects in DrawingML do have titles which are sometimes visible to the left of the formula bar but I'm not sure how useful this is programmatically: ws.tables['Table1'].

  10. CharlieC

    Clarification would be best with a use case. Most tables have names automatically assigned to them so ranges might make more sense. But I don't know as I've never worked with them.

    I'd accept a PR based on the 2.6 branch with sample client code.

  11. Brandon Barney

    I'll try to get a client code example written up this afternoon. Your reasoning makes sense, but I think the users who will want to access their tables by name are creating them with names which goes against the automatic naming assumption. There are definitely going to be two kinds of users, those who define these properties of their tables, and those who just refer to them by indices, so having both options would be ideal.

    As noted, i'll get a sample in when I can.

  12. Sean Kane

    I'll weigh in with the use case I had back when I first commented on the issue:

    I had a defined template workbook, and on a specific sheet I had 2-3 tables that were required to be present. However users could potentially create their own tables if they wanted for other purposes. The users would fill out the workbook and then I would process it in python.

    Given that the users could create their own tables etc. and I couldn't guarantee anything other than that the tables I needed would be there, I thought it would be easier to give the ones I needed to process defined names and then my python script could just call ws.tables['my_table_name'] to get a table object.

    In the end I had to settle for iterating across all the tables and then finding & comparing their names to determine if they matched what I expected.

  13. Sean Kane

    I understand that iterating over the table names is the only way to do this. What I was trying to get at as an external user of the library is that I would have thought that it would have just been possible for me to have called ws.tables['my_table_name'] and gone straight to my table, with openpyxl handling the logic behind the [].

    What I actually had to do instead was first find out that there was a worksheet property ._tables - as a protected member it didn't jump straight out at me and I think at the time it wasn't even documented in the openpyxl docs (I could be wrong - it was 12 months ago). I then had to nut around with the table properties to find out which one to use (I think there was a difference between name and displayName) and then write my own search loop.

    Simply calling ws.tables['my_table_name'] (or something like ws.get_table_by_name('my_table_name')) seems to provide a nicer API.

    However, I understand that this is an open source project - simply having openpyxl in whatever form is greatly appreciated. My code is working fine for me at the moment so I was just trying to clarify how this API change could be used in the real world.

  14. Stephen Rauch

    In my use case I am parsing formulas so I have the table name in hand from the formula, but I do not have the sheet. There is a wrapper class that manages two instances of an openpyxl workbook since the code uses the values and the formulas at the same time. In this class I implemented a very simple bit of code on the wrapper to return the table and the sheet_name given the table_name:

    def table(self, table_name):
        """ Return the table and the sheet it was found on
    
        :param table_name: name of table to retrieve
        :return: table, sheet_name
        """
        # table names are case insensitive
        if self._tables is None:
            TableAndSheet = collections.namedtuple(
                'TableAndSheet', 'table, sheet_name')
            self._tables = {
                t.name.lower(): TableAndSheet(t, ws.title)
                for ws in self.workbook for t in ws._tables
            }
            self._tables[None] = TableAndSheet(None, None)
        return self._tables.get(table_name.lower(), self._tables[None])
    
  15. Log in to comment