Tabular is a package of Python modules for working with tabular data. Its main object is the tabarray class, a data structure for holding and manipulating tabular data. By putting data into a tabarray object, you'll get a representation of the data that is more flexible and powerful than a native Python representation.

More specifically, tabarray provides:

  • fast filtering, selection, and numerical analysis methods, using convenient MATLAB-style matrix operation syntax
  • spreadsheet-style operations, including row & column operations, 'sort', 'replace', 'aggregate', 'pivot', and 'join'
  • flexible load and save methods for a variety of file formats, including delimited text (CSV), binary, and HTML
  • sophisticated inference algorithms for determining formatting parameters and data types of input files
  • support for hierarchical groupings of columns, both as data structures and file formats

Future Things To Do:

Reformatting code docs; adding tests:

  • follow numpy docstring conventions
  • add doctests throughout code
  • reformat docs so they fit on the page in latex pdf


  • improve web.tabular2html by working out an interface for controlling CSS styles
  • improve the interface for choosing null-value settings (e.g. Nullvalue and NullValueFormats)
  • add a TypeInference keyword to loadSV and the tabarray constructor so that users can supply their own type-inference routine
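
As a rough illustration of what a user-supplied type-inference routine might look like, here is a minimal sketch in plain Python. The names `infer_column_type` and `convert_column` are hypothetical and are not part of tabular; the proposed TypeInference keyword does not exist yet, so the way such a callable would be passed to loadSV is an assumption.

```python
def infer_column_type(values):
    """Return int, float, or str: the first type that parses every value.

    Hypothetical helper for illustration; not tabular's inference algorithm.
    """
    for candidate in (int, float):
        try:
            for v in values:
                candidate(v)  # raises ValueError if v does not parse
        except ValueError:
            continue  # try the next, more permissive type
        return candidate
    return str  # fall back to leaving the column as strings


def convert_column(values, infer=infer_column_type):
    """Convert a column of strings using a pluggable inference callable."""
    typ = infer(values)
    return [typ(v) for v in values]


# A mixed integer/decimal column is inferred as float.
prices = convert_column(["1", "2.5", "10"])
```

Passing the inference routine as an argument (here `infer`) is the design choice the to-do item hints at: the loader stays the same while the user swaps in a different rule.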


  • add html2tabular, an inverse to the tabular2html functionality; the goal is to be able to read in both clean and (somewhat) dirty HTML tables using some table-parsing technology we've developed
  • add an AutoIncrement function: something like a command-line version of Excel's drag-to-increment interface, in which the user supplies the initial values S0 = [s0, s1, s2, ..., sn] of a sequence, and then, based on inferences about the datatype and values, AutoIncrement(S0) extends the sequence arbitrarily far. This would be useful for "starting tables" by hand.
  • develop and implement a general metadata API, covering data-set metadata, per-column metadata, per-column-group metadata, per-row metadata, etc. Handle this both for live objects and for the various read-out conventions.
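
The AutoIncrement idea above could be sketched as follows. This is only one possible reading, assuming the simplest inference rule (a constant difference between consecutive seed values); the function name `auto_increment` and its `count` parameter are hypothetical, and a real implementation would presumably recognize more patterns.

```python
from datetime import date


def auto_increment(seed, count):
    """Extend `seed` by `count` further terms, assuming a constant step.

    Hypothetical sketch: works for any values that support subtraction
    and addition of the resulting difference (ints, dates, ...).
    """
    if len(seed) < 2:
        raise ValueError("need at least two seed values to infer a step")
    steps = [b - a for a, b in zip(seed, seed[1:])]
    if any(s != steps[0] for s in steps):
        raise ValueError("seed values do not have a constant step")
    step = steps[0]
    out = list(seed)
    for _ in range(count):
        out.append(out[-1] + step)
    return out


odds = auto_increment([1, 3, 5], 3)
weeks = auto_increment([date(2020, 1, 1), date(2020, 1, 8)], 2)
```

Because the step is compared with exact equality, floating-point seeds may be rejected by rounding error; a tolerance would be needed for those.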