Tabular is a package of Python modules for working with tabular data. Its main object is the tabarray class, a data structure for holding and manipulating tabular data. By putting data into a tabarray object, you'll get a representation of the data that is more flexible and powerful than a native Python representation.
More specifically, tabarray provides:
- fast filtering, selection, and numerical analysis methods, using convenient MATLAB-style matrix operation syntax
- spreadsheet-style operations, including row & column operations, 'sort', 'replace', 'aggregate', 'pivot', and 'join'
- flexible load and save methods for a variety of file formats, including delimited text (CSV), binary, and HTML
- sophisticated inference algorithms for determining formatting parameters and data types of input files
- support for hierarchical groupings of columns, both as data structures and file formats
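For orientation, the sketch below spells out three of the spreadsheet-style operations named above (filter, sort, and aggregate) on a native Python list of tuples. It uses only the standard library and does not call tabular; the data and variable names are made up for illustration. It is meant only to show the kind of hand-written bookkeeping that a tabarray is intended to replace with built-in methods.

```python
from collections import defaultdict
from operator import itemgetter

# Illustrative data only: (region, product, sales) rows.
rows = [
    ("north", "apples", 120),
    ("south", "apples", 80),
    ("north", "pears", 45),
    ("south", "pears", 60),
]

# Filter: keep rows with sales of at least 60.
big = [row for row in rows if row[2] >= 60]

# Sort: order rows by the sales column, largest first.
by_sales = sorted(rows, key=itemgetter(2), reverse=True)

# Aggregate: total sales per region.
totals = defaultdict(int)
for region, _product, sales in rows:
    totals[region] += sales

print(big)
print(by_sales[0])   # ('north', 'apples', 120)
print(dict(totals))  # {'north': 165, 'south': 140}
```

Each operation here needs its own loop or helper and refers to columns by position; a table object with named columns and ready-made methods removes that boilerplate.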
Future Things To Do:
Reformatting code docs; adding tests:
- follow NumPy docstring conventions
- add doctests throughout the code
- reformat docs so they fit on the page in the LaTeX PDF
Interface improvements and new features:
- improve web.tabular2html by working out an interface for controlling CSS styles
- improve the interface for choosing null-value selectors (e.g. Nullvalue and NullValueFormats)
- add a TypeInference keyword to loadSV and the tabarray constructor so that users can supply their own type-inference routine
- add html2tabular, an inverse of tabular2html; the goal is to read both clean and (somewhat) dirty HTML tables using table-parsing technology we've developed
- add an AutoIncrement function: something like a command-line version of Excel's drag-to-increment interface, in which the user supplies the initial values S0 = [s0, s1, s2, ..., sn] of a sequence and, based on inferences about their datatype and values, AutoIncrement(S0) extends the sequence arbitrarily far. This would be useful for "starting tables" by hand.
- develop and implement a general metadata API, covering data-set metadata, per-column metadata, per-column-group metadata, per-row metadata, etc. Handle this both for live objects and for the various read-out conventions.
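The AutoIncrement item in the list above could be sketched roughly as follows. This is not part of tabular: the name `auto_increment`, its `total` parameter, and the inference rule are all assumptions made for illustration. The only inference it attempts is a constant step between consecutive seed values, which covers integers and dates; floats would need a tolerance-based comparison, and richer patterns are left out.

```python
from datetime import date


def auto_increment(seed, total):
    """Return the first `total` values of the sequence that starts with `seed`.

    Hypothetical helper: assumes the seed values share one constant step,
    so any type supporting `b - a` and `a + step` works (int, date, ...).
    """
    if len(seed) < 2:
        raise ValueError("need at least two seed values to infer a step")
    # Differences between consecutive seed values.
    steps = [b - a for a, b in zip(seed, seed[1:])]
    if any(step != steps[0] for step in steps):
        raise ValueError("seed values do not share a constant step")
    values = list(seed)
    while len(values) < total:
        values.append(values[-1] + steps[0])
    return values[:total]


print(auto_increment([1, 3, 5], 6))  # [1, 3, 5, 7, 9, 11]
weekly = auto_increment([date(2024, 1, 1), date(2024, 1, 8)], 4)
print([d.isoformat() for d in weekly])
```

Raising an error when no constant step is found keeps the sketch from guessing; a fuller version might try other patterns before giving up.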