dataliberation /


Tools for liberating a project's data from different sites.


issues/ - export data from various issue trackers


[ ] Get issue export working
    . this is needed for the SCons project
    . and probably for Subversion

[ ] Make sure converted data is complete
[ ] Define target format for conversions
[ ] Define common intermediate format for issues
[ ] Write exporter from intermediate format into target

[ ] Figure out manual mapping requirements, like
    usernames, issue numbers etc.
[ ] Describe conversion process, because it probably
    cannot be 100% automated (at the very least, redirection
    links need to be posted for older issues)


Initial application design and development.

  1. Tools focus on structured data
  2. The first level of the structured data format is a tree
  3. Leaves and nodes are just nodes
  4. For simplicity node values are strings only
  5. Nodes can have type hints, so that information about a value's specific type in the source format is not lost
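
The design points above can be sketched as a small data structure. This is only an illustration, not the project's actual code: the names `Node`, `type_hint`, `add` and `dump` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node; a leaf is simply a node without children."""
    name: str
    value: str = ""                   # values are always strings
    type_hint: Optional[str] = None   # type in the source format, if known
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it."""
        self.children.append(child)
        return child

    def dump(self, indent: int = 0) -> str:
        """Return an indented text dump of this node and its subtree."""
        hint = f" <{self.type_hint}>" if self.type_hint else ""
        line = f"{'  ' * indent}{self.name}{hint}: {self.value!r}"
        return "\n".join([line] + [c.dump(indent + 1) for c in self.children])


# Hypothetical issue record: the id was an integer in the source format,
# so the hint keeps that fact even though the value is stored as a string.
issue = Node("issue")
issue.add(Node("id", "42", type_hint="int"))
issue.add(Node("title", "Export fails on empty tracker"))
print(issue.dump())
```

Keeping the hint next to the string value means a target converter can still decide whether to emit `42` or `"42"`.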

Implementation details:

  +---+    +---+           +---+
  | S | -> | F | -> ... -> | T |
  +---+    +---+           +---+

[S]ource converts its input format into an internal tree representation, which can then be passed through several [F]ilters that modify the tree. The resulting tree is then passed to the [T]arget format converter.
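
A minimal sketch of such a pipeline follows, assuming trees are nested dicts and each stage is a plain function. The input format (`key=value` lines), the JSON target, the username mapping and all function names are hypothetical and chosen only for illustration.

```python
import json
from typing import Callable, Dict, Iterable, List

Tree = Dict[str, object]  # {"name": str, "value": str, "children": [Tree, ...]}


def node(name: str, value: str = "", children: Iterable[Tree] = ()) -> Tree:
    """Build one tree node."""
    return {"name": name, "value": value, "children": list(children)}


def source_from_pairs(text: str) -> Tree:
    """[S]ource: parse 'key=value' lines into a one-level tree."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return node("issue", children=[node(k.strip(), v.strip()) for k, v in pairs])


def rename_users(mapping: Dict[str, str]) -> Callable[[Tree], Tree]:
    """[F]ilter factory: rewrite 'reporter' values using a manual mapping."""
    def apply(tree: Tree) -> Tree:
        value = tree["value"]
        if tree["name"] == "reporter":
            value = mapping.get(value, value)
        return node(tree["name"], value, [apply(c) for c in tree["children"]])
    return apply


def target_json(tree: Tree) -> str:
    """[T]arget: serialize the tree as JSON."""
    return json.dumps(tree, sort_keys=True)


def convert(text: str, source, filters: List[Callable[[Tree], Tree]], target) -> str:
    """Run S -> F -> ... -> T."""
    tree = source(text)
    for f in filters:
        tree = f(tree)
    return target(tree)


result = convert("id=7\nreporter=jdoe", source_from_pairs,
                 [rename_users({"jdoe": "john.doe"})], target_json)
print(result)
```

Because every filter takes a tree and returns a new tree, filters can be added, removed or reordered without touching the source or target converters.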

Final Goal

Don't read anything below unless you want to be confused.

--anatoly techtonik

The major part is to build a "structure converter tool" that converts XML or other formats into a tree. The tree can be dumped, validated, compared to another tree, or converted. It is also very important to get full information about the conversion: is it complete, is it reversible or is data lost on the way, and what data is missing to perform the conversion or to make it reversible?
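
One rough way to detect data loss is to convert to the target format, convert back, and compare the result with the original tree. The sketch below assumes nodes are `(name, value, children)` tuples and compares children by position, which is a simplification; the `diff` function and the sample trees are hypothetical.

```python
from typing import List, Tuple

# A node is (name, value, children); values are strings.
Node = Tuple[str, str, list]


def diff(a: Node, b: Node, path: str = "") -> List[str]:
    """Return human-readable differences between two trees."""
    here = f"{path}/{a[0]}"
    if a[0] != b[0]:
        return [f"{here}: renamed to {b[0]!r}"]
    problems = []
    if a[1] != b[1]:
        problems.append(f"{here}: value {a[1]!r} != {b[1]!r}")
    # Children are matched by position, not by name.
    for i in range(max(len(a[2]), len(b[2]))):
        if i >= len(b[2]):
            problems.append(f"{here}/{a[2][i][0]}: missing in second tree")
        elif i >= len(a[2]):
            problems.append(f"{here}/{b[2][i][0]}: only in second tree")
        else:
            problems.extend(diff(a[2][i], b[2][i], here))
    return problems


original = ("issue", "", [("id", "7", []), ("status", "open", [])])
# Pretend the target format cannot store 'status', so it is gone after a round trip.
round_trip = ("issue", "", [("id", "7", [])])

report = diff(original, round_trip)
print(report or "conversion is lossless")
```

An empty report suggests the conversion is reversible for that input; each entry otherwise names the piece of data that was lost or changed.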

There is a need for a tool that allows easy analysis and debugging of the conversion process, to walk through it step by step and see the outcome of every operation before it occurs. This imposes certain requirements for good visualization in the user interface, and it should be cross-platform. But the underlying scripts should be independent of the UI.

It may be worth starting such a visual interface in PySide. Start with displaying the initial source data file, then add line numbers, then convert it to a tree, keeping line numbers linked to tree elements. Then the window can be split vertically to show the scaled tree and the file contents simultaneously. Then try to highlight lines and the corresponding elements in the tree on selection. Do the same with mouseover, scrolling the main window while selecting tree items. After that the UI can be taught to compare trees, highlighting modified parts. And finally it should visualize the conversion process and walk through it step by step.
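
The UI-independent part of "keeping line numbers linked to tree elements" could look roughly like this. It is a sketch that assumes XML input and uses the standard library's `xml.parsers.expat`, whose `CurrentLineNumber` attribute reports the line of the event being handled. The functions `parse_with_lines` and `nodes_by_line` are hypothetical, and XML attributes are ignored for brevity.

```python
import xml.parsers.expat


def parse_with_lines(xml_text: str) -> dict:
    """Parse XML into nested dicts, recording the source line of each element."""
    root = {"name": "#document", "line": 0, "value": "", "children": []}
    stack = [root]
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # attrs are dropped in this sketch
        node = {"name": name, "line": parser.CurrentLineNumber,
                "value": "", "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)

    def end(name):
        stack[-1]["value"] = stack[-1]["value"].strip()
        stack.pop()

    def text(data):
        stack[-1]["value"] += data

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(xml_text, True)
    return root["children"][0]


def nodes_by_line(node: dict, index=None) -> dict:
    """Build a line number -> element names lookup, e.g. for highlighting."""
    index = {} if index is None else index
    index.setdefault(node["line"], []).append(node["name"])
    for child in node["children"]:
        nodes_by_line(child, index)
    return index


doc = "<issue>\n  <id>7</id>\n  <title>Export fails</title>\n</issue>"
tree = parse_with_lines(doc)
print(nodes_by_line(tree))
```

A PySide front end could use the same lookup in both directions: selecting a line highlights the tree elements that start on it, and selecting a tree element scrolls the file view to its line.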

If everything above sounds too complicated, we may end up using Google Refine. =)