Ensure CDT file format consistency

USE CASE: WHAT DO YOU WANT TO DO?

Make sure that we maintain the correct data format and compatibility with other software (as much as possible).

STEPS TO REPRODUCE AN ISSUE (OR TRIGGER A NEW FEATURE)

Following our discussion about CDTImporter and its inability to import CDT files from Treeview 3.0, I ran a few tests.

I created two test files (see image below) and clustered them with both Cluster 3.0 and Treeview 3.0 (alpha 3). Test1 contains only data; Test2 contains data and also specifies EWEIGHT and GWEIGHT.

CURRENT BEHAVIOR

Unlike Treeview 3.0 (alpha 3), Cluster 3.0:

needs the 1st cell to be non-empty
adds GWEIGHT and EWEIGHT info in both cases (good)
adds a "NAME" column by duplicating the original row names (bad: it is not necessary)
adds 6 decimal places to the data and the GWEIGHT/EWEIGHT (bad but not critical)

Unlike Cluster 3.0, Treeview 3.0 (alpha 3):

is ok with the 1st cell being empty
does not add GWEIGHT and EWEIGHT unless they were present in the input file (bad)
does not add "NAME" (good)
adds 1 decimal place to the data but leaves GWEIGHT/EWEIGHT intact (bad but not critical)

Both programs:

add the row dendrogram (GID) before the row labels, but the column dendrogram (AID) after the column labels (bad because inconsistent).

EXPECTED BEHAVIOR

Treeview 3.0 should:

if the 1st cell is empty, add an X to it
add GWEIGHT and EWEIGHT to clustered files, even if they did not have them to begin with.
if possible, do not change the number format (if it was an integer, do not add decimal places).
change GID and AID to "NAME".
keep the row/column order consistent (see attached image).
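
The first three points above could be sketched roughly as follows. This is only an illustration under stated assumptions, not TreeView's actual code: the class `CdtNormalizeSketch`, its method names, the default weight of `1`, and the caller-supplied `firstDataCol`/`firstDataRow` indices are all hypothetical. It does not attempt the GID/AID changes, and `formatLike` ignores scientific notation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of TreeView: sketches the expected behavior.
final class CdtNormalizeSketch {

    // If the top-left cell is empty, put a placeholder "X" in it.
    static void fillFirstCell(List<List<String>> table) {
        List<String> header = table.get(0);
        if (header.get(0).trim().isEmpty()) {
            header.set(0, "X");
        }
    }

    // Add a GWEIGHT column and an EWEIGHT row if the table lacks them.
    // firstDataCol/firstDataRow say where the data starts (supplied by the caller).
    // The weight value "1" is an assumption made for this sketch.
    static void addMissingWeights(List<List<String>> table, int firstDataCol, int firstDataRow) {
        if (!table.get(0).contains("GWEIGHT")) {
            for (int r = 0; r < table.size(); r++) {
                String cell = (r == 0) ? "GWEIGHT" : (r < firstDataRow ? "" : "1");
                table.get(r).add(firstDataCol, cell);
            }
            firstDataCol++; // the data moved one column to the right
        }
        boolean hasEweight = false;
        for (List<String> row : table) {
            if ("EWEIGHT".equals(row.get(0))) {
                hasEweight = true;
            }
        }
        if (!hasEweight) {
            List<String> eweight = new ArrayList<>();
            for (int c = 0; c < table.get(0).size(); c++) {
                eweight.add(c == 0 ? "EWEIGHT" : (c < firstDataCol ? "" : "1"));
            }
            table.add(firstDataRow, eweight);
        }
    }

    // Write a value with as many decimal places as the original token had,
    // so an integer input stays an integer in the output.
    static String formatLike(String original, double value) {
        int dot = original.indexOf('.');
        int decimals = (dot < 0) ? 0 : original.length() - dot - 1;
        return String.format(Locale.ROOT, "%." + decimals + "f", value);
    }

    public static void main(String[] args) {
        // Made-up example table: empty first cell, no weights.
        List<List<String>> table = new ArrayList<>();
        table.add(new ArrayList<>(Arrays.asList("", "A", "B")));
        table.add(new ArrayList<>(Arrays.asList("g1", "1", "2")));
        table.add(new ArrayList<>(Arrays.asList("g2", "3", "4")));
        fillFirstCell(table);
        addMissingWeights(table, 1, 1);
        for (List<String> row : table) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

Passing the data start indices in explicitly keeps the sketch independent of whichever dendrogram (GID/AID) layout is finally chosen.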

Note:

Here's the original CDT file specification (from http://tldrify.com/kku). The proposed changes are consistent with this definition.

A generalized CDT file is a tab-delimited text file with the following specifications. The leftmost column and topmost row are reserved for headers. The file must contain at least two columns followed by a column with the header GWEIGHT, and at least one row followed by a row with the header EWEIGHT. Any rows and columns before the EWEIGHT and GWEIGHT are treated as annotation, and any after are treated as data.
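
To make the definition concrete, here is a minimal sketch of how an importer might locate the GWEIGHT column and the EWEIGHT row and treat everything after them as data. The class `CdtLayoutSketch`, the method `findWeightMarkers`, and the sample lines in `main` are hypothetical and are not taken from CDTImporter; the sketch assumes the file has already been read into a list of tab-delimited lines.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of TreeView: reads the layout described by the spec.
final class CdtLayoutSketch {

    // Returns {gweightCol, eweightRow}; an entry is -1 if that marker is missing.
    static int[] findWeightMarkers(List<String> lines) {
        int gweightCol = -1;
        int eweightRow = -1;
        if (!lines.isEmpty()) {
            // The topmost row holds the column headers; keep trailing empty cells.
            String[] header = lines.get(0).split("\t", -1);
            for (int c = 0; c < header.length; c++) {
                if (header[c].equals("GWEIGHT")) {
                    gweightCol = c;
                    break;
                }
            }
        }
        for (int r = 0; r < lines.size(); r++) {
            // The leftmost column holds the row headers.
            String first = lines.get(r).split("\t", -1)[0];
            if (first.equals("EWEIGHT")) {
                eweightRow = r;
                break;
            }
        }
        return new int[] {gweightCol, eweightRow};
    }

    public static void main(String[] args) {
        // Made-up example content, for illustration only.
        List<String> lines = Arrays.asList(
            "ID\tNAME\tGWEIGHT\tS1\tS2",
            "EWEIGHT\t\t\t1\t1",
            "g1\tgene one\t1\t0.5\t2");
        int[] m = findWeightMarkers(lines);
        if (m[0] < 0 || m[1] < 0) {
            System.out.println("Missing GWEIGHT and/or EWEIGHT marker");
        } else {
            // Everything after the markers is data; everything before is annotation.
            System.out.println("Data starts at column " + (m[0] + 1) + ", row " + (m[1] + 1));
        }
    }
}
```

A file for which either index comes back as -1 lacks one of the two markers the definition requires, which is the case reported above for Treeview 3.0 output when the input had no weights.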

DEVELOPERS ONLY SECTION

SUGGESTED CHANGE (Pseudocode optional)

e.g. Add a color selection class

FILES AFFECTED (where the changes will be implemented) - developers only

e.g. selectColor.java & settingsPanel.java

LEVEL OF EFFORT - developers only

trivial/minor/medium/major/overhaul (choose one)

COMMENTS
