Clustering a file with one axis already clustered does not use the existing dendrogram

Issue #256 resolved
Christopher Keil repo owner created an issue

The data will be clustered normally and as expected. But because clustering creates a new file, and because the software searches for tree files with the same name (but a different file extension), the already existing tree file cannot be found and won't be loaded.

The old tree file should be copied along with the new file and given the new name so that it will be loaded.

Comments (9)

  1. Christopher Keil reporter

    The only ways to tell whether an axis is already clustered are the presence of a tree file and/or the presence of GIDs/AIDs in the matrix file. I will implement a check for this (maybe TVModel.hasAID/.hasGID will work reliably), but when a possibly clustered axis is found, a modal JDialog should still pop up to inform the user and ask them how to proceed.
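
    A minimal sketch of such a check, under stated assumptions: the matrix file follows the usual CDT layout in which a row-clustered file has "GID" as its first header cell and a column-clustered file has a row whose first cell is "AID". The class and method names are hypothetical and are not taken from the TreeView 3 code base; whether a tree file exists is passed in as a flag.

    ```java
    import java.util.List;

    /** Hypothetical helper, for illustration only. */
    final class ClusteredAxisCheck {

        /** Rows look clustered if the header starts with GID or a .gtr file exists. */
        static boolean rowsLookClustered(List<String> cdtLines, boolean gtrExists) {
            boolean hasGid = !cdtLines.isEmpty() && firstCell(cdtLines.get(0)).equals("GID");
            return hasGid || gtrExists;
        }

        /** Columns look clustered if some row is labelled AID or an .atr file exists. */
        static boolean columnsLookClustered(List<String> cdtLines, boolean atrExists) {
            boolean hasAid = cdtLines.stream().anyMatch(l -> firstCell(l).equals("AID"));
            return hasAid || atrExists;
        }

        /** Returns the text before the first tab (assumes a tab-delimited file). */
        private static String firstCell(String line) {
            int tab = line.indexOf('\t');
            return (tab < 0 ? line : line.substring(0, tab)).trim();
        }
    }
    ```

    A positive result here would only mean "possibly clustered", so the dialog would still be needed to let the user decide.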

  2. Robert Leach

    So is the intent of this bitbucket issue to skip clustering (for an axis) when a file that is clustered (for that axis) already exists? Or are you saying that clustering will be faster if it starts from an already clustered file? I'm just not sure I understand the purpose of clustering from the original data versus a file that's already been clustered (albeit on 1 axis).

  3. Robert Leach

    @TreeView3Dev, when is it that an old tree file will be used? Is it when a previously generated tree for rows exists and the user selects to cluster only the column data, or vice versa? Or is this issue intending to skip clustering that's already been done before?

  4. Christopher Keil reporter

    Essentially, clustering a file - whether it already has 0, 1 or 2 axes clustered - should produce expected results. This isn't true for clustering 1 axis at the moment and the reason is mostly naming of the files, which simply prevents TV3 from loading the associated tree file that already exists.

    Let's say you have a file named "xyz_average_1.cdt" with one "xyz_average_1.atr" file because you did column clustering using the average linkage method on "xyz.txt". If you now decide to cluster the rows as well, the software will produce a new CDT file "xyz_average_2.cdt" (counting up) and a tree file for the previously unclustered axis (rows), "xyz_average_2.gtr", and then load these files.

    When it loads the new files it will not associate "xyz_average_1.atr" with these files because only CDT and tree files with the same name root "xyz_average_2" will be regarded as belonging together. Thus no trees are shown for the columns because the tree node data isn't there.
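
    The fix proposed in the issue description (copy the old tree file and give it the new name root) could be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical, and it assumes the old tree file sits in the same directory as the old CDT file.

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.util.Optional;

    /** Hypothetical helper; not part of the TreeView 3 code base. */
    final class TreeFileCarryOver {

        /** Returns the file name without its last extension, e.g. "xyz_average_2". */
        static String rootOf(String fileName) {
            int dot = fileName.lastIndexOf('.');
            return dot > 0 ? fileName.substring(0, dot) : fileName;
        }

        /**
         * Copies the tree file next to oldCdt (same root, given extension) so that
         * it shares the root of newCdt. Returns the copy, or empty if the old
         * tree file does not exist.
         */
        static Optional<Path> carryOver(Path oldCdt, Path newCdt, String treeExt)
                throws IOException {
            String oldRoot = rootOf(oldCdt.getFileName().toString());
            Path oldTree = oldCdt.resolveSibling(oldRoot + "." + treeExt);
            if (!Files.isRegularFile(oldTree)) {
                return Optional.empty();
            }
            String newRoot = rootOf(newCdt.getFileName().toString());
            Path newTree = newCdt.resolveSibling(newRoot + "." + treeExt);
            Files.copy(oldTree, newTree, StandardCopyOption.REPLACE_EXISTING);
            return Optional.of(newTree);
        }
    }
    ```

    With the example above, carrying over "atr" from "xyz_average_1.cdt" to "xyz_average_2.cdt" would leave an "xyz_average_2.atr" next to the new CDT file, so the loader's same-root rule would pick it up.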

  5. Christopher Keil reporter

    Added another case to check in tree file integrity test

    If for some reason the current axis is considered to have been clustered before and the user does not want to explicitly recluster, BUT no old tree file can be recovered for copying, then the matrix and labels will be kept in their clustered order. A dialog will pop up warning of the failed tree file recovery attempt, alongside a message that no trees can be displayed as a result (no node/link information available).

    Resolves: #256

    See also: -

    → <<cset 02aa27bd5984>>
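
    The case described in this commit message, together with the other cases discussed in this thread, can be summarised as a small decision function. This is a sketch under assumptions: the enum, its constants, and the method are hypothetical names, and the non-warning branches are inferred from the earlier comments rather than taken from the changeset itself.

    ```java
    /** Hypothetical summary of the per-axis decision; not the actual TV3 code. */
    final class AxisPlan {

        enum Action {
            CLUSTER,              // axis is unclustered, or the user asked to recluster
            REUSE_OLD_TREE,       // keep the order and copy the old tree file
            KEEP_ORDER_AND_WARN   // keep the order, warn that no tree can be shown
        }

        static Action decide(boolean clusteredBefore,
                             boolean userWantsRecluster,
                             boolean oldTreeRecovered) {
            if (!clusteredBefore || userWantsRecluster) {
                return Action.CLUSTER;
            }
            return oldTreeRecovered ? Action.REUSE_OLD_TREE : Action.KEEP_ORDER_AND_WARN;
        }
    }
    ```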
