Trees/clustering malformed on some data

Issue #405 resolved
Robert Leach created an issue

USE CASE: WHAT DO YOU WANT TO DO?

Cluster the data and see trees that have a leaf for every column and/or row.

STEPS TO REPRODUCE AN ISSUE (OR TRIGGER A NEW FEATURE)

  1. Open John Matese's 11.cdt file.
  2. Cluster only the columns with default params.
  3. Observe the structure and behavior of the left-most 5 columns when hovering over them

CURRENT BEHAVIOR

The left-most leaf of the tree appears to start at data column 4. There is no apparent leaf for column 5 (only an internal node). Column 1 has data. Columns 2-5 are columns that are composed completely of empty values (which is correct - haven't lost data). Columns 6 to the end each have at least some data. The bolding of the left-most visible branch (column 4) appears incorrect. Tree drawing in the first 5 columns is incorrect. Selecting a node in the tree will select the left-most columns.

At first glance, the issue appears to be some sort of tree offset issue:

column_tree_offset.png

But looking more closely, you see what appear to be missing branches (columns 1, 2, 3, & 5):

column_tree_wrong_selection.png

Branch bolding of column 4 appears to show nothing and branch bolding for column 5 seems odd:

column_tree_red_branch_wrong.png

What seems to actually be happening is that there are 4 leaves on the left side of the tree and the subtrees they define all have a height of 0 (or 1.0, depending on how you look at it).

EXPECTED BEHAVIOR

The tree only appears to be malformed: 4 leaves and the subtrees they compose have a combined height of 0, so the branches are all there and bolding is technically happening correctly. What should be done is to offset the bottom of the trees up a little so that a series of flat branches can be shown.

DEVELOPERS ONLY SECTION

SUGGESTED CHANGE (Pseudocode optional)

Reduce the height of the trees and offset them up slightly. Include branches at height 0.

FILES AFFECTED (where the changes will be implemented) - developers only

unknown

LEVEL OF EFFORT - developers only

medium

COMMENTS

UPDATE: Using Cluster.app, I discovered that these leaves/subtrees of height 1.0 WILL get drawn in a better way if they are in the middle of the tree. When they are on the left side, the branches are not visible. There are also potentially multiple issues here:

  1. Leaves/subtrees at height 1.0 are not drawn properly when they are on the "left" side of the tree (but are drawn correctly when they're in the middle).
  2. It's possible that clustering is not being done correctly. Either (a) it's potentially done incorrectly in Cluster.app as well, since both apps equate the distance between an empty column and 1 column with a reasonable amount of data; (b) our clustering is different from that of Cluster.app, so one may be doing it incorrectly in another sense; or (c) both are true.

UPDATE2: I discovered that when a subtree's correlation value is 1 (depicted at the bottom level of the tree), sometimes you can see it and other times it drops off the bottom. I debug-printed the y-coordinate pixel and the correlation value for each node of the tree, then loaded the same set of data clustered using Cluster.app. The resulting tree was essentially the same, but rearranged a little. The same flat subtree of 4 leaves for columns with no data exists, but it is in the middle of the tree. However, the subtrees were visible as a line of pixels. The y coordinate of the pixels was 1 up from the other tree (31st instead of 32nd pixel from the top). The correlation values were the same (1.0). I'm not sure why it was calculating differently in the 2 cases (it might be a precision issue), but I realized that, among the values given when creating the yScaleEq and xScaleEq objects for columns and rows respectively, the end coordinate was too large by 1 pixel. So this needs to be adjusted.
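
To illustrate the off-by-one described in UPDATE2, here is a minimal sketch in Python (not the application's own code). It assumes a plain linear mapping from correlation to a y pixel; `make_scale`, `PANEL_HEIGHT`, and the 0.0 minimum correlation are hypothetical stand-ins, and the real yScaleEq/xScaleEq objects may work differently.

```python
def make_scale(src_min, src_max, dst_min, dst_max):
    """Return a function mapping [src_min, src_max] linearly onto [dst_min, dst_max]."""
    slope = (dst_max - dst_min) / (src_max - src_min)

    def scale(value):
        # Round to the nearest whole pixel.
        return round(dst_min + slope * (value - src_min))

    return scale


# Hypothetical panel: 32 pixel rows, so the valid y indices are 0..31.
PANEL_HEIGHT = 32

# End coordinate equal to the panel height: one pixel too large.
too_large = make_scale(0.0, 1.0, 0, PANEL_HEIGHT)
# End coordinate equal to the last valid pixel row.
adjusted = make_scale(0.0, 1.0, 0, PANEL_HEIGHT - 1)

y_bad = too_large(1.0)   # correlation 1.0 lands below the last visible row
y_good = adjusted(1.0)   # correlation 1.0 lands on the last visible row

print(y_bad, y_good)
```

Under these assumptions, a node with correlation 1.0 is placed one row past the bottom of the panel unless the end coordinate is reduced by 1, which would match the flat subtree dropping off the bottom.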

Comments (11)

  1. Robert Leach reporter

    If I open the same clustered data in Java TreeView2, I get the same display behavior:

    tree_messed_up_jtv.png

    The first 4 rows of the .atr file have values of 1.0 and match the image and genes in tv3. Their distance metrics are all 1.0. Probably has something to do with the clustering.

  2. Robert Leach reporter

    Looks like the data for the tree is technically correct, branch-wise. It's still binary, but 4 of the branch heights are 0:

    NODE1X  COL12X  COL0X   1.0
    NODE2X  COL39X  NODE1X  1.0
    NODE3X  COL88X  NODE2X  1.0
    NODE4X  COL108X NODE3X  1.0
    NODE5X  COL13X  NODE4X  0.91580667236
    ...
    

    and that's not drawn at all. The leftmost visible branch looks like it's a leaf that skips a column, but it actually connects to a node whose entire subtree height is 0.
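
A rough way to check for such flat nodes is sketched below in Python. It assumes each tree row is whitespace-separated as node id, two children, and a correlation, and treats height as 1 minus the correlation; `parse_rows` and `flat_nodes` are hypothetical helper names, and only the five rows quoted above are used.

```python
# The five rows quoted in the comment above (the elided rows are not reproduced).
ROWS = """\
NODE1X  COL12X  COL0X   1.0
NODE2X  COL39X  NODE1X  1.0
NODE3X  COL88X  NODE2X  1.0
NODE4X  COL108X NODE3X  1.0
NODE5X  COL13X  NODE4X  0.91580667236
"""


def parse_rows(text):
    """Parse rows of the form: node id, child, child, correlation."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        node_id, left, right, corr = fields
        nodes.append((node_id, left, right, float(corr)))
    return nodes


def flat_nodes(nodes):
    """Return ids of nodes whose height (1 - correlation) is exactly 0."""
    return [node_id for node_id, _, _, corr in nodes if 1.0 - corr == 0.0]


flat = flat_nodes(parse_rows(ROWS))
print(flat)
```

Nodes reported here join their children at height 0, so a renderer that does not reserve a pixel row for height 0 would draw nothing for them.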

  3. Robert Leach reporter
    • edited description

    Edited the issue to reflect a better understanding of what is happening.

  4. Robert Leach reporter

    Hey @TreeView3Dev, I was trying to debug this issue and check whether our clustering correctly handles columns with empty data, so I tried running the original cluster tool that complemented Java TreeView. I cannot get our clustering to look like the original cluster app's clustering, even though I think I'm using the same parameters. Here's a screen cap of the same file clustered by our app and by the original cluster app. They look somewhat different, but I can't tell for sure whether it's just the ordering of the leaves (because essentially every subtree can be rotated freely since it doesn't change the distance):

    cluster_app_vs_treeview.png

    What are the parameters I should be using in Cluster.app to try and produce the same/similar result other than linkage type and distance metric?
