Clustering columns only fails

Issue #475 closed
Robert Leach created an issue

USE CASE: WHAT DO YOU WANT TO DO?

Cluster just the columns of the largest supported matrix

STEPS TO REPRODUCE AN ISSUE (OR TRIGGER A NEW FEATURE)

  1. Open large_6kx6k.txt
  2. command-c
  3. Select "Leave unchanged" for the Rows
  4. Click "Cluster"

CURRENT BEHAVIOR

The progress bar fills up and then empties. The cluster window remains on the screen and it says it's ready to cluster. Oddly enough, clustering both axes works fine. Not sure what would happen if I only clustered rows. After closing the cluster window, the matrix is still completely unclustered. No output files are produced.

Here is what appears in the console:

 Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt
 Setting pBar max: 12000
 Initializing DistMatrixCalculator.
 DistTask is done: success.
 Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr
 Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr
 ProcessorClusterTask is done: success.
 ClusterTask is done: success.
 Saving did not finish successfully.
 Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average
 Attempted delete of large_6kx6k_average_4.cdt
 large_6kx6k_average_4.cdt was successfully deleted.
 Attempted delete of large_6kx6k_average_4.atr
 large_6kx6k_average_4.atr was successfully deleted.
 Directory average still has 14 files.
 average could not be deleted.

And here are screen-caps:

before2.png

warn2.png

This is after clustering:

after2.png

as is this:

after2.2.png

EXPECTED BEHAVIOR

An output file is produced & loaded and the cluster window disappears when done.

DEVELOPERS ONLY SECTION

SUGGESTED CHANGE (Pseudocode optional)

none

FILES AFFECTED (where the changes will be implemented) - developers only

unknown

LEVEL OF EFFORT - developers only

medium

COMMENTS

Comments (38)

  1. Christopher Keil repo owner

    @hepcat72 I need the 6kx6k file again. I think I lost it when my PC got fried. It wasn't in the backup files for some reason... I created another 6kx6k and it finished fine under the mentioned conditions, so it would be great to have that specific file.

  2. Christopher Keil repo owner

    @hepcat72 I cannot reproduce this issue on my Windows laptop with the file. It clusters fine. I should probably try to cluster it on the Mac again. I created a branch for the issue and added some logging statements. Could you run it again on this branch and show me the logging output?

  3. Robert Leach reporter

    Yeah, it still happens for me on the branch you made. I noticed that at about 63% done, "DistTask is done: success." had already been printed to the console, yet it still took a while to finish. I don't know if that means anything... Anyway, here's the log output:

    Running on a Mac.
    Checking if preferences exist for the new file.
    Target node not found. Could not copy data.
    Using import dialog.
    Resetting model.
    Assigning loaded data to model...
    Parsing for CDT-format...
    Parsing label types.
    Setting up row labels for the model.
    Setting up column labels for the model.
    Adding data to model...
    Calculating mean.
    Calculating median.
    Truncating sorted data array.
    Setting base values.
    Done parsing for CDT-format.
    No ATR file found for this CDT file.
    No GTR file found for this CDT file.
    Setting model config data: User Preference Node: /TreeViewApp/TreeViewFrame/File/Model3
    Resetting MapContainers and DendroView components.
    New ColorSet: Red-Green
    New ColorSet: Yellow-Blue
    Registering Plugin Dendrogram
    ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green.
    Returning default ColorSet at 0
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    No old node was found when trying to copy old preferences. Aborting import attempt.
    Creating subNode File1477069203194
    Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt
    Restoring components states.
    Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt
    Setting pBar max: 12000
    Initializing DistMatrixCalculator.
    DistTask is done: success.
    Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr
    Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr
    ProcessorClusterTask is done: success.
    ClusterTask is done: success.
    Setting up ClusterFileGenerator.
    Writing CDT cluster file...
    File path or file name not defined when looking for tree files. Aborting.
    Saving did not finish successfully.
    Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average
    Attempted delete of large_6kx6k_average_4.cdt
    large_6kx6k_average_4.cdt was successfully deleted.
    Attempted delete of large_6kx6k_average_4.atr
    large_6kx6k_average_4.atr was successfully deleted.
    Directory average still has 14 files.
    average could not be deleted.
    
  4. Christopher Keil repo owner

    DistTask is the calculation of the distance matrix, which has to finish before clustering can take place. So the 63% is not an issue. Thank you for the log.

  5. Christopher Keil repo owner

    Okay, this is the problem: File path or file name not defined when looking for tree files. Aborting.

    This means that something goes wrong during file writing. Clustering finishes, but writing the CDT somehow fails. Can you look at the Import label PR again (I updated it a few hours ago with 3.1 fixed)? I would like to investigate this further with that branch merged, because issue 2 was fixed by editing the ClusterFileGenerator, which is what fails here.

  6. Robert Leach reporter

    I thought it might be something like that - like when clustering only one side, the side not being clustered might be expected at some point to have a file?

    Anyway, I think what you're asking me is to try this procedure on your other branch to see if that fixed this. Will do.

    Also, could issue #462 be related?
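
    The hypothesis above, that the unclustered axis might be expected to have a file, can be made concrete. A minimal sketch, assuming only the naming seen in the logs in this thread (.gtr for the row tree, .atr for the column tree, .cdt for the reordered matrix). `ExpectedClusterFiles` and `expectedFiles` are hypothetical names for illustration, not TreeView3 code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of TreeView3.
class ExpectedClusterFiles {

    // Lists the files a cluster run should leave behind for the given axis choice.
    // The .cdt is always expected; a tree file is expected only for a clustered axis.
    static List<String> expectedFiles(String base, boolean clusterRows, boolean clusterCols) {
        List<String> files = new ArrayList<>();
        files.add(base + ".cdt");
        if (clusterRows) {
            files.add(base + ".gtr"); // row tree
        }
        if (clusterCols) {
            files.add(base + ".atr"); // column tree
        }
        return files;
    }

    public static void main(String[] args) {
        // Columns only: no .gtr should be looked for.
        System.out.println(expectedFiles("large_6kx6k_average_4", false, true));
    }
}
```

    If the save step looked for a tree file on an axis that was left unchanged, a lookup like this would be one place where the mismatch could show up.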

  7. Robert Leach reporter

    No, the issue still exists in branch 447. Here's the log from that one as well:

    Running on a Mac.
    Checking if preferences exist for the new file.
    Loading with info from existing node.
    Resetting model.
    Adding data to model...
    Calculating mean.
    Calculating median.
    Truncating sorted data array.
    Setting base values.
    Done parsing for CDT-format.
    No ATR file found for this CDT file.
    No GTR file found for this CDT file.
    Resetting MapContainers and DendroView components.
    New ColorSet: Red-Green
    New ColorSet: Yellow-Blue
    Registering Plugin Dendrogram
    ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green.
    Returning default ColorSet at 0
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    Importing labels...
    Importing color settings...
    Found Existing node in MRU list for /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt
    Creating subNode File1477070283016
    Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt
    Restoring components states.
    Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt
    Setting pBar max: 12000
    Initializing DistMatrixCalculator.
    DistTask is done: success.
    Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr
    Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr
    ProcessorClusterTask is done: success.
    ClusterTask is done: success.
    Saving did not finish successfully.
    Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average
    Attempted delete of large_6kx6k_average_4.cdt
    large_6kx6k_average_4.cdt was successfully deleted.
    Attempted delete of large_6kx6k_average_4.atr
    large_6kx6k_average_4.atr was successfully deleted.
    Directory average still has 14 files.
    average could not be deleted.
    
  8. Christopher Keil repo owner

    Just confirmed that #462 is not related. This is a different bug (logging shows a different point of failure).

  9. Robert Leach reporter

    BTW, here's what's in my large_6kx6k/average directory. Might provide a clue? This screen-cap was taken while the columns were being clustered:

    large_6kx6k_average_dir_contents.png

  10. Christopher Keil repo owner

    Yes, it shows that the tree file is being generated but never written to (zero bytes). Thanks!
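
    For anyone who wants to check a cluster output directory for the same symptom without a screen-cap, here is a small sketch that lists tree files that exist but contain zero bytes. `EmptyTreeFileFinder` is a hypothetical standalone helper, not part of TreeView3; it only uses standard `java.nio.file` calls.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TreeView3.
class EmptyTreeFileFinder {

    // Returns the .atr/.gtr files in dir that exist but contain zero bytes.
    static List<Path> findEmptyTreeFiles(Path dir) throws IOException {
        List<Path> empty = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{atr,gtr}")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && Files.size(p) == 0L) {
                    empty.add(p);
                }
            }
        }
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // Pass the cluster output directory as the first argument; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findEmptyTreeFiles(dir)) {
            System.out.println("Empty tree file: " + p.getFileName());
        }
    }
}
```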

  11. Christopher Keil repo owner

    @hepcat72 Sorry, but could you pull and test run again? The issue is not fixed but I added additional logging and am now catching exceptions when writing a CDT.

    My Mac gets OutOfMemoryErrors and resets Clustering before reaching the point of writing (I have yet to run with larger heap size when I get home). On Windows it finishes without issues... so I am stuck not being able to reproduce this at the moment.
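
    To compare heap limits between the Mac and Windows machines, a quick check like the following may help. It is only a sketch: `HeapInfo` is a hypothetical class, and the matrix-size estimate assumes a dense n x n array of 8-byte doubles, which may not be how TreeView3 actually stores its distance matrix.

```java
// Hypothetical diagnostic, not part of TreeView3.
class HeapInfo {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Bytes needed for a dense n x n matrix of doubles (assumption: 8 bytes per cell, no overhead).
    static long denseDoubleMatrixBytes(long n) {
        return n * n * 8L;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Dense 6000 x 6000 double matrix (bytes): " + denseDoubleMatrixBytes(6000));
    }
}
```

    The heap limit can be raised with the standard JVM option `-Xmx` (for example `-Xmx4g`) when launching the application.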

  12. Robert Leach reporter

    Did I rerun your test on that branch? If I did, I didn't respond here. I just returned to report what happens when I try to cluster rows only.

    1. Open small_133x133.txt (and make sure that there exists a directory with previously clustered data)
    2. command-c to cluster
    3. Select columns: leave unchanged
    4. Click Cluster
    5. Click "Yes" when asked if you want to cluster again

    I get this error:

    clusterrowsonlyerror.png

    Which is strange because the cluster code was supposed to have created the atr and cdt files being imported. So how could they be corrupt or out of sync?

    Additionally, clustering rows only still behaves the way described in the issue. I think the two behaviors are related however. I think this all has to do with old previously clustered files being present and how they're handled.

  13. Christopher Keil repo owner

    Can you add the log statements? Also about the other issue: can you show me what files you have in your directory? (do a pull please before running again... I riddled this with log statements)

  14. Robert Leach reporter

    I just tried to switch to the branch and pull, but I'm getting an error when I pull:

    pullerroronbranch447.png

    I'll add the log messages, but I'm not sure you'll get any from the latest updates you've made...

  15. Robert Leach reporter

    Here's the log from running the steps I described in my comment 2 days ago. (BTW, looks like I misspoke in that issue - it's clustering rows only - I'll edit it.)

    Running on a Mac.
    Checking if preferences exist for the new file.
    Loading with info from existing node.
    Found COMPLEX from row label types.
    Found GWEIGHT from row label types.
    Resetting model.
    Adding data to model...
    Calculating mean.
    Calculating median.
    Truncating sorted data array.
    Setting base values.
    Done parsing for CDT-format.
    No ATR file found for this CDT file.
    No GTR file found for this CDT file.
    Resetting MapContainers and DendroView components.
    New ColorSet: Red-Green
    New ColorSet: Yellow-Blue
    Registering Plugin Dendrogram
    ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green.
    Returning default ColorSet at 0
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    Importing labels...
    Importing color settings...
    Found Existing node in MRU list for /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
    Creating subNode File1478188752713
    Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
    Restoring components states.
    Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
    Setting pBar max: 266
    Initializing DistMatrixCalculator.
    DistTask is done: success.
    Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.gtr
    Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.gtr
    ProcessorClusterTask is done: success.
    ClusterTask is done: success.
    Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.cdt
    Success! The row tree file was found.
    No file found for column trees.
    But old column tree file was found!
    Getting preferences for transfer to clustered file.
    Loading with info from existing node.
    Found COMPLEX from row label types.
    Found GWEIGHT from row label types.
    Found COLUMN LABELS 2 from col label types.
    Data start coordinates have shifted because more label types were added.
    Resetting model.
    SaveTask is done: success.
    Adding data to model...
    Calculating mean.
    Calculating median.
    Truncating sorted data array.
    Setting base values.
    Done parsing for CDT-format.
    Resetting MapContainers and DendroView components.
    Returning default ColorSet at 0
    Identifier ARRY127X from tree file not found in CDT.
     - edu.stanford.genetics.treeview.plugin.dendroview.TreeDrawer.setData(TreeDrawer.java:287)
     - edu.stanford.genetics.treeview.plugin.dendroview.TreePainter.setData(TreePainter.java:1)
     - Controllers.DendroController.bindTrees(DendroController.java:1295)
     - Controllers.DendroController.bindComponentFunctions(DendroController.java:843)
     - Controllers.DendroController.setNewMatrix(DendroController.java:184)
     - Controllers.TVController.finishLoading(TVController.java:341)
     - edu.stanford.genetics.treeview.model.ModelLoader.done(ModelLoader.java:146)
     - javax.swing.SwingWorker$5.run(SwingWorker.java:737)
     - javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.run(SwingWorker.java:832)
     - sun.swing.AccumulativeRunnable.run(AccumulativeRunnable.java:112)
     - javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.actionPerformed(SwingWorker.java:842)
     - javax.swing.Timer.fireActionPerformed(Timer.java:313)
     - javax.swing.Timer$DoPostEvent.run(Timer.java:245)
     - java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311)
     - java.awt.EventQueue.dispatchEventImpl(EventQueue.java:744)
     - java.awt.EventQueue.access$400(EventQueue.java:97)
     - java.awt.EventQueue$3.run(EventQueue.java:697)
     - java.awt.EventQueue$3.run(EventQueue.java:691)
     - java.security.AccessController.doPrivileged(Native Method)
     - java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:75)
     - java.awt.EventQueue.dispatchEvent(EventQueue.java:714)
     - java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
     - java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
     - java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
     - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
     - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
     - java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
    Importing labels...
    Importing color settings...
    Creating subNode File1478188783338
    Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.cdt
    Restoring components states.
    Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.cdt
    

    And here's a listing of my average directory:

    clustercolsthenclusterboth-badrowtreeresult.zip
    forPR94issue1.zip
    small_133x133_average_1.atr
    small_133x133_average_1.cdt
    small_133x133_average_1.gtr
    small_133x133_average_2.atr
    small_133x133_average_2.cdt
    small_133x133_average_2.gtr
    small_133x133_average_3.atr
    small_133x133_average_3.cdt
    small_133x133_average_3.gtr
    small_133x133_average_4.atr
    small_133x133_average_4.cdt
    small_133x133_average_4.gtr
    small_133x133_average_5.atr
    small_133x133_average_5.cdt
    small_133x133_average_5.gtr
    small_133x133_average_6.atr
    small_133x133_average_6.cdt
    small_133x133_average_6.gtr
    small_133x133_average_7.atr
    small_133x133_average_7.cdt
    small_133x133_average_7.gtr
    small_133x133_average_8.atr
    small_133x133_average_8.cdt
    small_133x133_average_8.gtr
    small_133x133_average_9.atr
    small_133x133_average_9.cdt
    small_133x133_average_9.gtr
    small_133x133_average_10.atr
    small_133x133_average_10.cdt
    small_133x133_average_10.gtr
    small_133x133_average_11.atr
    small_133x133_average_11.cdt
    small_133x133_average_11.gtr
    small_133x133_average_12.atr
    small_133x133_average_12.cdt
    small_133x133_average_12.gtr
    small_133x133_average_13.atr
    small_133x133_average_13.cdt
    small_133x133_average_13.gtr
    small_133x133_average_14.atr
    small_133x133_average_14.cdt
    small_133x133_average_14.gtr
    small_133x133_average_15.atr
    small_133x133_average_15.cdt
    small_133x133_average_15.gtr
    small_133x133_average_16.atr
    small_133x133_average_16.cdt
    small_133x133_average_16.gtr
    small_133x133_average_17.atr
    small_133x133_average_17.cdt
    small_133x133_average_17.gtr
    small_133x133_average_18.atr
    small_133x133_average_18.cdt
    small_133x133_average_18.gtr
    small_133x133_average_19.atr
    small_133x133_average_19.cdt
    small_133x133_average_19.gtr
    small_133x133_average_20.atr
    small_133x133_average_20.cdt
    small_133x133_average_20.gtr
    small_133x133_average_21.atr
    small_133x133_average_21.cdt
    small_133x133_average_21.gtr
    small_133x133_average_22.atr
    small_133x133_average_22.cdt
    small_133x133_average_22.gtr
    small_133x133_average_23.atr
    small_133x133_average_23.cdt
    small_133x133_average_23.gtr
    small_133x133_average_24.atr
    small_133x133_average_24.cdt
    small_133x133_average_24.gtr
    small_133x133_average_25.cdt
    small_133x133_average_25.gtr
    small_133x133_average_26.atr
    small_133x133_average_26.cdt
    small_133x133_average_27.cdt
    small_133x133_average_27.gtr
    small_133x133_average_28.atr
    small_133x133_average_28.cdt
    small_133x133_average_29.atr
    small_133x133_average_29.cdt
    small_133x133_average_29.gtr
    small_133x133_average_30.atr
    small_133x133_average_30.cdt
    small_133x133_average_31.atr
    small_133x133_average_31.cdt
    small_133x133_average_31.gtr
    small_133x133_average_32.atr
    small_133x133_average_32.cdt
    small_133x133_average_32.gtr
    small_133x133_average_33.atr
    small_133x133_average_33.cdt
    small_133x133_average_33.gtr
    small_133x133_average_34.atr
    small_133x133_average_34.cdt
    small_133x133_average_34.gtr
    small_133x133_average_35.atr
    small_133x133_average_35.cdt
    small_133x133_average_35.gtr
    small_133x133_average_36.atr
    small_133x133_average_36.cdt
    small_133x133_average_36.gtr
    small_133x133_average_37.atr
    small_133x133_average_37.cdt
    small_133x133_average_38.atr
    small_133x133_average_38.cdt
    small_133x133_average_38.gtr
    small_133x133_average_39.atr
    small_133x133_average_39.cdt
    small_133x133_average_39.gtr
    small_133x133_average_40.atr
    small_133x133_average_40.cdt
    small_133x133_average_40.gtr
    small_133x133_average_41.atr
    small_133x133_average_41.cdt
    small_133x133_average_41.gtr
    small_133x133_average_41.gtr.new
    small_133x133_average_42.atr
    small_133x133_average_42.cdt
    small_133x133_average_42.gtr
    small_133x133_average_43.atr
    small_133x133_average_43.cdt
    small_133x133_average_43.gtr
    small_133x133_average_44.atr
    small_133x133_average_44.cdt
    small_133x133_average_44.gtr
    small_133x133_average_45.atr
    small_133x133_average_45.cdt
    small_133x133_average_45.gtr
    small_133x133_average_46.atr
    small_133x133_average_46.cdt
    small_133x133_average_46.gtr
    small_133x133_average_47.atr
    small_133x133_average_47.cdt
    small_133x133_average_47.gtr
    small_133x133_average_48.atr
    small_133x133_average_48.cdt
    small_133x133_average_48.cdt.PNG
    small_133x133_average_48.gtr
    small_133x133_average_49.atr
    small_133x133_average_49.cdt
    small_133x133_average_49.gtr
    small_133x133_average_50.atr
    small_133x133_average_50.cdt
    small_133x133_average_50.gtr
    small_133x133_average_51.atr
    small_133x133_average_51.cdt
    small_133x133_average_51.gtr
    small_133x133_average_52.atr
    small_133x133_average_52.cdt
    small_133x133_average_52.gtr
    small_133x133_average_53.atr
    small_133x133_average_53.cdt
    small_133x133_average_53.gtr
    small_133x133_average_54.atr
    small_133x133_average_54.cdt
    small_133x133_average_54.gtr
    small_133x133_average_55.atr
    small_133x133_average_55.cdt
    small_133x133_average_55.gtr
    small_133x133_average_56.atr
    small_133x133_average_56.cdt
    small_133x133_average_56.gtr
    small_133x133_average_57.atr
    small_133x133_average_57.cdt
    small_133x133_average_57.gtr
    small_133x133_average.atr
    small_133x133_average.cdt
    small_133x133_average.gtr
    small_133x133-treeview.PNG
    

    I also have a complete directory next to average.

  16. Robert Leach reporter

    And here's the additional log messages when clustering using the steps from the issue description (clustering columns only):

    Setting pBar max: 266
    Initializing DistMatrixCalculator.
    DistTask is done: success.
    Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_58.atr
    Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_58.atr
    ProcessorClusterTask is done: success.
    ClusterTask is done: success.
    Saving did not finish successfully.
    Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average
    Attempted delete of small_133x133_average_58.cdt
    small_133x133_average_58.cdt was successfully deleted.
    Attempted delete of small_133x133_average_58.atr
    small_133x133_average_58.atr was successfully deleted.
    Directory average still has 173 files.
    average could not be deleted.
    

    Oh yeah - I ran it on small_133x133.txt instead of the 6k file. Shorter test, same behavior.
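
    The tail of that log (delete the new .cdt and .atr, then fail to delete the directory because older results are still in it) looks like a cleanup that removes only the files of the failed run. A minimal sketch of that behaviour, assuming the file naming from the log; `FailedRunCleanup` and `cleanUp` are hypothetical names, not the actual TreeView3 implementation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not part of TreeView3.
class FailedRunCleanup {

    // Deletes baseName.cdt/.atr/.gtr in dir if present.
    // Removes dir itself only if nothing else is left; returns true if dir was removed.
    static boolean cleanUp(Path dir, String baseName) throws IOException {
        for (String ext : new String[] {"cdt", "atr", "gtr"}) {
            Files.deleteIfExists(dir.resolve(baseName + "." + ext));
        }
        try (DirectoryStream<Path> rest = Files.newDirectoryStream(dir)) {
            if (rest.iterator().hasNext()) {
                return false; // older cluster results remain, so keep the directory
            }
        }
        Files.delete(dir);
        return true;
    }
}
```

    Under this reading, "average could not be deleted" is expected whenever earlier results are present and is not itself the bug.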

  17. Christopher Keil repo owner

    Since the 133x133 test is very short, can you pull the branch and just run the clustering again and post the output? I added a lot of logging statements which don't appear in your recent output. What happens when you clear the directory (if you want to keep the files, just temporarily move the average folder)?

  18. Robert Leach reporter

    As I mentioned above...

    I just tried to switch to the branch and pull, but I'm getting an error when I pull:

    pullerroronbranch447.png

    I'll add the log messages, but I'm not sure you'll get any from the latest updates you've made...

  19. Robert Leach reporter

    Here's the result of clustering columns only:

    Running on a Mac.
    Checking if preferences exist for the new file.
    Loading with info from existing node.
    Resetting model.
    Assigning loaded data to model...
    Parsing for CDT-format...
    Parsing label types.
    Setting up row labels for the model.
    Setting up column labels for the model.
    Adding data to model...
    Truncating sorted data array.
    Done parsing for CDT-format.
    No ATR file found for this CDT file.
    No GTR file found for this CDT file.
    Setting model config data: User Preference Node: /TreeViewApp/TreeViewFrame/File/Model4
    Resetting MapContainers and DendroView components.
    New ColorSet: Red-Green
    New ColorSet: Yellow-Blue
    Registering Plugin Dendrogram
    ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green.
    Returning default ColorSet at 0
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    Importing labels...
    Importing color settings...
    Found Existing node in MRU list for /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
    Creating subNode File1478273812310
    Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
    Restoring components states.
    Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
    Clustering ready. Beginning task...
    Performing hierarchical cluster.
    Checking if cluster needs to be reaffirmed.
    Should cluster axes? (row, col): [false, true]
    Finished setup of directory structure. Common path for cluster files: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average
    Generated ATR file (empty): small_133x133_average_59.atr
    Initializing DistMatrixCalculator (2)
    DistTask is done: success.
    Done. Closing writer for small_133x133_average_59.atr
    Got reordered 133 labels for column
    Reordering complete.
    Post-cluster reordering valid? true
    Generated CDT file: small_133x133_average_59.cdt
    Done. Closing writer for small_133x133_average_59.atr
    Data reordering is done: success.
    Post-cluster reordering valid? true
    ClusterTask is done: success.
    Setting up ClusterFileGenerator.
    Writing CDT cluster file...
    0
     - Cluster.ClusterFileGenerator.createHierCDT(ClusterFileGenerator.java:394)
     - Cluster.ClusterFileGenerator.generateCDT(ClusterFileGenerator.java:122)
     - Controllers.ClusterDialogController$SaveTask.doInBackground(ClusterDialogController.java:666)
     - Controllers.ClusterDialogController$SaveTask.doInBackground(ClusterDialogController.java:1)
     - javax.swing.SwingWorker$1.call(SwingWorker.java:295)
     - java.util.concurrent.FutureTask.run(FutureTask.java:266)
     - javax.swing.SwingWorker.run(SwingWorker.java:334)
     - java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     - java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     - java.lang.Thread.run(Thread.java:745)
    Error when writing the CDT file. Cancelling.
    Saving did not finish successfully.
    Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average
    Attempting delete of cluster files in /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average
    Attempted delete of small_133x133_average_59.cdt
    small_133x133_average_59.cdt was successfully deleted.
    Attempted delete of small_133x133_average_59.atr
    small_133x133_average_59.atr was successfully deleted.
    Directory average still has 176 files.
    average could not be deleted.
    

    And from clustering rows only:

    Clustering ready. Beginning task...
    Performing hierarchical cluster.
    Checking if cluster needs to be reaffirmed.
    Should cluster axes? (row, col): [true, false]
    Finished setup of directory structure. Common path for cluster files: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average
    Generated GTR file (empty): small_133x133_average_59.gtr
    Initializing DistMatrixCalculator (1)
    DistTask is done: success.
    Done. Closing writer for small_133x133_average_59.gtr
    Got reordered 133 labels for row
    Reordering complete.
    Post-cluster reordering valid? true
    Generated CDT file: small_133x133_average_59.cdt
    Done. Closing writer for small_133x133_average_59.gtr
    Data reordering is done: success.
    Post-cluster reordering valid? true
    ClusterTask is done: success.
    Setting up ClusterFileGenerator.
    Writing CDT cluster file...
    Done. Closing writer for small_133x133_average_59.cdt
    Checking if row tree file is present.
    Success! The row tree file was found.
    Checking if column tree file is present.
    No file found for column trees.
    But old column tree file was found!
    Getting preferences for transfer to clustered file.
    Loading with info from existing node.
    Resetting model.
    SaveTask is done: success.
    Assigning loaded data to model...
    Parsing for CDT-format...
    Parsing label types.
    Setting up row labels for the model.
    Setting up column labels for the model.
    Adding data to model...
    Truncating sorted data array.
    Done parsing for CDT-format.
    Setting model config data: User Preference Node: /TreeViewApp/TreeViewFrame/File/Model30
    Resetting MapContainers and DendroView components.
    Returning default ColorSet at 0
    Identifier ARRY127X from tree file not found in CDT.
     - edu.stanford.genetics.treeview.plugin.dendroview.TreeDrawer.setData(TreeDrawer.java:287)
     - edu.stanford.genetics.treeview.plugin.dendroview.TreePainter.setData(TreePainter.java:1)
     - Controllers.DendroController.bindTrees(DendroController.java:1295)
     - Controllers.DendroController.bindComponentFunctions(DendroController.java:843)
     - Controllers.DendroController.setNewMatrix(DendroController.java:184)
     - Controllers.TVController.finishLoading(TVController.java:342)
     - edu.stanford.genetics.treeview.model.ModelLoader.done(ModelLoader.java:149)
     - javax.swing.SwingWorker$5.run(SwingWorker.java:737)
     - javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.run(SwingWorker.java:832)
     - sun.swing.AccumulativeRunnable.run(AccumulativeRunnable.java:112)
     - javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.actionPerformed(SwingWorker.java:842)
     - javax.swing.Timer.fireActionPerformed(Timer.java:313)
     - javax.swing.Timer$DoPostEvent.run(Timer.java:245)
     - java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311)
     - java.awt.EventQueue.dispatchEventImpl(EventQueue.java:744)
     - java.awt.EventQueue.access$400(EventQueue.java:97)
     - java.awt.EventQueue$3.run(EventQueue.java:697)
     - java.awt.EventQueue$3.run(EventQueue.java:691)
     - java.security.AccessController.doPrivileged(Native Method)
     - java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:75)
     - java.awt.EventQueue.dispatchEvent(EventQueue.java:714)
     - java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
     - java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
     - java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
     - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
     - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
     - java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
    Importing labels...
    Importing color settings...
    Creating subNode File1478273923664
    Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_59.cdt
    Restoring components states.
    Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_59.cdt
    

    There are new warning/error dialogs that come up too. Let me know if you want screen caps.
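
    One possible reading of the bare "0" logged just before the createHierCDT stack trace in the columns-only run: it may be an exception message rather than a value. On older runtimes such as Java 8, the message of an ArrayIndexOutOfBoundsException is just the offending index. This is an inference and is not confirmed anywhere in this thread. The sketch below only demonstrates the Java behaviour; `BareIndexMessage` is a hypothetical class and does not reproduce the ClusterFileGenerator code.

```java
// Hypothetical demonstration, not part of TreeView3.
class BareIndexMessage {

    // Reads index 0 of an empty array and returns the resulting exception message.
    static String messageOfEmptyArrayAccess() {
        String[] labels = new String[0];
        try {
            return labels[0];
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact wording depends on the Java version; older runtimes report only the index.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(messageOfEmptyArrayAccess());
    }
}
```

    If that reading is right, something indexed at position 0 was empty when the CDT was written for a columns-only run, which would be worth checking at ClusterFileGenerator.java:394.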

  20. Christopher Keil repo owner

    Thanks for the output, that helps. Can you zip up and send me the last couple of cluster file groups (atr, cdt, gtr between 50 and 59)? Or the entire folder if that's more convenient.

  21. Christopher Keil repo owner

    The output for clustering rows only is actually correct and how it should be. In your case you are trying to open a file which was somehow clustered before by TreeView3 (either or both axes), otherwise the log lines

    Checking if column tree file is present.
    No file found for column trees.
    But old column tree file was found!
    

    could not be present in the log because ClusterTask.reaffirmClusterChoice() has set ClusterData.isAxisClustered to true.

    TreeView3 clustered files use tree identifiers of the format ROW12X or COL12X whereas files from Cluster 3.0 use the GENE/ARRY scheme which we have abandoned. The row-clustering error you posted displays a mismatch between cdt and column tree files based on identifier names.

    To handle it more gracefully, I have added a check to file loading which tests the compatibility of tree files and matrix, warning if they are incompatible. They are considered incompatible if the identifiers between tree files and matrix do not match. Matrices fully clustered in Cluster 3.0 can still be loaded fine in TreeView3 since tree identifiers are the same in atr/gtr and cdt files. No random identifier formats other than ROW/COL or GENE/ARRY are allowed.
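
    The compatibility test described above can be sketched roughly as follows. This is a minimal illustration, not the actual TreeView3 implementation: the class name `TreeCompatibility` and both method names are hypothetical, and the sketch assumes each tree file line is tab-delimited as node ID, left child, right child, similarity, with internal nodes named `NODE<n>X`.

    ```java
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    /** Hypothetical helper for illustration; not part of the TreeView3 code base. */
    public final class TreeCompatibility {

        private TreeCompatibility() {}

        /**
         * Collects leaf identifiers from tree file lines. Assumes each data line is
         * tab-delimited as: nodeId, leftChild, rightChild, similarity.
         * Lines whose first column is not of the form NODE<n>X (e.g. a header) are
         * skipped, and children starting with "NODE" are treated as internal nodes.
         */
        public static Set<String> leafIds(List<String> treeLines) {
            Set<String> leaves = new HashSet<>();
            for (String line : treeLines) {
                String[] cols = line.split("\t");
                if (cols.length < 3 || !cols[0].trim().matches("NODE\\d+X")) {
                    continue;
                }
                for (int i = 1; i <= 2; i++) {
                    String id = cols[i].trim();
                    if (!id.startsWith("NODE")) {
                        leaves.add(id);
                    }
                }
            }
            return leaves;
        }

        /** True only if the tree has leaves and they are exactly the matrix identifiers. */
        public static boolean isCompatible(List<String> treeLines, List<String> matrixIds) {
            Set<String> leaves = leafIds(treeLines);
            return !leaves.isEmpty() && leaves.equals(new HashSet<>(matrixIds));
        }
    }
    ```

    With this sketch, a tree whose leaves are ROW0X, ROW1X, ROW2X is compatible with a matrix carrying those same identifiers, and incompatible with one carrying GENE0X, GENE1X, GENE2X.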

    This way of handling makes sense because in my opinion there is near zero reason to cluster a matrix in TreeView and then replace the tree files with other tree files from Cluster 3.0 or vice versa.

    I don't know how that tree file got into your directory with the same name as the cdt-file (copy-paste?). But the mismatch is responsible for your second error and TreeView3 did not handle it in a user friendly manner.

    Could you pull this branch again, attempt another run, and post the log messages? The first issue never occurs for me, no matter what. I pulled the master branch updates, which corrected older cdt file writing issues, into this branch. Maybe that helps.

    And last, you may notice an auto-detect label issue. In response I have created BB issue #503 to keep unrelated changes separate and explained the underlying problem.

  22. Robert Leach reporter

    Here's the latest console output when performing the procedure described in the issue (except with the small_133x133.txt input file):

    Running on a Mac.
    Saving window dimensions & position.
    Checking if preferences exist for the new file.
    Loading with info from existing node.
    Resetting model...
    Adding data to model...
    Truncating sorted data array.
    Done parsing for CDT-format.
    No ATR file found for this CDT file.
    No GTR file found for this CDT file.
    Resetting MapContainers and DendroView components.
    New ColorSet: Red-Green
    New ColorSet: Yellow-Blue
    Registering Plugin Dendrogram
    ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green.
    Returning default ColorSet at 0
    Last active set: Red-Green
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    Warning: Encountered invalid/negative firstVisible value: [-1].  Resetting.
    Warning: Encountered invalid/too-small numVisible value: [0].  Resetting.
    Importing labels...
    Importing color settings...
    Saving to ColorSet Custom
    Last active set: Red-Green
    Found Existing node in MRU list for /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
    Creating subNode File1479227738221
    Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
    Restoring components states.
    Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
    Clustering ready. Beginning task...
    Performing hierarchical cluster.
    Checking if cluster needs to be reaffirmed.
    Should cluster axes? (row, col): [false, true]
    Finished setup of directory structure. Common path for cluster files: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average
    Generated ATR file (empty): small_133x133_average_60.atr
    Initializing DistMatrixCalculator (2)
    DistTask is done: success.
    Done. Closing writer for small_133x133_average_60.atr
    Got reordered 133 labels for column
    Reordering complete.
    Post-cluster reordering valid? true
    Generated CDT file: small_133x133_average_60.cdt
    Done. Closing writer for small_133x133_average_60.atr
    Data reordering is done: success.
    Post-cluster reordering valid? true
    ClusterTask is done: success.
    Setting up ClusterFileGenerator.
    Writing CDT cluster file...
    The array of ordered row tree identifiers does not match the amount of rows in the original data matrix.
    An error occurred when writing the clustered matrix file (cdt).
    
    null
    Done. Closing writer for small_133x133_average_60.cdt
    Checking if row tree file is present.
    No file found for row trees.
    But old row tree file was found!
    Checking if column tree file is present.
    Success! The column tree file was found.
    Getting preferences for transfer to clustered file.
    Loading with info from existing node.
    Data start coordinates have shifted because more label types were added.
    Resetting model...
    SaveTask is done: success.
    Alert: No numeric data could be found in the input file.
    The input file must contain tab-delimited numeric values.
    

    Also, I got this error dialog:

    clusterdataimporterror.png

  23. Robert Leach reporter

    I have been looking into this issue further today. Regarding your comment, @TreeView3Dev, that "The output for clustering rows only is actually correct and how it should be": I believe that it is not how it should be. Here's why. The user has opened a completely unclustered file and has chosen to leave either the columns or the rows unchanged. Yes, they have clustered this data before, but what they are looking at is completely unclustered (as they have opened a ".txt" file). Whether or not they encounter this specific issue with the GENE#X versus ROW#X IDs, the code should not be retrieving and parsing the previously clustered data for the columns or rows which they selected to "leave unchanged". It's the ID issue that brought this bug to light, however, so it's a good thing it did.

    Incidentally, I put a try/catch around the specific lines of code that were causing the saving of the clustered file to fail. By simply ignoring the exception, you end up with something close to what you would expect: only one dimension clustered. You do get an error about the IDs being out of sync, and the label types in the label settings dialog are out of sync, but at least the clustered result is roughly what you might expect. I'll need to look at it more closely to be sure, but regardless, the order of the "unchanged" dimension should be retained as it is in the file the user is viewing.

  24. Robert Leach reporter

    I have determined a bit more about this issue, which has corrected an assumption I was making before. I was trying to determine which tree file was being associated with the unclustered small_133x133.txt file. I had assumed it was grabbing the most recent tree files from the small_133x133 directory, but it is not. It is grabbing tree files in the same directory that were generated when small_133x133.cdt was created. I don't recall how those files were initially created. I suspect I got them from Cluster.app when I was using treeview 2 long ago. The two files (.txt versus .cdt) sit next to each other. Even though I opened the .txt file, which has no real tree files associated with it, the code believes that the similarly named .atr and .gtr files go with it (and therefore that it is clustered) even though those files were generated when the .cdt file was created.

    The .txt unclustered original file doesn't have a GID column or AID row, so when the code tries to associate the gtr file, it cannot do it and thus the failure.

    What probably should be happening is the .txt file should be checked for IDs that match the IDs in the tree files. If they don't have a matching row/col of IDs, they shouldn't be associated. Or, an easier check would be to see if the extension of the opened file was '.cdt'. But then that could miss tree files generated for files with different extensions, such as .pcl. So it's probably best to simply do the ID check.

  25. Robert Leach reporter

    I believe I've fixed the crux of this issue by changing the assumption used to decide whether a file has been clustered. Previously, a file was assumed to be clustered if either a GID/AID existed or the .gtr/.atr file existed. Now both are required: a GID/AID must exist (in the matrix file) and the .gtr/.atr file must exist. At first glance, this seems to completely fix the issue.
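
    A minimal sketch of that change in logic, using hypothetical class, method, and parameter names (the actual TreeView3 code that makes this decision is not shown in this thread):

    ```java
    /** Hypothetical sketch; the names here are illustrative, not TreeView3 API. */
    public final class ClusterStateCheck {

        private ClusterStateCheck() {}

        /** Old assumption: either signal alone marks the axis as clustered. */
        public static boolean oldIsAxisClustered(boolean hasIdLabels, boolean treeFileExists) {
            return hasIdLabels || treeFileExists;
        }

        /** Revised assumption: the matrix must carry GID/AID labels AND a tree file must exist. */
        public static boolean isAxisClustered(boolean hasIdLabels, boolean treeFileExists) {
            return hasIdLabels && treeFileExists;
        }
    }
    ```

    For the case in this issue (a .txt matrix without a GID column or AID row, sitting next to a similarly named .gtr/.atr file), the old check reports the axis as clustered while the revised check does not.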
