Clustering columns only fails
USE CASE: WHAT DO YOU WANT TO DO?
Cluster just the columns of the largest supported matrix
STEPS TO REPRODUCE AN ISSUE (OR TRIGGER A NEW FEATURE)
- Open large_6kx6k.txt
- command-c
- Select "Leave unchanged" for the Rows
- Click "Cluster"
CURRENT BEHAVIOR
The progress bar fills up and then empties. The cluster window remains on the screen and it says it's ready to cluster. Oddly enough, clustering both axes works fine. Not sure what would happen if I only clustered rows. After closing the cluster window, the matrix is still completely unclustered. No output files are produced.
Here is what appears in the console:
Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt
Setting pBar max: 12000
Initializing DistMatrixCalculator.
DistTask is done: success.
Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr
Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr
ProcessorClusterTask is done: success.
ClusterTask is done: success.
Saving did not finish successfully.
Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average
Attempted delete of large_6kx6k_average_4.cdt
large_6kx6k_average_4.cdt was successfully deleted.
Attempted delete of large_6kx6k_average_4.atr
large_6kx6k_average_4.atr was successfully deleted.
Directory average still has 14 files.
average could not be deleted.
And here are screen-caps:
This is after clustering:
as is this:
EXPECTED BEHAVIOR
An output file is produced & loaded and the cluster window disappears when done.
DEVELOPERS ONLY SECTION
SUGGESTED CHANGE (Pseudocode optional)
none
FILES AFFECTED (where the changes will be implemented) - developers only
unknown
LEVEL OF EFFORT - developers only
medium
COMMENTS
Comments (38)
-
repo owner @hepcat72 I cannot reproduce this issue on my Windows laptop with the file. It clusters fine. I should probably try to cluster it on the Mac again. I created a branch for the issue and added some logging statements. Could you run it again on this branch and show me the logging output?
-
reporter Yeah, it still happens for me on the branch you made. I noted at about 63% done that
DistTask is done: success.
had been printed to the console, yet it still took awhile to finish. I don't know if that means anything... Anyway, here's the log output...Running on a Mac. Checking if preferences exist for the new file. Target node not found. Could not copy data. Using import dialog. Resetting model. Assigning loaded data to model... Parsing for CDT-format... Parsing label types. Setting up row labels for the model. Setting up column labels for the model. Adding data to model... Calculating mean. Calculating median. Truncating sorted data array. Setting base values. Done parsing for CDT-format. No ATR file found for this CDT file. No GTR file found for this CDT file. Setting model config data: User Preference Node: /TreeViewApp/TreeViewFrame/File/Model3 Resetting MapContainers and DendroView components. New ColorSet: Red-Green New ColorSet: Yellow-Blue Registering Plugin Dendrogram ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green. Returning default ColorSet at 0 Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting. Warning: Encountered invalid/too-small numVisible value: [0]. Resetting. Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting. Warning: Encountered invalid/too-small numVisible value: [0]. Resetting. No old node was found when trying to copy old preferences. Aborting import attempt. Creating subNode File1477069203194 Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt Restoring components states. Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt Setting pBar max: 12000 Initializing DistMatrixCalculator. DistTask is done: success. Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr ProcessorClusterTask is done: success. ClusterTask is done: success. Setting up ClusterFileGenerator. 
Writing CDT cluster file... File path or file name not defined when looking for tree files. Aborting. Saving did not finish successfully. Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average Attempted delete of large_6kx6k_average_4.cdt large_6kx6k_average_4.cdt was successfully deleted. Attempted delete of large_6kx6k_average_4.atr large_6kx6k_average_4.atr was successfully deleted. Directory average still has 14 files. average could not be deleted.
-
reporter Are you clustering columns only, as stated in the reproduce steps?
-
reporter Because clustering both axes works fine.
-
reporter The failure happens only when clustering columns only.
-
repo owner DistTask is the calculation of the distance matrix, which has to finish before clustering can take place. So the 63% is not an issue. Thank you for the log.
-
repo owner Okay, this is the problem:
File path or file name not defined when looking for tree files. Aborting.
This means that some issue with file writing occurs. Clustering finishes, but writing the CDT somehow fails. Can you look at the Import label PR again (updated it a few hours ago with 3.1 fixed)? I would like to investigate this further with that branch merged because issue 2 was fixed by editing the ClusterFileGenerator which fails here.
-
reporter I thought it might be something like that - like when clustering only one side, the side not being clustered might be expected at some point to have a file?
Anyway, I think what you're asking me is to try this procedure on your other branch to see if that fixed this. Will do.
Also, could issue #462 be related?
-
reporter No, the issue still exists in branch 447. Here's the log from that one as well:
Running on a Mac. Checking if preferences exist for the new file. Loading with info from existing node. Resetting model. Adding data to model... Calculating mean. Calculating median. Truncating sorted data array. Setting base values. Done parsing for CDT-format. No ATR file found for this CDT file. No GTR file found for this CDT file. Resetting MapContainers and DendroView components. New ColorSet: Red-Green New ColorSet: Yellow-Blue Registering Plugin Dendrogram ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green. Returning default ColorSet at 0 Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting. Warning: Encountered invalid/too-small numVisible value: [0]. Resetting. Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting. Warning: Encountered invalid/too-small numVisible value: [0]. Resetting. Importing labels... Importing color settings... Found Existing node in MRU list for /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt Creating subNode File1477070283016 Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt Restoring components states. Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k.txt Setting pBar max: 12000 Initializing DistMatrixCalculator. DistTask is done: success. Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr Done./Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average/large_6kx6k_average_4.atr ProcessorClusterTask is done: success. ClusterTask is done: success. Saving did not finish successfully. Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/large_6kx6k/average Attempted delete of large_6kx6k_average_4.cdt large_6kx6k_average_4.cdt was successfully deleted. Attempted delete of large_6kx6k_average_4.atr large_6kx6k_average_4.atr was successfully deleted. Directory average still has 14 files. average could not be deleted.
-
repo owner Just confirmed that #462 is not related. This is a different bug (logging shows a different point of failure).
-
reporter BTW, here's what's in my large_6kx6k/average directory. Might provide a clue? This screen-cap was taken while the columns were being clustered:
-
repo owner Yes, it shows that the tree file is being generated but never written to (zero bytes). Thanks!
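A quick way to confirm the same thing without a screen-cap is to list the tree files in the output directory that have a size of zero bytes. The following is only a minimal sketch: the class and method names are hypothetical and are not part of TreeView3, and it assumes tree files end in .atr or .gtr as they do in this thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of the TreeView3 code base.
final class EmptyTreeFileFinder {

    /** Returns every .atr/.gtr file directly inside dir whose size is zero bytes. */
    static List<Path> findEmptyTreeFiles(Path dir) throws IOException {
        List<Path> empty = new ArrayList<>();
        // The glob only matches tree files; .cdt files are ignored.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{atr,gtr}")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && Files.size(p) == 0L) {
                    empty.add(p);
                }
            }
        }
        return empty;
    }
}
```

Running it against the average directory while clustering is in progress should list the tree file that was created but not yet written.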
-
repo owner @hepcat72 Sorry, but could you pull and test run again? The issue is not fixed but I added additional logging and am now catching exceptions when writing a CDT.
My Mac gets OutOfMemoryErrors and resets Clustering before reaching the point of writing (I have yet to run it with a larger heap size; I will when I get home). On Windows it finishes without issues... so I am stuck not being able to reproduce this at the moment.
-
repo owner - changed status to open
-
reporter Did I rerun your test on that branch? If I did, I didn't respond here. I just returned to report what happens when I try to cluster rows only.
- Open small_133x133.txt (and make sure that there exists a directory with previously clustered data)
- command-c to cluster
- Select columns: leave unchanged
- Click Cluster
- Click "Yes" when asked if you want to cluster again
I get this error:
Which is strange because the cluster code was supposed to have created the atr and cdt files being imported. So how could they be corrupt or out of sync?
Additionally, clustering rows only still behaves the way described in the issue. I think the two behaviors are related however. I think this all has to do with old previously clustered files being present and how they're handled.
-
repo owner Can you add the log statements? Also about the other issue: can you show me what files you have in your directory? (do a pull please before running again... I riddled this with log statements)
-
reporter I just tried to switch to the branch and pull, but I'm getting an error when I pull:
I'll add the log messages, but I'm not sure you'll get any from the latest updates you've made...
-
reporter Here's the log from running the steps I described in my comment 2 days ago. (BTW, looks like I misspoke in that issue - it's clustering rows only - I'll edit it.)
Running on a Mac. Checking if preferences exist for the new file. Loading with info from existing node. Found COMPLEX from row label types. Found GWEIGHT from row label types. Resetting model. Adding data to model... Calculating mean. Calculating median. Truncating sorted data array. Setting base values. Done parsing for CDT-format. No ATR file found for this CDT file. No GTR file found for this CDT file. Resetting MapContainers and DendroView components. New ColorSet: Red-Green New ColorSet: Yellow-Blue Registering Plugin Dendrogram ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green. Returning default ColorSet at 0 Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting. Warning: Encountered invalid/too-small numVisible value: [0]. Resetting. Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting. Warning: Encountered invalid/too-small numVisible value: [0]. Resetting. Importing labels... Importing color settings... Found Existing node in MRU list for /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt Creating subNode File1478188752713 Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt Restoring components states. Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt Setting pBar max: 266 Initializing DistMatrixCalculator. DistTask is done: success. Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.gtr Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.gtr ProcessorClusterTask is done: success. ClusterTask is done: success. Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.cdt Success! The row tree file was found. No file found for column trees. But old column tree file was found! Getting preferences for transfer to clustered file. Loading with info from existing node. 
Found COMPLEX from row label types. Found GWEIGHT from row label types. Found COLUMN LABELS 2 from col label types. Data start coordinates have shifted because more label types were added. Resetting model. SaveTask is done: success. Adding data to model... Calculating mean. Calculating median. Truncating sorted data array. Setting base values. Done parsing for CDT-format. Resetting MapContainers and DendroView components. Returning default ColorSet at 0 Identifier ARRY127X from tree file not found in CDT. - edu.stanford.genetics.treeview.plugin.dendroview.TreeDrawer.setData(TreeDrawer.java:287) - edu.stanford.genetics.treeview.plugin.dendroview.TreePainter.setData(TreePainter.java:1) - Controllers.DendroController.bindTrees(DendroController.java:1295) - Controllers.DendroController.bindComponentFunctions(DendroController.java:843) - Controllers.DendroController.setNewMatrix(DendroController.java:184) - Controllers.TVController.finishLoading(TVController.java:341) - edu.stanford.genetics.treeview.model.ModelLoader.done(ModelLoader.java:146) - javax.swing.SwingWorker$5.run(SwingWorker.java:737) - javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.run(SwingWorker.java:832) - sun.swing.AccumulativeRunnable.run(AccumulativeRunnable.java:112) - javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.actionPerformed(SwingWorker.java:842) - javax.swing.Timer.fireActionPerformed(Timer.java:313) - javax.swing.Timer$DoPostEvent.run(Timer.java:245) - java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311) - java.awt.EventQueue.dispatchEventImpl(EventQueue.java:744) - java.awt.EventQueue.access$400(EventQueue.java:97) - java.awt.EventQueue$3.run(EventQueue.java:697) - java.awt.EventQueue$3.run(EventQueue.java:691) - java.security.AccessController.doPrivileged(Native Method) - java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:75) - java.awt.EventQueue.dispatchEvent(EventQueue.java:714) - 
java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201) - java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116) - java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105) - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101) - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93) - java.awt.EventDispatchThread.run(EventDispatchThread.java:82) Importing labels... Importing color settings... Creating subNode File1478188783338 Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.cdt Restoring components states. Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_57.cdt
And here's a listing of my average directory:
clustercolsthenclusterboth-badrowtreeresult.zip forPR94issue1.zip small_133x133_average_1.atr small_133x133_average_1.cdt small_133x133_average_1.gtr small_133x133_average_2.atr small_133x133_average_2.cdt small_133x133_average_2.gtr small_133x133_average_3.atr small_133x133_average_3.cdt small_133x133_average_3.gtr small_133x133_average_4.atr small_133x133_average_4.cdt small_133x133_average_4.gtr small_133x133_average_5.atr small_133x133_average_5.cdt small_133x133_average_5.gtr small_133x133_average_6.atr small_133x133_average_6.cdt small_133x133_average_6.gtr small_133x133_average_7.atr small_133x133_average_7.cdt small_133x133_average_7.gtr small_133x133_average_8.atr small_133x133_average_8.cdt small_133x133_average_8.gtr small_133x133_average_9.atr small_133x133_average_9.cdt small_133x133_average_9.gtr small_133x133_average_10.atr small_133x133_average_10.cdt small_133x133_average_10.gtr small_133x133_average_11.atr small_133x133_average_11.cdt small_133x133_average_11.gtr small_133x133_average_12.atr small_133x133_average_12.cdt small_133x133_average_12.gtr small_133x133_average_13.atr small_133x133_average_13.cdt small_133x133_average_13.gtr small_133x133_average_14.atr small_133x133_average_14.cdt small_133x133_average_14.gtr small_133x133_average_15.atr small_133x133_average_15.cdt small_133x133_average_15.gtr small_133x133_average_16.atr small_133x133_average_16.cdt small_133x133_average_16.gtr small_133x133_average_17.atr small_133x133_average_17.cdt small_133x133_average_17.gtr small_133x133_average_18.atr small_133x133_average_18.cdt small_133x133_average_18.gtr small_133x133_average_19.atr small_133x133_average_19.cdt small_133x133_average_19.gtr small_133x133_average_20.atr small_133x133_average_20.cdt small_133x133_average_20.gtr small_133x133_average_21.atr small_133x133_average_21.cdt small_133x133_average_21.gtr small_133x133_average_22.atr small_133x133_average_22.cdt small_133x133_average_22.gtr small_133x133_average_23.atr 
small_133x133_average_23.cdt small_133x133_average_23.gtr small_133x133_average_24.atr small_133x133_average_24.cdt small_133x133_average_24.gtr small_133x133_average_25.cdt small_133x133_average_25.gtr small_133x133_average_26.atr small_133x133_average_26.cdt small_133x133_average_27.cdt small_133x133_average_27.gtr small_133x133_average_28.atr small_133x133_average_28.cdt small_133x133_average_29.atr small_133x133_average_29.cdt small_133x133_average_29.gtr small_133x133_average_30.atr small_133x133_average_30.cdt small_133x133_average_31.atr small_133x133_average_31.cdt small_133x133_average_31.gtr small_133x133_average_32.atr small_133x133_average_32.cdt small_133x133_average_32.gtr small_133x133_average_33.atr small_133x133_average_33.cdt small_133x133_average_33.gtr small_133x133_average_34.atr small_133x133_average_34.cdt small_133x133_average_34.gtr small_133x133_average_35.atr small_133x133_average_35.cdt small_133x133_average_35.gtr small_133x133_average_36.atr small_133x133_average_36.cdt small_133x133_average_36.gtr small_133x133_average_37.atr small_133x133_average_37.cdt small_133x133_average_38.atr small_133x133_average_38.cdt small_133x133_average_38.gtr small_133x133_average_39.atr small_133x133_average_39.cdt small_133x133_average_39.gtr small_133x133_average_40.atr small_133x133_average_40.cdt small_133x133_average_40.gtr small_133x133_average_41.atr small_133x133_average_41.cdt small_133x133_average_41.gtr small_133x133_average_41.gtr.new small_133x133_average_42.atr small_133x133_average_42.cdt small_133x133_average_42.gtr small_133x133_average_43.atr small_133x133_average_43.cdt small_133x133_average_43.gtr small_133x133_average_44.atr small_133x133_average_44.cdt small_133x133_average_44.gtr small_133x133_average_45.atr small_133x133_average_45.cdt small_133x133_average_45.gtr small_133x133_average_46.atr small_133x133_average_46.cdt small_133x133_average_46.gtr small_133x133_average_47.atr small_133x133_average_47.cdt 
small_133x133_average_47.gtr small_133x133_average_48.atr small_133x133_average_48.cdt small_133x133_average_48.cdt.PNG small_133x133_average_48.gtr small_133x133_average_49.atr small_133x133_average_49.cdt small_133x133_average_49.gtr small_133x133_average_50.atr small_133x133_average_50.cdt small_133x133_average_50.gtr small_133x133_average_51.atr small_133x133_average_51.cdt small_133x133_average_51.gtr small_133x133_average_52.atr small_133x133_average_52.cdt small_133x133_average_52.gtr small_133x133_average_53.atr small_133x133_average_53.cdt small_133x133_average_53.gtr small_133x133_average_54.atr small_133x133_average_54.cdt small_133x133_average_54.gtr small_133x133_average_55.atr small_133x133_average_55.cdt small_133x133_average_55.gtr small_133x133_average_56.atr small_133x133_average_56.cdt small_133x133_average_56.gtr small_133x133_average_57.atr small_133x133_average_57.cdt small_133x133_average_57.gtr small_133x133_average.atr small_133x133_average.cdt small_133x133_average.gtr small_133x133-treeview.PNG
I also have a complete directory next to average.
-
reporter And here's the additional log messages when clustering using the steps from the issue description (clustering columns only):
Setting pBar max: 266
Initializing DistMatrixCalculator.
DistTask is done: success.
Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_58.atr
Done./Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_58.atr
ProcessorClusterTask is done: success.
ClusterTask is done: success.
Saving did not finish successfully.
Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average
Attempted delete of small_133x133_average_58.cdt
small_133x133_average_58.cdt was successfully deleted.
Attempted delete of small_133x133_average_58.atr
small_133x133_average_58.atr was successfully deleted.
Directory average still has 173 files.
average could not be deleted.
Oh yeah - I ran it on small_133x133.txt instead of the 6k file. Shorter test, same behavior.
-
repo owner Since the 133x133 test is very short, can you pull the branch and just run the clustering again? I added a lot of logging statements which don't appear in your recent output. What happens when you clear the directory (if you want to keep the files, just temporarily move the average folder)?
-
reporter As I mentioned above...
I just tried to switch to the branch and pull, but I'm getting an error when I pull:
I'll add the log messages, but I'm not sure you'll get any from the latest updates you've made...
-
reporter OK, pulling worked this time. I'll post results in a moment.
-
reporter Here's the result of clustering columns only:
Running on a Mac. Checking if preferences exist for the new file. Loading with info from existing node. Resetting model. Assigning loaded data to model... Parsing for CDT-format... Parsing label types. Setting up row labels for the model. Setting up column labels for the model. Adding data to model... Truncating sorted data array. Done parsing for CDT-format. No ATR file found for this CDT file. No GTR file found for this CDT file. Setting model config data: User Preference Node: /TreeViewApp/TreeViewFrame/File/Model4 Resetting MapContainers and DendroView components. New ColorSet: Red-Green New ColorSet: Yellow-Blue Registering Plugin Dendrogram ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green. Returning default ColorSet at 0 Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting. Warning: Encountered invalid/too-small numVisible value: [0]. Resetting. Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting. Warning: Encountered invalid/too-small numVisible value: [0]. Resetting. Importing labels... Importing color settings... Found Existing node in MRU list for /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt Creating subNode File1478273812310 Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt Restoring components states. Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt Clustering ready. Beginning task... Performing hierarchical cluster. Checking if cluster needs to be reaffirmed. Should cluster axes? (row, col): [false, true] Finished setup of directory structure. Common path for cluster files: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average Generated ATR file (empty): small_133x133_average_59.atr Initializing DistMatrixCalculator (2) DistTask is done: success. Done. Closing writer for small_133x133_average_59.atr Got reordered 133 labels for column Reordering complete. 
Post-cluster reordering valid? true Generated CDT file: small_133x133_average_59.cdt Done. Closing writer for small_133x133_average_59.atr Data reordering is done: success. Post-cluster reordering valid? true ClusterTask is done: success. Setting up ClusterFileGenerator. Writing CDT cluster file... 0 - Cluster.ClusterFileGenerator.createHierCDT(ClusterFileGenerator.java:394) - Cluster.ClusterFileGenerator.generateCDT(ClusterFileGenerator.java:122) - Controllers.ClusterDialogController$SaveTask.doInBackground(ClusterDialogController.java:666) - Controllers.ClusterDialogController$SaveTask.doInBackground(ClusterDialogController.java:1) - javax.swing.SwingWorker$1.call(SwingWorker.java:295) - java.util.concurrent.FutureTask.run(FutureTask.java:266) - javax.swing.SwingWorker.run(SwingWorker.java:334) - java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) - java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) - java.lang.Thread.run(Thread.java:745) Error when writing the CDT file. Cancelling. Saving did not finish successfully. Determined dir: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average Attempting delete of cluster files in /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average Attempted delete of small_133x133_average_59.cdt small_133x133_average_59.cdt was successfully deleted. Attempted delete of small_133x133_average_59.atr small_133x133_average_59.atr was successfully deleted. Directory average still has 176 files. average could not be deleted.
And from clustering rows only:
Clustering ready. Beginning task... Performing hierarchical cluster. Checking if cluster needs to be reaffirmed. Should cluster axes? (row, col): [true, false] Finished setup of directory structure. Common path for cluster files: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average Generated GTR file (empty): small_133x133_average_59.gtr Initializing DistMatrixCalculator (1) DistTask is done: success. Done. Closing writer for small_133x133_average_59.gtr Got reordered 133 labels for row Reordering complete. Post-cluster reordering valid? true Generated CDT file: small_133x133_average_59.cdt Done. Closing writer for small_133x133_average_59.gtr Data reordering is done: success. Post-cluster reordering valid? true ClusterTask is done: success. Setting up ClusterFileGenerator. Writing CDT cluster file... Done. Closing writer for small_133x133_average_59.cdt Checking if row tree file is present. Success! The row tree file was found. Checking if column tree file is present. No file found for column trees. But old column tree file was found! Getting preferences for transfer to clustered file. Loading with info from existing node. Resetting model. SaveTask is done: success. Assigning loaded data to model... Parsing for CDT-format... Parsing label types. Setting up row labels for the model. Setting up column labels for the model. Adding data to model... Truncating sorted data array. Done parsing for CDT-format. Setting model config data: User Preference Node: /TreeViewApp/TreeViewFrame/File/Model30 Resetting MapContainers and DendroView components. Returning default ColorSet at 0 Identifier ARRY127X from tree file not found in CDT. 
- edu.stanford.genetics.treeview.plugin.dendroview.TreeDrawer.setData(TreeDrawer.java:287) - edu.stanford.genetics.treeview.plugin.dendroview.TreePainter.setData(TreePainter.java:1) - Controllers.DendroController.bindTrees(DendroController.java:1295) - Controllers.DendroController.bindComponentFunctions(DendroController.java:843) - Controllers.DendroController.setNewMatrix(DendroController.java:184) - Controllers.TVController.finishLoading(TVController.java:342) - edu.stanford.genetics.treeview.model.ModelLoader.done(ModelLoader.java:149) - javax.swing.SwingWorker$5.run(SwingWorker.java:737) - javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.run(SwingWorker.java:832) - sun.swing.AccumulativeRunnable.run(AccumulativeRunnable.java:112) - javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.actionPerformed(SwingWorker.java:842) - javax.swing.Timer.fireActionPerformed(Timer.java:313) - javax.swing.Timer$DoPostEvent.run(Timer.java:245) - java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311) - java.awt.EventQueue.dispatchEventImpl(EventQueue.java:744) - java.awt.EventQueue.access$400(EventQueue.java:97) - java.awt.EventQueue$3.run(EventQueue.java:697) - java.awt.EventQueue$3.run(EventQueue.java:691) - java.security.AccessController.doPrivileged(Native Method) - java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:75) - java.awt.EventQueue.dispatchEvent(EventQueue.java:714) - java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201) - java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116) - java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105) - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101) - java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93) - java.awt.EventDispatchThread.run(EventDispatchThread.java:82) Importing labels... Importing color settings... 
Creating subNode File1478273923664 Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_59.cdt Restoring components states. Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average_59.cdt
There are new warning/error dialogs that come up too. Let me know if you want screen caps.
-
repo owner Thanks for the output, that helps. Can you zip up and send me the last couple of cluster file groups (atr, cdt, gtr between 50 and 59)? Or the entire folder if that's more convenient.
-
reporter I just emailed you the bottom of the directory
-
repo owner The output for clustering rows only is actually correct and how it should be. In your case you are trying to open a file which was somehow clustered before by TreeView3 (either or both axes), otherwise the log lines
Checking if column tree file is present. No file found for column trees. But old column tree file was found!
could not be present in the log because ClusterTask.reaffirmClusterChoice() has set ClusterData.isAxisClustered to true.
TreeView3 clustered files use tree identifiers of the format ROW12X or COL12X whereas files from Cluster 3.0 use the GENE/ARRY scheme which we have abandoned. The row-clustering error you posted displays a mismatch between cdt and column tree files based on identifier names.
To handle it more gracefully, I have added a check to file loading which tests the compatibility of tree files and matrix, warning if they are incompatible. They are considered incompatible if the identifiers between tree files and matrix do not match. Matrices fully clustered in Cluster 3.0 can still be loaded fine in TreeView3 since tree identifiers are the same in atr/gtr and cdt files. No random identifier formats other than ROW/COL or GENE/ARRY are allowed.
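To make the rule concrete, here is a minimal sketch of such a compatibility test. It is not the code on the branch: the class, enum and method names are hypothetical, and the exact identifier patterns (a prefix, a number, then X) are an assumption based on the examples ROW12X, COL12X and ARRY127X mentioned in this thread.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not TreeView3's actual implementation.
final class TreeIdCompatibility {

    enum Scheme { TREEVIEW3, CLUSTER3, UNKNOWN }

    // Assumed formats: ROW12X / COL12X (TreeView3) and GENE12X / ARRY12X (Cluster 3.0).
    private static final Pattern TREEVIEW3_ID = Pattern.compile("(ROW|COL)\\d+X");
    private static final Pattern CLUSTER3_ID = Pattern.compile("(GENE|ARRY)\\d+X");

    /** Classifies a single identifier by its naming scheme. */
    static Scheme schemeOf(String id) {
        if (TREEVIEW3_ID.matcher(id).matches()) return Scheme.TREEVIEW3;
        if (CLUSTER3_ID.matcher(id).matches()) return Scheme.CLUSTER3;
        return Scheme.UNKNOWN;
    }

    /**
     * A tree file is treated as compatible only if every leaf identifier uses a
     * known scheme and also occurs among the identifiers stored in the matrix file.
     */
    static boolean isCompatible(List<String> treeLeafIds, List<String> matrixIds) {
        Set<String> known = new HashSet<>(matrixIds);
        for (String id : treeLeafIds) {
            if (schemeOf(id) == Scheme.UNKNOWN || !known.contains(id)) {
                return false;
            }
        }
        return true;
    }
}
```

With this rule, a column tree whose leaves are named like ARRY127X would be rejected for a cdt file whose column identifiers are COL-style, which is the situation behind "Identifier ARRY127X from tree file not found in CDT."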
This way of handling makes sense because in my opinion there is near zero reason to cluster a matrix in TreeView and then replace the tree files with other tree files from Cluster 3.0 or vice versa.
I don't know how that tree file got into your directory with the same name as the cdt file (copy-paste?). But the mismatch is responsible for your second error, and TreeView3 did not handle it in a user-friendly manner.
Could you pull this branch again, attempt another run, and post the log messages? The first issue never occurs for me, no matter what. I pulled in master branch updates, which corrected older cdt file writing issues on this branch. Maybe it helps.
And last, you may notice an auto-detect label issue. In response I have created BB issue #503 to keep unrelated changes separate and explained the underlying problem.
-
reporter Here's the latest console output when performing the procedure described in the issue (except with the small_133x133.txt input file):
Running on a Mac.
Saving window dimensions & position.
Checking if preferences exist for the new file.
Loading with info from existing node.
Resetting model...
Adding data to model...
Truncating sorted data array.
Done parsing for CDT-format.
No ATR file found for this CDT file.
No GTR file found for this CDT file.
Resetting MapContainers and DendroView components.
New ColorSet: Red-Green
New ColorSet: Yellow-Blue
Registering Plugin Dendrogram
ColorSet could not be returned because no Preferences node was defined. Returned default Red-Green.
Returning default ColorSet at 0
Last active set: Red-Green
Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting.
Warning: Encountered invalid/too-small numVisible value: [0]. Resetting.
Warning: Encountered invalid/negative firstVisible value: [-1]. Resetting.
Warning: Encountered invalid/too-small numVisible value: [0]. Resetting.
Importing labels...
Importing color settings...
Saving to ColorSet Custom
Last active set: Red-Green
Found Existing node in MRU list for /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
Creating subNode File1479227738221
Creating new fileset /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
Restoring components states.
Successfully loaded: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133.txt
Clustering ready.
Beginning task...
Performing hierarchical cluster.
Checking if cluster needs to be reaffirmed.
Should cluster axes? (row, col): [false, true]
Finished setup of directory structure.
Common path for cluster files: /Users/rleach/PROJECT/TREEVIEW/TestData/small_133x133/average/small_133x133_average
Generated ATR file (empty): small_133x133_average_60.atr
Initializing DistMatrixCalculator (2)
DistTask is done: success.
Done. Closing writer for small_133x133_average_60.atr
Got reordered 133 labels for column
Reordering complete.
Post-cluster reordering valid? true
Generated CDT file: small_133x133_average_60.cdt
Done. Closing writer for small_133x133_average_60.atr
Data reordering is done: success.
Post-cluster reordering valid? true
ClusterTask is done: success.
Setting up ClusterFileGenerator.
Writing CDT cluster file...
The array of ordered row tree identifiers does not match the amount of rows in the original data matrix.
An error occurred when writing the clustered matrix file (cdt). null
Done. Closing writer for small_133x133_average_60.cdt
Checking if row tree file is present.
No file found for row trees. But old row tree file was found!
Checking if column tree file is present.
Success! The column tree file was found.
Getting preferences for transfer to clustered file.
Loading with info from existing node.
Data start coordinates have shifted because more label types were added.
Resetting model...
SaveTask is done: success.
Alert: No numeric data could be found in the input file. The input file must contain tab-delimited numeric values.
Also, I got this error dialog:
-
reporter I have been looking into this issue further today. Regarding your comment, @TreeView3Dev, that "The output for clustering rows only is actually correct and how it should be": I believe that it is not how it should be. Here's why. The user has opened a completely unclustered file and has chosen to leave either the columns or the rows unchanged. Yes, they have clustered this data before, but what they are looking at is completely unclustered (as they have opened a ".txt" file). Whether or not they encounter this specific issue with the GENE#X versus ROW#X IDs, the code should not be retrieving and parsing the previously clustered data for the columns or rows which they selected to "leave unchanged". It's the ID issue that brought this bug to light, however, so it's a good thing it did.
Incidentally, I put a try/catch around the specific lines of code that were causing the save of the clustered file to fail. By simply ignoring the exception, you end up with something close to what you would expect: only one dimension clustered. You do get an error about the IDs being out of sync, and the label types in the label settings dialog are out of sync, but at least the clustering is roughly what you might expect. I'll need to look at it more closely to be sure, but regardless, the order of the "unchanged" dimension should be retained as it is in the file the user is viewing.
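To make the expectation concrete, here is a minimal sketch of what "cluster columns only" should mean for the data order. It is written in Python for brevity rather than taken from the TreeView code, and `reorder_columns_only` and its arguments are hypothetical names used only for illustration.

```python
def reorder_columns_only(matrix, column_order):
    """Return a copy of matrix with its columns permuted and its rows untouched.

    matrix is a list of equal-length rows; column_order lists the original
    column indices in their new (clustered) order.
    """
    n_cols = len(matrix[0]) if matrix else 0
    if sorted(column_order) != list(range(n_cols)):
        raise ValueError("column_order must be a permutation of the column indices")
    # Rows are emitted in the order they appear in the open file, so the
    # "Leave unchanged" axis is never consulted against any old tree file.
    return [[row[j] for j in column_order] for row in matrix]
```

The point of the sketch is that nothing about the row axis (old row tree files, old row IDs) is needed to produce the output when only the columns are clustered.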
-
reporter - changed milestone to C/R - 01
-
assigned issue to
As per our last meeting, I am taking up the mantle of this issue.
-
reporter I have determined a bit more about this issue, which has corrected an assumption I was making before. I was trying to determine which tree file was being associated with the unclustered small_133x133.txt file. I had assumed it was grabbing the most recent tree files from the small_133x133 directory, but it is not. It is grabbing tree files in the same directory that were generated when small_133x133.cdt was created. I don't recall how those files were initially created. I suspect I got them from Cluster.app when I was using treeview 2 long ago. The two files (.txt versus .cdt) sit next to each other. Even though I opened the .txt file, which has no real tree files associated with it, the code believes that the similarly named .atr and .gtr files go with it (and therefore that it is clustered) even though those files were generated when the .cdt file was created.
The original unclustered .txt file doesn't have a GID column or AID row, so when the code tries to associate the .gtr file with it, it cannot, hence the failure.
What probably should be happening is the .txt file should be checked for IDs that match the IDs in the tree files. If they don't have a matching row/col of IDs, they shouldn't be associated. Or, an easier check would be to see if the extension of the opened file was '.cdt'. But then that could miss tree files generated for files with different extensions, such as .pcl. So it's probably best to simply do the ID check.
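The ID check could look roughly like the sketch below. It is Python rather than the project's Java, and it rests on assumptions that are not confirmed by the TreeView code: that each tree-file line is tab-delimited as node ID, left child, right child, similarity; that there is no header line; and that child IDs starting with "NODE" are internal nodes. `leaf_ids` and `tree_matches_matrix` are hypothetical names.

```python
def leaf_ids(tree_lines):
    """Collect the leaf IDs referenced by the lines of a .gtr/.atr file.

    Assumed line layout (tab-delimited): node ID, left child, right child,
    similarity. Children whose ID starts with "NODE" are treated as internal.
    """
    leaves = set()
    for line in tree_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        for child in fields[1:3]:
            if not child.startswith("NODE"):
                leaves.add(child)
    return leaves


def tree_matches_matrix(matrix_ids, tree_lines):
    """True only if the tree has leaves and every one appears in matrix_ids.

    matrix_ids is whatever the opened file provides in its GID column or
    AID row; for a plain .txt file with neither, it is empty.
    """
    leaves = leaf_ids(tree_lines)
    return bool(leaves) and leaves <= set(matrix_ids)
```

With a check like this, a .txt file that has no GID column would yield an empty `matrix_ids`, the match would fail, and the neighboring .gtr file would not be associated with it, regardless of the file's extension.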
-
reporter I believe I've fixed the crux of this issue by changing the assumption. Previously, if either a GID/AID existed in the matrix file or the .gtr/.atr file existed, the file was assumed to be clustered. Now, both a GID/AID (in the matrix file) and the .gtr/.atr file must exist before the file is assumed to be clustered. At first glance, this seems to completely fix the issue.
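The change in logic amounts to swapping an "or" for an "and". The following is only an illustration in Python, not the actual change on the branch; both function names and both parameters are hypothetical.

```python
def assumed_clustered_before(has_id_labels, tree_file_exists):
    """Old assumption: either signal alone marks the axis as clustered."""
    return has_id_labels or tree_file_exists


def assumed_clustered_after(has_id_labels, tree_file_exists):
    """New assumption: the matrix file must carry GID/AID labels AND the
    matching .gtr/.atr file must exist."""
    return has_id_labels and tree_file_exists


# The case from this issue: a plain .txt file (no GID/AID) sitting next to
# tree files that were generated for a similarly named .cdt file.
stale_tree_case = dict(has_id_labels=False, tree_file_exists=True)
print(assumed_clustered_before(**stale_tree_case))  # True: tree wrongly attached
print(assumed_clustered_after(**stale_tree_case))   # False: treated as unclustered
```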
-
reporter - changed title to Clustering columns only fails
-
reporter - changed status to resolved
Fixed on branch issue475
-
reporter Related to issue #539.
-
reporter - changed status to closed
Merged to master
-
reporter - changed version to beta2
-
@hepcat72 I need the 6kx6k file again. I think I lost it when my PC got fried. It wasn't in the backup files for some reason... I created another 6kx6k and it finished fine under the mentioned conditions, so it would be great to have that specific file.