Add possibility to load data from a list, instead of a matrix

Issue #444 new
Anastasia Baryshnikova created an issue

USE CASE: WHAT DO YOU WANT TO DO?

Provide a more flexible experience for the user when loading a custom dataset

STEPS TO REPRODUCE AN ISSUE (OR TRIGGER A NEW FEATURE)

N/A

CURRENT BEHAVIOR

Currently, Treeview can only load matrices. One of the test users asked me if it is possible to load a dataset in a list format (label A - label B - value).

EXPECTED BEHAVIOR

Treeview should be able to automatically recognize the format (just as it recognizes the number of row/col headers now) and then present a confirmation window to the user to make sure the parsing was done successfully.

DEVELOPERS ONLY SECTION

SUGGESTED CHANGE (Pseudocode optional)

e.g. Add a color selection class

FILES AFFECTED (where the changes will be implemented) - developers only

e.g. selectColor.java & settingsPanel.java

LEVEL OF EFFORT - developers only

trivial/minor/medium/major/overhaul (choose one)

COMMENTS

Comments (23)

  1. Robert Leach

    Can you describe a procedure that shows that list data cannot be loaded? I was able to successfully load list data using this file:

    https://bitbucket.org/TreeView3Dev/treeview3/downloads/mylabeltest5.txt

    The data is not real; the values are just sequential numbers. The default data detection seemed to mostly work, and I could change its settings:

    importopts.png

    Here's what it looks like when it's loaded and the interface appears to be fully functional:

    loadedlistdata.png

    Actually, the first time I tried to open it, the "Choose a file" dialog kept popping up after loading, but I couldn't make that happen again after the first time and I don't know which jar I was running when I tested it. There could be a real problem there, but without the ability to reproduce it, I just have to write it off as a fluke.

  2. Robert Leach

    Aha, I figured it out. It's the case where there are no headers. If you attempt to load a list that has no headers, you get this error:

    noheaderserror.png

    and the data will not load.

  3. Robert Leach

    I was talking to Lance about this and he says that there exists a "list file format" that has 3 columns consisting of row label, column label, and value. Of course, labels would be highly repetitive in this format. Now that I think about it, I believe I have encountered something like this in the past.

    Is this what you meant?

  4. Srikanth Bezawada

    I am working on this issue and have one question. Will the first line of a list-format file contain headers? If so, an example please.

  5. Anastasia Baryshnikova reporter

    @srikanthbezawada -- I think the rules should be:

    1. assume by default that the first row contains column headers
    2. in the preview, the user can adjust this setting; if the user deselects the first row as the header, auto-generate a header (e.g., "ROW_LABELS" and "COLUMN_LABELS")
    3. include a checkbox to enable the user to make the matrix symmetric (e.g., if only the upper triangle of an adjacency matrix is loaded, fill in the rest of the matrix with the same values)
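
The first two rules above can be sketched as follows. This is only an illustrative Python sketch, not TreeView code: the function name `split_header` and the third placeholder name "VALUE" are assumptions (the comment above names only "ROW_LABELS" and "COLUMN_LABELS").

```python
# Hypothetical helper, not part of TreeView: separates the header row from
# the data rows of an already-tokenized three-column list file.
DEFAULT_HEADER = ["ROW_LABELS", "COLUMN_LABELS", "VALUE"]  # "VALUE" is assumed


def split_header(rows, first_row_is_header=True):
    """Return (header, data_rows).

    By default the first row is treated as the header. If the user
    deselects that option, every row is data and a placeholder header
    is generated instead.
    """
    if first_row_is_header and rows:
        return list(rows[0]), [list(r) for r in rows[1:]]
    return list(DEFAULT_HEADER), [list(r) for r in rows]
```

With the default setting the first row is consumed as the header; with `first_row_is_header=False` the same row is kept as data and the placeholder header is returned.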

  6. Robert Leach

    I think if there are only 3 columns, it's a good bet to default to list format parsing, but we might want some manual way to set the format type. Also note that each row has a single value for a single cell, thus missing cells need to be accounted for and treated as empty. There's no way to know what the final dimensions of the matrix will be in this format, so you will have to look for the longest row and longest column.
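
A rough Python sketch of the idea in this comment, under stated assumptions: the function names are hypothetical, rows are already split into fields, missing cells are represented as `None`, and a repeated pair simply keeps the last value seen. The matrix dimensions come from the number of distinct row and column labels encountered.

```python
def looks_like_list_format(rows):
    """Heuristic from the comment above: exactly three columns in every row."""
    return bool(rows) and all(len(r) == 3 for r in rows)


def list_to_matrix(triples):
    """Pivot (row label, column label, value) triples into a dense matrix.

    Labels keep their order of first appearance. Cells that never occur
    in the list stay None, i.e. they are treated as empty.
    """
    row_index, col_index, cells = {}, {}, {}
    for row, col, value in triples:
        # setdefault assigns the next free position only to unseen labels
        row_index.setdefault(row, len(row_index))
        col_index.setdefault(col, len(col_index))
        cells[(row, col)] = float(value)  # last value wins in this sketch

    matrix = [[None] * len(col_index) for _ in row_index]
    for (row, col), value in cells.items():
        matrix[row_index[row]][col_index[col]] = value
    return list(row_index), list(col_index), matrix
```

For the two-line list `a b 3` / `c d 5`, this yields row labels `a, c`, column labels `b, d`, and a 2x2 matrix in which only the a-b and c-d cells are filled.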

  7. Srikanth Bezawada

    @abarysh, thanks for the pointers.

    I have one question, described below.

    • Consider the following example.

    a b 3

    c d 5

    If the user selects the symmetric checkbox,

    b a 3

    d c 5

    are also implicitly added to the view.

    • Consider the following list example.

    a b 3

    b a 2

    c d 5

    If the user selects the symmetric checkbox, how should this case be handled? (a-b and b-a have different values)

  8. Anastasia Baryshnikova reporter

    That's a great question. I think the most reasonable thing to do (without adding extra interface elements) is to average the ab and ba values. The checkbox that the user checks should say "Make symmetric" (rather than just "Symmetric" -- this way we emphasize that it is an action). If there's space next to the checkbox, we can also add a note "Values for reciprocal pairs A-B and B-A will be averaged". If there's no space, we might need to show it as a pop-up warning when the user clicks on the checkbox.
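
The averaging rule proposed here could look roughly like this in Python. It is a sketch only: `make_symmetric` is a hypothetical name, and the input is assumed to be a dict holding one value per (label A, label B) pair.

```python
def make_symmetric(cells):
    """Return a symmetric copy of a {(label_a, label_b): value} mapping.

    If both A-B and B-A are present, both cells receive their average.
    If only one direction is present, its value is mirrored unchanged.
    """
    result = {}
    for (a, b), value in cells.items():
        mirror = cells.get((b, a))
        averaged = value if mirror is None else (value + mirror) / 2
        result[(a, b)] = averaged
        result[(b, a)] = averaged
    return result
```

Applied to the example from the previous comment (`a b 3`, `b a 2`, `c d 5`), a-b and b-a both become 2.5, while c-d is mirrored to d-c as 5.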

  9. Srikanth Bezawada

    @abarysh I am going with averaging the reciprocal pairs.

    Are there any test files in this format? Also, can you please explain the role/usage of headers with respect to list data after loading into TreeView3?

  10. Anastasia Baryshnikova reporter

    There are no test files for this format yet, but you can create your own for the time being and then I can make another one when I test the feature. The headers of the first 2 columns in the data file should be used as row/column labels. For example:

    Gene Condition Data

    abd1 drugA 0.5

    ... ... ...

    Gene and Condition will be row and column labels, respectively. In this particular case, averaging doesn't make sense (because rows and columns are different things). In general, I think the "Make symmetric" option should be unchecked by default.

  11. Srikanth Bezawada

    The following changes have been implemented.

    If a given pair of labels has multiple values, all the values are averaged. For example,

    a b 3

    a b 2

    The value of a-b comes out to be 2.5.

    Another example,

    a b 3

    a b 2

    b a 5

    and the user selects symmetric, the value of both a-b and b-a comes out to be 3.33.
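
The arithmetic in these two examples can be reproduced with a short Python sketch. This is not the actual TreeView change: `average_duplicates` is a hypothetical name, and the sketch assumes that with the symmetric option all values for A-B and B-A are pooled into one average, which is what the reported 3.33 = (3 + 2 + 5) / 3 implies. Averaging a-b first (2.5) and then averaging that with b-a (5) would give 3.75 instead.

```python
from collections import defaultdict


def average_duplicates(triples, symmetric=False):
    """Average all values that share a label pair.

    With symmetric=True, A-B and B-A count as the same pair, their values
    are pooled into one mean, and that mean is written to both cells.
    """
    groups = defaultdict(list)
    for a, b, value in triples:
        # An order-independent key pools reciprocal pairs together
        key = tuple(sorted((a, b))) if symmetric else (a, b)
        groups[key].append(float(value))

    averaged = {}
    for (a, b), values in groups.items():
        mean = sum(values) / len(values)
        averaged[(a, b)] = mean
        if symmetric:
            averaged[(b, a)] = mean
    return averaged
```

For `a b 3` / `a b 2` the a-b cell is 2.5; adding `b a 5` with `symmetric=True` gives 10 / 3, about 3.33, in both a-b and b-a.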
