Manual input of historical data

Issue #943 new
Alan Chamberlain created an issue

Hi

I see it is possible to export data to an Excel file, but is it possible to import data from an Excel file, CSV file or other manual method? I’m prepared to write and test this functionality, but need some guidance on where to start.

Regards

Alan

Comments (22)

  1. Ed McDonagh

    Hi Alan

    We do have a function to import height and weight information from CSV, and in the distant past I’ve written custom scripts to import legacy data I had!

    So it is definitely possible, but would need to be developed as you proposed.

    What sort of data do you have for import - how complete is it? Do you think an import routine would lend itself to other people’s needs (i.e. be worthy of being in the general release rather than a local addition that’s only useful to your institution)? And is your source of data a one-off import, or something that’s refreshed periodically?

  2. Alan Chamberlain reporter

    Hi Ed

    The data varies from very complete to minimal, but I think we are looking at importing the minimal set of data necessary to generate DRLs, as someone has to type the data in. We can possibly look to extend this later. Most of the data would be a one-off import, but we do have a few machines that are old and don’t produce RDSR, or are not on a network, and we would be looking to import that data monthly.

    It is difficult to say if there are many other sites with historical data that they would need to migrate, but I believe that it would increase the acceptance of OpenREM if you could offer this upgrade path.

    If you have some legacy scripts that I could look at, it would be a huge help. I did start looking at this over December when things were quiet, but didn’t get very far.

    Regards

    Alan

  3. Alan Chamberlain reporter

    Hi Ed

    What is the minimal set of data needed to create adequate DRLs? Below is a list to start with:

    • Patient name
    • Patient ID
    • Institution name
    • Equipment make
    • Equipment model
    • Study date
    • Study type
    • Procedure
    • DAP or DLP reading

    Possible additional fields:

    • Physician name
    • Operator name

    Please add or delete from this list as necessary.
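For illustration, here is a minimal sketch of what a one-row-per-study CSV using the fields above might look like, and how an import script could read it back. The column names are hypothetical, not OpenREM’s:

```python
import csv
import io

# Hypothetical column headers matching the field list above
HEADERS = [
    "patient_name", "patient_id", "institution_name",
    "equipment_make", "equipment_model", "study_date",
    "study_type", "procedure", "dose_reading",
]

# Build a sample file in memory: one row per study
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(HEADERS)
writer.writerow([
    "DOE^JANE", "12345", "Example Hospital",
    "ExampleVendor", "ExampleModel", "2015-01-31",
    "CT", "CT Head", "650",  # a DLP value, in mGy.cm for CT
])

# Read it back the way an import routine might
sample.seek(0)
rows = list(csv.DictReader(sample))
print(rows[0]["procedure"])  # CT Head
```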

    I’ve been looking at the ptsizecsv2db.py file and things are making a little more sense now.

    Regards

    Alan

  4. Ed McDonagh

    Would you have any data that has more than one exposure per study for radiography? How is this presented in your source data?

    For CT, do you have just study DLP? Or series CTDIvol/DLP data? How is this presented?

    I was thinking that you could have a web interface that allows the upload of a CSV file, then you could present all the column headers as drop-down menus to select against the possible database fields. Or vice-versa - present all the CSV file headers then choose which database fields they map to in the drop-down.

    If there is data for more than one exposure per study, then I think it would be best to present this one exposure per line, and use the Accession Number or some other identifier we could use as accession number to tie them together?
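The one-exposure-per-line idea can be sketched with the standard library: group rows by accession number so each group becomes one study. The column names below are assumptions for illustration only:

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV: one exposure per line, tied together by accession number
data = """accession,exposure,dap
A001,Chest PA,0.0012
A001,Chest LAT,0.0015
A002,Abdomen AP,0.0030
"""

studies = defaultdict(list)
for row in csv.DictReader(io.StringIO(data)):
    # All rows sharing an accession number belong to the same study
    studies[row["accession"]].append(row)

print(len(studies["A001"]))  # 2 exposures in study A001
```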

  5. Alan Chamberlain reporter

    Currently just one exposure per study. CT is recorded as DLP. Values are recorded manually on a form. This is transcribed as an Excel file with one row per study.

    A web interface for the import would be ideal. Currently though I’ll be happy just to get a command line import working. I’ve copied ptsizecsv2db.py and created a new module txt2db.py. I can specify the column headers as parameters and I am thinking of including a header file option that specifies the column headers as well.

    For the actual import I am thinking of creating a DICOM RDSR object from each line of the CSV dataset and handing that to _rdsr2db to import. It is a bit cumbersome, but it uses tested routines and offers potential for expansion. However, if you can think of a simpler way I’m happy to go with that.

  6. Ed McDonagh

    I would have thought creating a compliant [enough] RDSR would be quite difficult - much easier to just import directly to the database. My initial concern is whether there will be enough data for the web interface and exports not to fall over where there is an assumption that data will exist, and that any database table constraints are met.

  7. Ed McDonagh

    Via the command line, I guess the header file option would need to map the columns in your data file to the database fields they correspond to.
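One possible shape for such a header file is a plain-text mapping, one “CSV column = database field” pair per line. This is only a sketch of the idea, not the format txt2db.py actually uses:

```python
# Hypothetical header-file contents mapping CSV columns to database fields
header_text = """\
Pat Name = patient_name
Pat ID = patient_id
Dose Reading = dap
"""

mapping = {}
for line in header_text.splitlines():
    if "=" in line:
        # Split on the first "=" so field names may contain spaces
        csv_col, db_field = (part.strip() for part in line.split("=", 1))
        mapping[csv_col] = db_field

print(mapping["Dose Reading"])  # dap
```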

  8. Alan Chamberlain reporter

    I’m afraid I am not familiar with the structure of the database. I had a look at a couple of the functions in extract_common.py but they seemed to require dicom objects anyway. I used a pydicom tool called codify which creates the python code to recreate a dicom file. Now it is a task of filling in enough gaps that rdsr2db is happy. Which brings us back to the previous question; what is the minimal set of data needed to produce a viable database entry?

  9. Ed McDonagh

    Carrying on here, rather than in the PR @Alan Chamberlain . Have I pointed you to this diagram showing the database tables from 2014, before the CT tables were added? (Once they were added, it all got too cumbersome to understand!) https://bitbucket.org/openrem/openrem/src/develop/stuff/dose_tables.png

    Would you like me to create some starter for ten code to show what PR #523 could look like without the RDSR generation step?

    Either way, can you please redirect your PR from develop to issue943importFromText branch, so I can merge it in to take a look? Bitbucket doesn’t allow me to do much with it unless I check it out from your repo, or merge it into a branch in mine.

    Thanks for doing this!

  10. Alan Chamberlain reporter

    I’ve pointed the PR to issue943importFromText. I did see the tables under /stuff/. They are very impressive, but quite intimidating. I also interrogated the database with Squirrel-SQL to try to get an idea of the structure.

    If you could give me some demo code, that would be a great help. I agree the RDSR model is not the most concise.

    Thanks

  11. Ed McDonagh

    I’ve started the direct-to-database alternative @Alan Chamberlain - I just need to add the actual dose import. I’ll see if I can do that tomorrow if I get a chance. I expect it not to work, or to break lots of things, as I haven’t yet attempted to ensure everything that is mandatory or expected is there. Have a look at the branch issue943importFromText to see what I have done. I’ve mainly been cribbing from ct_philips.py as that is the most similar to what we are trying to do.

    Regarding UID roots, you can see my two roots at openrem/remapp/netdicom/tools.py - if we need one we could agree a format to use for this.
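If an agreed format is needed, a common pattern is root plus millisecond timestamp plus process id, kept within the DICOM limits (digits and dots, at most 64 characters). The root below is a placeholder, not one of the real roots in openrem/remapp/netdicom/tools.py:

```python
import os
import time

# Placeholder root -- a real deployment would use a properly registered UID root
UID_ROOT = "1.2.3.4"

def generate_uid(root=UID_ROOT):
    """Build a DICOM-style UID from a root, a millisecond timestamp and the PID."""
    suffix = "{}.{}".format(int(time.time() * 1000), os.getpid())
    return "{}.{}".format(root, suffix)

uid = generate_uid()
print(uid)
assert len(uid) <= 64  # DICOM UIDs may not exceed 64 characters
```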

  12. Alan Chamberlain reporter

    My apologies about the .gitignore. I added some files locally that I didn’t want git to track and didn’t want the modified .gitignore uploaded. Somehow managed to delete it from the branch instead.

    I’ll take a look at what you’ve done and at ct_philips.py.

  13. Alan Chamberlain reporter

    I have a basic framework running. It still needs a lot of polishing and only handles CT, RF and DX. I’m not too sure about the DX as I don’t have any data. If you have a gap, I’d appreciate it if you could take a look and see if there are any issues. There is a lot of redundancy, as I tried as much as possible to duplicate the structure of the other modules.

    Regards

    Alan

  14. Alan Chamberlain reporter

    Hi Ed

    Please will you confirm the dose units used internally by OpenREM.

    • DAP appears to be Gy.m2 - is this correct?
    • Dose I assume is Gy?
    • DLP I can’t seem to find - is it mGy.cm, Gy.m etc.?

    Regards

    Alan

  15. Ed McDonagh
    • DAP Gy.m2
    • RP Dose etc Gy
    • CT DLP mGy.cm
    • Mammo AGD mGy

    All as per the RDSR template definitions :)
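Given those internal units, manually recorded values will often need scaling on import; for example, consoles commonly display DAP in cGy.cm2 or mGy.cm2. A minimal sketch of the conversions (factors follow from 1 cGy = 0.01 Gy, 1 mGy = 0.001 Gy and 1 cm2 = 0.0001 m2):

```python
def dap_cgycm2_to_gym2(value):
    # 1 cGy.cm2 = 0.01 Gy x 0.0001 m2 = 1e-6 Gy.m2
    return value * 1e-6

def dap_mgycm2_to_gym2(value):
    # 1 mGy.cm2 = 0.001 Gy x 0.0001 m2 = 1e-7 Gy.m2
    return value * 1e-7

print(dap_cgycm2_to_gym2(250.0))  # 0.00025 Gy.m2
```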

  16. Alan Chamberlain reporter

    For future reference, the Linux command to upload multiple CSV files, assuming they all have the same header, is:

    find . -name "MyCSVfile*.csv" -exec python /path/to/openrem/openrem/scripts/openrem_txt.py {} -f My-headers.txt -v \;

    executed in the directory where the CSV files are.
