missing UMLS definitions

Issue #57 on hold
b created an issue

Many of the terms in the Implicitome tuples and in the B concepts currently have no definitions attached. Given a UMLS id, we should be able to provide a definition. This content needs to be pulled from the UMLS database and placed in the table the application reads from.

Note that the currently deployed MySQL instance has one database called umls, containing a single table called 'definition' with 156981 entries. This is a small subset of the UMLS.
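As a rough illustration of "given a UMLS id, provide a definition", here is a minimal sketch that indexes MRDEF.RRF rows by CUI. It assumes the standard pipe-delimited MRDEF column order (CUI, AUI, ATUI, SATUI, SAB, DEF, SUPPRESS, CVF); the function names and the sample rows are made up for illustration and are not the project's loader.

```python
from collections import defaultdict

# Column order of MRDEF.RRF in the UMLS Rich Release Format.
MRDEF_FIELDS = ["CUI", "AUI", "ATUI", "SATUI", "SAB", "DEF", "SUPPRESS", "CVF"]


def load_definitions(lines):
    """Map each CUI to a list of (source vocabulary, definition text) pairs."""
    definitions = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) < len(MRDEF_FIELDS):
            continue  # skip blank or truncated rows
        row = dict(zip(MRDEF_FIELDS, parts))
        definitions[row["CUI"]].append((row["SAB"], row["DEF"]))
    return definitions


def define(definitions, cui):
    """Return all definitions for a CUI, or an empty list if there are none."""
    return definitions.get(cui, [])


# Hypothetical rows, not real UMLS content.
SAMPLE = [
    "C0000001|A0000001|AT0000001||SRC1|First sample definition.|N||\n",
    "C0000001|A0000002|AT0000002||SRC2|Second sample definition.|N||\n",
]

definitions = load_definitions(SAMPLE)
print(define(definitions, "C0000001"))
print(define(definitions, "C9999999"))
```

Against a real file, the same function could be fed `open("MRDEF.RRF", encoding="utf-8")`, since an open file iterates line by line.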

Comments (15)

  1. Richard Bruskiewich

    wc -l MRDEF.RRF
    188647 MRDEF.RRF

    This is the UMLS data file I obtained from the licensed data set. I'm wondering what I did wrong in generating the definition file. I'll investigate, then attempt a reload.

  2. Richard Bruskiewich

    I've just fired up the UMLS installation again, and am reminded how complex the UMLS Metathesaurus configuration processes are. Not knowing what to include or exclude, I've just attempted to regenerate the UMLS meta-data set with everything selected. I'll check to see what impact this has on the MRDEF.RRF file size.

  3. Richard Bruskiewich

    The original UMLS definition file I loaded on the main server had the following line count:

    wc -l src/static/data/MRDEF.RRF
    188647 src/static/data/MRDEF.RRF

    The one I regenerated over the past 24 hours with a less stringent configuration (in the MetamorphoSys UMLS tool):

    wc -l /mnt/hgfs/Scripps_Institute_Ben_Good/MRDEF.RRF
    183908 /mnt/hgfs/Scripps_Institute_Ben_Good/MRDEF.RRF

    Ironically, smaller by ~4700 entries... (??!?)
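A line count from `wc -l` counts definition rows, not concepts: MRDEF.RRF can hold several rows for one CUI, one per contributing source. The sketch below reports both numbers so that two MRDEF.RRF builds can be compared on each. Whether this explains any of the gap between the file size and the table size is not established in this thread; the function name and sample rows are hypothetical.

```python
def summarize(lines):
    """Return (number of definition rows, number of distinct CUIs)."""
    rows = 0
    cuis = set()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        rows += 1
        # The CUI is the first pipe-delimited field of each MRDEF row.
        cuis.add(line.split("|", 1)[0])
    return rows, len(cuis)


# Hypothetical rows, not real UMLS content.
SAMPLE = [
    "C0000001|A0000001|AT0000001||SRC1|First sample definition.|N||\n",
    "C0000001|A0000002|AT0000002||SRC2|Second sample definition.|N||\n",
    "C0000002|A0000003|AT0000003||SRC1|Another sample definition.|N||\n",
]

row_count, concept_count = summarize(SAMPLE)
print(row_count, concept_count)
```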

    The UMLS installation is not yet complete... taking a very long time. I'll let it run its course.

    Is there another place from which to get the UMLS definitions?

    I may try loading the current definitions again with my latest script, which has modifications similar to those in the ABC loading script and may be more robust.

  4. Richard Bruskiewich

    I suspect that more UMLS definitions should make it into the database for a few reasons:

    1) I explicitly commit after each save/create (like the ABC loader), then reread the definition to check that it is there.

    2) Some of the entries contain extended Unicode (scientific) characters that need a non-ASCII codec. I fixed this issue elsewhere in the system once before, but it cropped up again in the UMLS loading script (I noticed the error anew), so I fixed the codec. The odd entries then loaded properly in my local VM server test, so I'm rerunning the script on the production server.
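The two points above (commit after every insert and read the row back, and decode the file as UTF-8 rather than ASCII) can be sketched as follows. This is not the actual loading script: it uses Python's built-in `sqlite3` as a stand-in for the MySQL/Django setup, and the table layout, function names and sample rows are assumptions made for the example.

```python
import sqlite3


def decode_row(raw):
    """Decode one raw MRDEF line as UTF-8 and return (cui, source, text)."""
    parts = raw.decode("utf-8").rstrip("\n").split("|")
    return parts[0], parts[4], parts[5]


def load_rows(conn, rows):
    """Insert rows one at a time, commit each, and read it back to verify.

    Returns the list of rows that could not be found after the commit.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS definition (cui TEXT, source TEXT, text TEXT)"
    )
    missing = []
    for cui, source, text in rows:
        conn.execute("INSERT INTO definition VALUES (?, ?, ?)", (cui, source, text))
        conn.commit()  # explicit commit after each save
        found = conn.execute(
            "SELECT 1 FROM definition WHERE cui = ? AND source = ? AND text = ?",
            (cui, source, text),
        ).fetchone()
        if found is None:
            missing.append((cui, source, text))
    return missing


# Hypothetical raw lines; the second contains non-ASCII characters.
RAW = [
    "C0000001|A0000001|AT0000001||SRC1|Plain sample definition.|N||\n".encode("utf-8"),
    "C0000002|A0000002|AT0000002||SRC1|Sample with β and µ symbols.|N||\n".encode("utf-8"),
]

conn = sqlite3.connect(":memory:")
missing = load_rows(conn, [decode_row(raw) for raw in RAW])
loaded = conn.execute("SELECT COUNT(*) FROM definition").fetchone()[0]
print(loaded, missing)
```

Committing per row is slower than a single bulk commit, but it means one bad entry cannot roll back the rows before it, and the read-back gives an explicit list of anything that failed to land.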

    All that said, I don't know how many more definitions you would expect to see here, but I haven't yet been able to figure out where to get more from the complex UMLS Metathesaurus processing/installation. Perhaps if someone at Scripps has more experience working with the UMLS installation routine, they can try to generate a larger, more complete MRDEF.RRF file for loading, or some other equivalent UMLS concept definitions file (I can adjust the scripts accordingly, to parse another file, if it is not the MRDEF.RRF file I've been using).

  5. Richard Bruskiewich

    I can't do any more with this and am therefore considering it "done" for Implicitome release, unless somebody else can dredge up more UMLS definitions somewhere...

  6. b reporter

    I'm looking into this now (installer running; it will take a few hours). One thing it sounds like you might have missed is the option to create the MySQL load scripts at the beginning of the MetamorphoSys installer run. Running things off the generated database rather than the flat files might clear this up.

  7. Richard Bruskiewich

    The UMLS definitions are in the database but a bug has crept in which prevents their display (being investigated...)

  8. b reporter

    A minor adjustment to the problem here. There are a number of concepts that, like 'C1855923', do exist in the UMLS database records, but do not have textual definitions associated with them in the release... For these concepts, the best we can do is to provide the semantic types and perhaps, links to the entries in the source vocabularies and even more 'perhaps', a depiction of the hierarchical context of the term in the original source ontology (e.g. parent, sibling, child terms). Unsatisfying, but there you go..

  9. b reporter

    Note that on the 2015 release that I just processed, I ended up with 215476 entries in my definitions table. I selected the "active subset" configuration. (Screenshot attached: Screen Shot 2015-06-28 at 9.19.08 PM.png)

  10. b reporter

    My take on closing this issue is to (1) make sure we have as many definitions as we can loaded, (2) for those without definitions, show the semantic type, and an indication that the definition is missing from the UMLS.
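Point (2) could look roughly like the sketch below: show the definition when one exists, otherwise fall back to the semantic type with a note that the UMLS has no definition. The function name, message wording, dictionary shapes and sample data are all assumptions for illustration, not the application's code.

```python
def describe_concept(cui, definitions, semantic_types):
    """Return display text for a concept, falling back to its semantic type.

    definitions:    dict mapping CUI -> list of definition strings
    semantic_types: dict mapping CUI -> list of semantic type names
    """
    texts = definitions.get(cui)
    if texts:
        return texts[0]
    types = semantic_types.get(cui)
    if types:
        return "No definition available in the UMLS (semantic type: %s)" % ", ".join(types)
    return "No definition available in the UMLS"


# Hypothetical data, not real UMLS content.
definitions = {"C0000001": ["A sample definition."]}
semantic_types = {"C0000001": ["Finding"], "C0000002": ["Finding"]}

print(describe_concept("C0000001", definitions, semantic_types))
print(describe_concept("C0000002", definitions, semantic_types))
print(describe_concept("C0000003", definitions, semantic_types))
```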

  11. Richard Bruskiewich

    I've cleared up the UMLS definition display bug ('twas a stupid bug...). I've also now adapted the system to use the table created by the UMLS MRDEF.sql script as the source of definition data (with a slight modification of the primary key to make it Django-friendly).

    I've not yet dealt with suggestion (2) above... so this issue will remain open a little bit longer.

  12. Richard Bruskiewich

    I'd like to postpone the semtype annotation to a (slightly) later sprint, since some wider refactoring of the code might give the end user broader access to a semantic type meta-data enabled UI.
