Metagraph is a Java API for accessing the UMLS Metathesaurus and finding relationships between concepts. Under the hood it uses RDF triples: Metagraph provides a mechanism to convert the UMLS data into RDF triples and import them into a triple store (currently only Virtuoso is supported), after which the UMLS data can be queried via the API.


The UMLS Metathesaurus is an enormous repository of medical concepts that integrates dozens of different medical ontologies. Much of this data consists of links: parent-child relationships, reflecting the hierarchical nature of the ontologies; links from equivalent concepts to CUIs (concept unique identifiers); and links to UMLS semantic types. Many concepts also carry key-value style attributes.
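
To make the link types above concrete, here is a minimal, self-contained sketch of how they could be written as subject-predicate-object triples and traversed in either direction. Everything in it is illustrative: the class name, the ex: predicate names and the sample identifiers are placeholders invented for this example, not the vocabulary Metagraph actually emits, and attaching the semantic type to a term is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: UMLS-style links modelled as subject-predicate-object triples. */
final class UmlsTripleSketch {

    // Hypothetical predicate names; the real vocabulary may differ.
    public static final String HAS_PARENT = "ex:hasParent";
    public static final String HAS_CUI = "ex:hasCui";
    public static final String HAS_SEMANTIC_TYPE = "ex:hasSemanticType";

    /** A single RDF-style statement. */
    public static final class Triple {
        public final String subject;
        public final String predicate;
        public final String object;

        public Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Placeholder data: two source-vocabulary terms that share one CUI. */
    public static List<Triple> sample() {
        return Arrays.asList(
            new Triple("ex:vocabA/term1", HAS_PARENT, "ex:vocabA/term0"),
            new Triple("ex:vocabA/term1", HAS_CUI, "ex:cui/X1"),
            new Triple("ex:vocabB/term9", HAS_CUI, "ex:cui/X1"),
            new Triple("ex:vocabA/term1", HAS_SEMANTIC_TYPE, "ex:sty/T1"));
    }

    /** Follows a link forwards: every object recorded for (subject, predicate). */
    public static List<String> objectsOf(List<Triple> triples, String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                result.add(t.object);
            }
        }
        return result;
    }

    /** Follows a link backwards: every subject recorded for (predicate, object). */
    public static List<String> subjectsOf(List<Triple> triples, String predicate, String object) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.predicate.equals(predicate) && t.object.equals(object)) {
                result.add(t.subject);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> data = sample();
        System.out.println("parents:  " + objectsOf(data, "ex:vocabA/term1", HAS_PARENT));
        System.out.println("children: " + subjectsOf(data, HAS_PARENT, "ex:vocabA/term0"));
        System.out.println("same CUI: " + subjectsOf(data, HAS_CUI, "ex:cui/X1"));
    }
}
```

Parents, children and equivalent terms all come from the same two lookups, which is the property that makes this kind of data a natural fit for a triple-based representation.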

The official method of accessing the UMLS is through a relational database, but the features listed above make it clear that the UMLS has many characteristics of linked data. It therefore makes sense to store the data in an RDF triplestore and take advantage of the features that triplestores provide. The NCBO has already realised this and created the umls2rdf tool. Metagraph builds on that idea (and code, ported to Scala), and adds a Java API to access the data once it has been stored in the triplestore.

Finding a concept, then locating its parents, children, or equivalent source vocabulary terms shouldn't require several multi-line SQL queries. Common operations like this should be simple. The Metagraph API is intended to make many use cases easier to handle for developers using the UMLS ontology. You do not need to know that the data is stored in a triple store in order to query the UMLS (although you do need to configure a triplestore server initially), and direct access to the triplestore remains possible.
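
For readers curious about what direct triplestore access might look like, the sketch below builds the SPARQL text for a parent lookup and a child lookup. It is not the Metagraph API and does not show how Metagraph issues queries internally. The graph IRI is the default one named later in this README; storing parent links as rdfs:subClassOf is an assumption that you should check against your own data; the class name, method names and the example.org concept IRI are hypothetical.

```java
/** Illustrative only: builds SPARQL text for parent and child lookups. */
final class ParentQuerySketch {

    // Default graph IRI given in this README.
    public static final String GRAPH = "http://metagraph.nicta.com/ontology/umls";

    // Assumption: parent links are stored as rdfs:subClassOf. Verify against your data.
    public static final String PARENT_PREDICATE =
        "http://www.w3.org/2000/01/rdf-schema#subClassOf";

    /** Rejects characters that could break out of the <...> IRI syntax. */
    private static String checkIri(String iri) {
        if (iri == null || iri.isEmpty()) {
            throw new IllegalArgumentException("IRI must not be empty");
        }
        for (char c : iri.toCharArray()) {
            if (c <= ' ' || "<>\"{}|^`\\".indexOf(c) >= 0) {
                throw new IllegalArgumentException("Illegal character in IRI: " + iri);
            }
        }
        return iri;
    }

    /** One triple pattern: the concept is the subject, its parents are the objects. */
    public static String parentsOf(String conceptIri) {
        return String.format(
            "SELECT ?parent WHERE { GRAPH <%s> { <%s> <%s> ?parent } }",
            GRAPH, checkIri(conceptIri), PARENT_PREDICATE);
    }

    /** The same pattern reversed: the concept is the object, its children are the subjects. */
    public static String childrenOf(String conceptIri) {
        return String.format(
            "SELECT ?child WHERE { GRAPH <%s> { ?child <%s> <%s> } }",
            GRAPH, PARENT_PREDICATE, checkIri(conceptIri));
    }

    public static void main(String[] args) {
        String concept = "http://example.org/concept/1"; // hypothetical concept IRI
        System.out.println(parentsOf(concept));
        System.out.println(childrenOf(concept));
    }
}
```

Each lookup is a single triple pattern, and the parent and child queries differ only in which side of the pattern is left as a variable.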

Unlike the conversion to SQL, the conversion from Rich Release Format (RRF) to RDF triples is lossy. We do not guarantee that Metagraph can do everything that direct SQL calls can do, but we do attempt to cover the most common usages (such as reasoning using the output of MetaMap) -- at least the ones we've needed in our work. If you think your use case is common but not handled by the lossy conversion, please let us know.

The code is fairly fast -- converting the RRF to RDF TTL data and loading it into Virtuoso should take less than half an hour on a reasonable machine (this is without exporting attributes).



From Source

Download the tip or a specific version, or get the source using Mercurial.

Build from the project root directory using SBT:

$ sbt compile

Using Maven/SBT

If you wish to simply use Metagraph as a library in your project, it is simpler to use your build tool of choice to install it as a dependency.

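For Maven, add a dependency to your pom.xml with groupId com.nicta, artifactId metagraph_2.10 and version 0.5.1 (the same coordinates as the SBT line below, with the Scala version suffix spelled out).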


Or, if you build your project with SBT, add this to your build.sbt (the Scala version must be 2.10):

libraryDependencies += "com.nicta" %% "metagraph" % "0.5.1"

(The version numbers above could be out of date. Please see the list of tags to check whether there is a newer release.)

It should be straightforward to work out what settings to use for other build tools -- usually you will want to append the _2.10 suffix to the artifact name, as in the Maven coordinates.

Getting Started

  1. Install Virtuoso, either directly or using your favourite distribution's package management system. Virtuoso is not strictly required: it should be possible to use another Sesame-compatible triple store (in fact, the test suite uses an in-memory OpenRDF repository by default), but you'll be largely on your own.

  2. Download and unpack the UMLS RRF data files to a locally accessible directory, preferably on the machine where the Virtuoso server is running (which will enable faster imports).

    • A command-line importer is provided at com.nicta.metagraph.umls.ImportMetathesaurus. To run it, first generate a start script, then invoke the main class with the -h flag. The help output describes where to put the typesafe-config-compatible config file on your classpath and which configuration variables to set.

    $ sbt start-script

    $ ./target/run com.nicta.metagraph.umls.ImportMetathesaurus -h

    • You can also do the import programmatically when your own code initialises, either by calling the ImportMetathesaurus.run() method directly or by using its code as a guide. (In the latter case it is possible, with a little more effort, to avoid the need for a typesafe-config configuration file if you prefer another configuration method.)
  3. That's it. You're ready to go. You can query by instantiating the com.nicta.metagraph.umls.Metathesaurus class; the .find* methods are likely to be a good starting point. If you're running on the local machine, you should be able to re-use the same config file. You can also use a different user, as long as they have read privileges for the appropriate graph (http://metagraph.nicta.com/ontology/umls, unless you have modified it). You can connect to the Virtuoso instance remotely as well, if your server is configured to allow it, but you will need different connection parameters. (If you've chosen to avoid the config files, you should already have worked out your own way of instantiating the Metathesaurus class.)
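
If you want to confirm that the import populated the graph before writing any Metagraph code, one option is to ask Virtuoso directly over HTTP. The sketch below is independent of the Metagraph API. It assumes Virtuoso's SPARQL endpoint is reachable at http://localhost:8890/sparql (the usual default, but check your server configuration) and that the endpoint is allowed to read the graph; the class and method names are made up for this example. It needs Java 11 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Illustrative only: a quick check that the UMLS graph contains triples. */
final class VirtuosoSmokeTest {

    // Assumption: Virtuoso's SPARQL endpoint on its default HTTP port.
    public static final String ENDPOINT = "http://localhost:8890/sparql";

    // Default graph IRI given in this README.
    public static final String GRAPH = "http://metagraph.nicta.com/ontology/umls";

    /** Selects a handful of arbitrary triples from the graph. */
    public static String sampleQuery(int limit) {
        return "SELECT ?s ?p ?o WHERE { GRAPH <" + GRAPH + "> { ?s ?p ?o } } LIMIT " + limit;
    }

    /** Encodes the query as the standard SPARQL protocol "query" parameter. */
    public static URI requestUri(String endpoint, String sparql) {
        return URI.create(endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8));
    }

    /** Sends the request and returns the raw JSON result document. */
    public static String fetch(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = requestUri(ENDPOINT, sampleQuery(5));
        System.out.println(uri);
        // Uncomment once the server is running and the import has finished:
        // System.out.println(fetch(uri));
    }
}
```

An empty bindings list in the response would suggest that the import went to a different graph, or that the endpoint cannot read the graph.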


BSD, except where otherwise noted. See the license file.


The software in its current state (0.x releases) should be considered beta-quality. It is being used for real projects, but until it is more widely used, many bugs are likely to go undetected. In addition, there may be incompatible API changes between minor version numbers (e.g. from 0.6 to 0.7), although we will try to avoid these where possible and will note them clearly. After version 1.0, any changes between minor version numbers will be backwards-compatible.


Please use the Metagraph forum on Google Groups for questions, tips and advice.