Brat2BioC converter v1.0
-----------------------

This tool converts files annotated in the brat format to BioC and vice versa.
Brat2BioC depends on:
(1) BioC, which provides resources to read and write BioC files. It is included in the lib folder (bioc.jar).
(2) brateval, which provides support to load and write annotations in the brat format. This must be installed separately.
(3) xstream, required by the BioC implementation. This gets installed automatically as a dependency when using maven.

Thus, the installation steps for Brat2BioC are:

1. Install brateval per the instructions at https://bitbucket.org/nicta_biomed/brateval

2. Download Brat2BioC (e.g. hg clone https://bitbucket.org/nicta_biomed/brat2bioc)

3. Build Brat2BioC using maven:

     mvn install

The jar file will be under the target folder. The name of the generated jar file contains the version of the software, e.g. BRATEval-0.0.1-SNAPSHOT.jar.
Change the name of the generated or downloaded jar files accordingly to run the examples below.
The lib folder contains the package bioc.jar required to run this tool.

Once the application is installed, the CLASSPATH needs to include the following library jar files:

CLASSPATH=brat2bioc.jar:brateval.jar:bioc.jar:xstream-1.4.4.jar:xmlpull-1.1.3.1.jar:xpp3_min-1.1.4c.jar
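
The colon-separated list above can also be assembled from whatever jar files sit in one folder, which avoids retyping version numbers. The following is a minimal sketch, assuming all required jars have been copied into a single directory; the function name build_classpath and the directory layout are hypothetical and not part of Brat2BioC. The colon separator applies to Unix-like systems (Windows uses a semicolon).

```shell
# Join every *.jar in a directory into a colon-separated classpath string.
# The directory passed in is an assumption: point it at wherever the jars live.
build_classpath() {
    dir="$1"
    cp=""
    for jar in "$dir"/*.jar; do
        [ -e "$jar" ] || continue      # skip the unexpanded pattern when no jars exist
        cp="${cp:+$cp:}$jar"           # add a ':' only between entries
    done
    printf '%s\n' "$cp"
}
```

It could then be used as CLASSPATH=$(build_classpath path_to_jars) before running the java commands in this file.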

Example of a Java call to convert a brat folder to a BioC file:

java -cp $CLASSPATH au.com.nicta.csp.bbc.BRAT2BioC input_brat_files_folder output_bioc_file_name

input_brat_files_folder = folder with the brat files: document files (*.txt) and annotation files (*.ann, *.a1, *.a2) 
output_bioc_file_name = file to output the BioC generated content
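
To illustrate what an input folder might look like, the sketch below creates one document file and a matching annotation file in brat standoff notation (ID, then type with start and end offsets, then the covered text, separated by tabs). The folder name, the file names, the sample sentence and the "Protein" type are illustrative assumptions; this README does not state whether the converter requires anything beyond such a pair of files.

```shell
# Create a tiny brat input folder: one document and its standoff annotations.
# All names and the sample content are hypothetical, for illustration only.
make_sample_brat_folder() {
    dir="$1"
    mkdir -p "$dir"
    # Document text
    printf 'BRCA1 mutations increase cancer risk.\n' > "$dir/doc1.txt"
    # Text-bound annotation: ID <TAB> type start end <TAB> covered text
    # Offsets 0-5 cover the five characters of "BRCA1".
    printf 'T1\tProtein 0 5\tBRCA1\n' > "$dir/doc1.ann"
}
```

Calling make_sample_brat_folder sample_input produces a folder that can be passed as input_brat_files_folder.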

Example of a Java call to convert a BioC file to brat annotation files:

java -cp $CLASSPATH au.com.nicta.csp.bbc.BioC2BRAT input_bioc_file_name output_brat_files_folder

input_bioc_file_name = file name with BioC annotations
output_brat_files_folder = folder to output the brat generated content

It is possible to run the software directly using maven after installing it:

mvn install

From the installation directory, run the following command to convert from brat to BioC:

mvn exec:java -Dexec.mainClass=au.com.nicta.csp.bbc.BRAT2BioC -Dexec.args="input_brat_files_folder output_bioc_file_name"

From the installation directory, run the following command to convert from BioC to brat:

mvn exec:java -Dexec.mainClass=au.com.nicta.csp.bbc.BioC2BRAT -Dexec.args="input_bioc_file_name output_brat_files_folder"

The software has been used to produce results for the Variome corpus and for the GE task of the BioNLP Shared Task in 2009, 2011 and 2013.
It has also been used to convert a large set of corpora from the WBI repository (http://corpora.informatik.hu-berlin.de).
It was presented in the following publication:

Antonio Jimeno Yepes (antonio.jimeno@gmail.com), Mariana Neves, Karin Verspoor 
Brat2BioC: conversion tool between brat and BioC
BioCreative IV track 1 - BioC: The BioCreative Interoperability Initiative, 2013