A. ChemicalTagger Components:

    This package marks up the experimental sections of chemistry papers.
    It has 3 main classes:
        I. ChemistryPOSTagger:
           This class takes a sentence and runs it through three taggers:
             - OSCAR (for chemical entities)
             - Regex (for recognising chemistry-related entities)
             - OpenNLP (for English parts of speech)
        II. ChemistrySentenceParser:
           This class converts a tagged sentence into a parse tree. It uses a lexer and parser
           generated from the ANTLR grammar.
        III. ASTtoXML:
           This class converts an abstract syntax tree (AST) into an XML document.
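
To make the tagging step more concrete, here is a minimal, self-contained sketch of how several taggers could be combined to produce output in TAG TOKEN format. It is not ChemicalTagger's implementation: the class name, the tag names (CHEM, CD, UNIT-VOL, NN), the tiny chemical dictionary standing in for OSCAR and the "first match wins" ordering are all assumptions made for illustration.

```java
import java.util.Set;
import java.util.regex.Pattern;

class TaggerSketch {

    // Hypothetical stand-in for a chemical entity recogniser such as OSCAR.
    static final Set<String> CHEMICALS = Set.of("H2SO4", "HNO3", "NaOH", "dichloromethane");

    // Hypothetical regex rules for chemistry-related tokens.
    static final Pattern NUMBER = Pattern.compile("\\d+(\\.\\d+)?");
    static final Pattern VOLUME_UNIT = Pattern.compile("mL|L");

    /** Returns the tag of the first tagger that recognises the token. */
    static String tagToken(String token) {
        if (CHEMICALS.contains(token)) {
            return "CHEM";
        }
        if (NUMBER.matcher(token).matches()) {
            return "CD";
        }
        if (VOLUME_UNIT.matcher(token).matches()) {
            return "UNIT-VOL";
        }
        // Placeholder for a real part-of-speech tagger such as OpenNLP.
        return "NN";
    }

    /** Tags a whitespace-separated sentence and joins the result as "TAG TOKEN TAG TOKEN ...". */
    static String tagSentence(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String token : sentence.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(tagToken(token)).append(' ').append(token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "CHEM HNO3 CD 13 UNIT-VOL mL"
        System.out.println(tagSentence("HNO3 13 mL"));
    }
}
```

The sketch checks the chemical dictionary first so that a chemical name is never overwritten by a generic part-of-speech tag. How the real taggers resolve conflicts is not described here.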

B. Running ChemicalTagger:

public void parseChemicalSentence() {

        String text = "A solution of 124C (7.0 g, 32.4 mmol) in concentrate H2SO4 (9.5 mL) was added to a solution of concentrate H2SO4 (9.5 mL) and fuming HNO3 (13 mL) and the mixture was heated at 60°C for 30 min. After cooling to room temperature, the reaction mixture was added to iced 6M solution of NaOH (150 mL) and neutralized to pH 6 with 1N NaOH solution. The reaction mixture was extracted with dichloromethane (4x100 mL). The combined organic phases were dried over Na2SO4, filtered and concentrated to give 124D as a solid.";
        // Call ChemistryPOSTagger to tag the text
        // (assumption: the tagging call is runTaggers(text))
        POSContainer posContainer = ChemistryPOSTagger.getInstance().runTaggers(text);
        // The tagged output is a string in TAG TOKEN format (e.g.: DT The NN cat VB sat IN on DT the NN mat)
        // Call ChemistrySentenceParser either by passing the POSContainer or by InputStream
        ChemistrySentenceParser chemistrySentenceParser = new ChemistrySentenceParser(posContainer);

        // Create a parse tree of the tagged input
        // (assumption: the parsing call is parseTags())
        chemistrySentenceParser.parseTags();
        // Return an AST
        Tree t = chemistrySentenceParser.getParseTree();
        // Return an XML document
        Document doc = chemistrySentenceParser.getDocument();
}
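
The TAG TOKEN format mentioned in the comments above alternates a tag with the token it labels. As a rough illustration, the sketch below splits such a string back into (tag, token) pairs. The class and method names are hypothetical and not part of ChemicalTagger, and the code assumes that neither tags nor tokens contain whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class TagTokenFormat {

    /** One tag paired with the token it labels. */
    static final class TaggedToken {
        final String tag;
        final String token;

        TaggedToken(String tag, String token) {
            this.tag = tag;
            this.token = token;
        }
    }

    /** Splits "TAG TOKEN TAG TOKEN ..." into pairs; assumes no field contains whitespace. */
    static List<TaggedToken> parse(String tagged) {
        List<TaggedToken> pairs = new ArrayList<>();
        String trimmed = tagged.trim();
        if (trimmed.isEmpty()) {
            return pairs;
        }
        String[] parts = trimmed.split("\\s+");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "Expected an even number of fields, got " + parts.length);
        }
        // Fields come in pairs: even index is the tag, odd index is the token.
        for (int i = 0; i < parts.length; i += 2) {
            pairs.add(new TaggedToken(parts[i], parts[i + 1]));
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (TaggedToken t : parse("DT The NN cat VB sat")) {
            System.out.println(t.token + " -> " + t.tag);
        }
    }
}
```

An odd number of fields is rejected rather than silently dropped, since it usually means a token was lost or contained a space.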