Convert QuaMeter metrics to qcML

jqcML makes it straightforward to convert metrics produced by various tools to the qcML format. As a realistic use case, this page outlines the conversion of metrics generated by QuaMeter.

QuaMeter writes the QC metrics for each experiment (represented by an mzML file) to a separate tab-separated (tsv) file.

// read the tsv file; each tsv file contains header information on the first line
// and the value of the metrics on the second line
String[] headers;
String[] values;
// try-with-resources closes the reader even if reading fails
try (BufferedReader tsvReader = new BufferedReader(new FileReader(metricsFile))) {
    headers = tsvReader.readLine().split("\\t", -1);
    values = tsvReader.readLine().split("\\t", -1);
}
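
The reading step can also be made a little more defensive. The following is a minimal sketch, not part of jqcML, that pairs each header with its value and fails early when a line is missing or the column counts differ. The class name `QuaMeterTsv`, the method `readMetrics`, and the sample column names are assumptions made for illustration only; they do not reflect the actual QuaMeter column names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of jqcML or QuaMeter.
class QuaMeterTsv {

    // Reads the header line and the single value line and pairs them by column.
    static Map<String, String> readMetrics(BufferedReader reader) throws IOException {
        String headerLine = reader.readLine();
        String valueLine = reader.readLine();
        if (headerLine == null || valueLine == null) {
            throw new IOException("Expected a header line and a value line");
        }
        // a limit of -1 keeps trailing empty columns
        String[] headers = headerLine.split("\t", -1);
        String[] values = valueLine.split("\t", -1);
        if (headers.length != values.length) {
            throw new IOException("Column count mismatch: " + headers.length
                    + " headers, " + values.length + " values");
        }
        // LinkedHashMap preserves the column order of the file
        Map<String, String> metrics = new LinkedHashMap<>();
        for (int i = 0; i < headers.length; i++) {
            metrics.put(headers[i], values[i]);
        }
        return metrics;
    }

    public static void main(String[] args) throws IOException {
        // made-up sample content; real files use QuaMeter's own column names
        String sample = "Filename\tMetricA\tMetricB\nrun01.mzML\t0.5\t\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(readMetrics(reader));
        }
    }
}
```

Checking the column counts up front avoids an `ArrayIndexOutOfBoundsException` later, when headers and values are accessed by the same index.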

To store these metrics in a qcML file, we first create a new QcML object to which the metrics will be added. Before adding any metrics, however, entries for the relevant controlled vocabularies (CVs) must be added to this object. This is very important: a metric is only valid in the qcML format if it references a controlled vocabulary that defines the metric in question. A metric can therefore only be added to a QcML object if that object already contains an entry for the controlled vocabulary the metric references.

// create a qcML object
QcML qcml = new QcML();

// add references to the required controlled vocabularies (CVs)
Cv cvMS = new Cv("PSI-MS", "http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo", "MS");
qcml.addCv(cvMS);
Cv cvQC = new Cv("MS-QC", "http://code.google.com/p/qcml/source/browse/trunk/cv/qc-cv.obo", "QC");
qcml.addCv(cvQC);

Now that the controlled vocabularies have been added to the QcML object, the quality metrics can be added. The metrics are not added directly to the QcML object, however, but to a QualityAssessment object that represents the quality of a single run. This is because a single QcML object can contain information for several different experiments. (In this example, each qcML file contains metrics for only a single experiment.)

// add a QualityAssessment to store the metrics
String baseName = values[0].substring(values[0].lastIndexOf('/') + 1, values[0].indexOf(".mzML"));
QualityAssessment runQuality = new QualityAssessment("run_" + baseName);
qcml.addRunQuality(runQuality);
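
Note that the base name extraction above throws a `StringIndexOutOfBoundsException` when the value does not contain `.mzML`, and it only recognizes `/` as a path separator. A slightly more tolerant variant could look like the sketch below. The class `BaseNames` and the method `baseName` are hypothetical names introduced here, and the sketch assumes, as the snippet above does, that the first column holds the path of the input file.

```java
// Hypothetical helper class, not part of jqcML.
class BaseNames {

    // Strips any leading directories and the last file extension, if present.
    static String baseName(String path) {
        // accept both Unix and Windows path separators
        int lastSeparator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String fileName = path.substring(lastSeparator + 1);
        int lastDot = fileName.lastIndexOf('.');
        // keep the name unchanged when there is no extension (or only a leading dot)
        return lastDot > 0 ? fileName.substring(0, lastDot) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(baseName("/data/experiments/run01.mzML"));
        System.out.println(baseName("C:\\data\\run01.mzML"));
        System.out.println(baseName("run01"));
    }
}
```

Unlike the original expression, this variant removes whatever extension comes last, so it is only equivalent when the file name ends in `.mzML`.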

All that remains is to add the metrics stored in the tsv file generated by QuaMeter to the new qcML file. Each entry in the tsv file is first converted to a QualityParameter object, which is then added to the QualityAssessment created above.

// add each individual quality metric
for(int i = 0; i < headers.length; i++) {
    // create a new quality parameter identified by the header information
    QualityParameter param = new QualityParameter(headers[i], cvQC, "param_" + headers[i] + "_run_" + baseName);
    // set the value calculated by QuaMeter
    param.setValue(values[i]);

    // parameters are identified by accession numbers in the CV
    // retrieve a valid entry in the CV for the current parameter based on the metric name
    String accession = cvMapping.get(param.getName());
    if(accession != null) {
        param.setAccession(accession);
        param.setCvRef(qcml.getCv(accession.substring(0, accession.indexOf(':'))));
    } else {
        // no CV entry yet: use the name as a dummy accession number
        param.setAccession(param.getName());
    }

    runQuality.addQualityParameter(param);
}

When a QualityParameter object is constructed, a reference to the Cv object representing the relevant controlled vocabulary is provided. In addition to this reference, an accession number identifying the metric in the controlled vocabulary is required. The QC controlled vocabulary already contains a few metric definitions, but it does not yet include all metrics that QuaMeter can generate. For the existing definitions, a mapping between the parameter name used by QuaMeter and an entry in a controlled vocabulary has to be constructed. Based on this mapping, these metrics can be identified by a controlled vocabulary. For the metrics that aren't included in the QC controlled vocabulary yet, the metric name is currently used as a dummy accession number, while still referring to the QC controlled vocabulary. As a result, the constructed qcML file will be syntactically valid, but not semantically valid. As the qcML format matures, additional entries will be added to its controlled vocabulary (also in collaboration with the community) in order to unambiguously define more metrics.
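
The loop above looks up accessions in `cvMapping`, which is not defined in the snippets on this page. A minimal sketch of such a mapping, assuming it is a plain `Map<String, String>` from QuaMeter column name to CV accession, is shown below. The metric names and accession numbers are placeholders invented for this example and are not real entries in the QC or PSI-MS controlled vocabularies; the class and method names are hypothetical as well.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of jqcML.
class CvMappingSketch {

    // Maps QuaMeter column names to CV accessions.
    // All names and accessions below are PLACEHOLDERS for illustration only.
    static Map<String, String> buildMapping() {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("ExampleMetricA", "QC:9999991");
        mapping.put("ExampleMetricB", "MS:9999992");
        return mapping;
    }

    // Returns the mapped accession, or the metric name itself as a dummy accession.
    static String accessionFor(Map<String, String> mapping, String metricName) {
        String accession = mapping.get(metricName);
        return accession != null ? accession : metricName;
    }

    // Extracts the CV identifier (the part before the colon), or null if there is none.
    static String cvIdOf(String accession) {
        int colon = accession.indexOf(':');
        return colon > 0 ? accession.substring(0, colon) : null;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = buildMapping();
        for (String name : new String[] {"ExampleMetricA", "ExampleMetricB", "UnmappedMetric"}) {
            String accession = accessionFor(mapping, name);
            System.out.println(name + " -> " + accession + " (CV id: " + cvIdOf(accession) + ")");
        }
    }
}
```

The CV identifier obtained this way corresponds to the third argument passed to the `Cv` constructor earlier (`"MS"` or `"QC"`), which is what the loop uses to look up the matching Cv object.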

Finally, the newly created QcML object has to be saved to a file.

// save the qcML object to an XML-based qcML file
QcMLFileWriter writer = new QcMLFileWriter();
qcml.setFileName(baseName + ".qcML");
writer.writeQcML(qcml);

Using jqcML greatly simplifies converting existing metrics generated by different tools, such as QuaMeter, to the qcML format: metrics can be added to a qcML container in an intuitive way. In addition, jqcML enforces several requirements in order to create valid qcML files.

jqcML can also be used to merge QC metrics generated by different tools for the same experiment. This is done in the same way as described above. An extended example that merges metrics generated by QuaMeter (in the tab-separated file format) and OpenMS (in the qcML format) into a single qcML file is available here.
