EncyclopeDIA File Formats

.DLIB (DDA library) and .ELIB (EncyclopeDIA library)

DLIB and ELIB files are SQLite databases that can be opened and browsed with the open source DB Browser for SQLite. DLIBs contain just the minimal subset of information in an ELIB required for library searching and are typically generated from DDA datasets. ELIBs contain additional quantification and chromatographic data in chromatogram libraries and quantification reports. These databases use the following schema:

elib layout.png

The entries, peptidelocalizations, peptidequants, and peptidescores tables each hold one row per unique combination of PrecursorCharge, PeptideModSeq, and SourceFile, which together act as the key. These tables join with proteinscores in a many-to-many relationship through the peptidetoprotein table, using the keys PeptideSeq and ProteinAccession. The retentiontimes and metadata tables contain additional metadata about the retention time fitting and the search parameters.

Notes

  • PTMs: PeptideModSeq has strings like "QKEC[+57.0214635]SDK" to indicate PTMs. PTMs are always encoded as delta masses (including fixed PTMs such as carbamidomethylation). Each site can carry only one PTM mass, so multiple modifications on the same residue are written as a single combined mass, as in "M[+58.00548]ELS[+79.966331]C[+57.0214635]PGSR", where +58.00548 indicates both acetylation (+42.010565) and oxidation (+15.994915). N- and C-terminal PTMs should be annotated on the first or last amino acid in the peptide, respectively. Metabolic labels can be incorporated in the same way, for example "EC[+57.0214635]SDK[+8.014199]".

  • Blobs: There are several blobs (i.e. byte arrays). All blobs are zlib-compressed arrays (the stream format read by java.util.zip.Inflater, not a Zip archive) and are accompanied by a field indicating their uncompressed length in bytes. All arrays are stored using Big Endian byte ordering, which is standard for Java. Masses are encoded as double precision (MassArray and QuantIonMassArray) while all other arrays are encoded as single (float) precision. For example, you can use the following code to extract arrays in Java; float arrays are widened to double on return:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public static double[] extractMassArray(byte[] compressedData, int uncompressedLength) throws IOException, DataFormatException {
    return toArray(decompress(compressedData, uncompressedLength), true);
}
public static double[] extractIntensityArray(byte[] compressedData, int uncompressedLength) throws IOException, DataFormatException {
    return toArray(decompress(compressedData, uncompressedLength), false);
}

// inflate a zlib-compressed blob into a buffer of the recorded uncompressed length
private static byte[] decompress(byte[] compressedData, int uncompressedLength) throws IOException, DataFormatException {
    Inflater decompresser=new Inflater();
    decompresser.setInput(compressedData);
    byte[] decompressedData=new byte[uncompressedLength];
    decompresser.inflate(decompressedData);
    decompresser.end();
    return decompressedData;
}
private static double[] toArray(byte[] b, boolean isDouble) {
    int byteLength=isDouble?8:4; // use 8 for double, 4 for float
    double[] d=new double[b.length/byteLength];
    ByteBuffer bb=ByteBuffer.wrap(b);
    bb.order(ByteOrder.BIG_ENDIAN);
    for (int i=0; i<d.length; i++) {
        // read each value at its stored width; floats are widened to double
        d[i]=isDouble?bb.getDouble():bb.getFloat();
    }
    return d;
}
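
The PeptideModSeq convention described in the PTMs note above can be unpacked in a similar way. The following is a minimal sketch rather than EncyclopeDIA code: the class PeptideModSeqParser and its ParsedPeptide result type are hypothetical names, and the sketch assumes that every bracket immediately follows the residue it modifies and contains a single signed delta mass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of EncyclopeDIA): splits a PeptideModSeq
// string into its plain sequence and one delta mass per residue.
class PeptideModSeqParser {
    static final class ParsedPeptide {
        final String sequence;      // amino acids with all brackets removed
        final double[] deltaMasses; // 0.0 where a residue is unmodified

        ParsedPeptide(String sequence, double[] deltaMasses) {
            this.sequence = sequence;
            this.deltaMasses = deltaMasses;
        }
    }

    static ParsedPeptide parse(String peptideModSeq) {
        StringBuilder sequence = new StringBuilder();
        List<Double> deltas = new ArrayList<>();
        boolean lastResidueModified = false;
        int i = 0;
        while (i < peptideModSeq.length()) {
            char c = peptideModSeq.charAt(i);
            if (c == '[') {
                int close = peptideModSeq.indexOf(']', i);
                // a bracket needs a preceding residue, a closing ']', and
                // must be the only bracket on that residue
                if (close < 0 || deltas.isEmpty() || lastResidueModified) {
                    throw new IllegalArgumentException("Malformed PeptideModSeq: " + peptideModSeq);
                }
                // Double.parseDouble accepts the leading '+' sign
                double delta = Double.parseDouble(peptideModSeq.substring(i + 1, close));
                deltas.set(deltas.size() - 1, delta);
                lastResidueModified = true;
                i = close + 1;
            } else {
                sequence.append(c);
                deltas.add(0.0);
                lastResidueModified = false;
                i++;
            }
        }
        double[] deltaMasses = new double[deltas.size()];
        for (int j = 0; j < deltaMasses.length; j++) {
            deltaMasses[j] = deltas.get(j);
        }
        return new ParsedPeptide(sequence.toString(), deltaMasses);
    }
}
```

Calling PeptideModSeqParser.parse on a PeptideModSeq value yields the unmodified sequence, which should correspond to the PeptideSeq key, together with the per-residue delta masses. Because compound modifications arrive as one combined mass, separating them into individual PTMs is left to the caller.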

.DIA (DIA raw file)

DIA files are SQLite databases that contain a subset of the data found in normal RAW files or mzMLs, and are designed for fast queries by precursor isolation window. These databases use the following schema:

dia layout.png

DIA files use the same blob structure as ELIBs and DLIBs: zlib compressed byte arrays, where masses are double encoded and other arrays are float encoded.
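
For anyone writing blobs rather than reading them, the layout described above can be sketched in the reverse direction. This is an illustration under stated assumptions, not the EncyclopeDIA writer: the class BlobCodec and its method names are hypothetical, the example handles only float-encoded arrays, and it assumes the stored length field is the byte count of the uncompressed array.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not part of EncyclopeDIA) for float-encoded blobs.
class BlobCodec {
    // Lay the floats out as big-endian bytes, 4 bytes per value.
    static byte[] toBigEndianBytes(float[] values) {
        ByteBuffer bb = ByteBuffer.allocate(values.length * Float.BYTES).order(ByteOrder.BIG_ENDIAN);
        bb.asFloatBuffer().put(values);
        return bb.array();
    }

    // Compress with zlib; the caller stores raw.length alongside the blob.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inverse of the two steps above: inflate, then read big-endian floats.
    static float[] decodeFloats(byte[] compressed, int uncompressedLength) {
        byte[] raw = new byte[uncompressedLength];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            int filled = 0;
            while (filled < raw.length) {
                int n = inflater.inflate(raw, filled, raw.length - filled);
                if (n == 0) break; // no further output available
                filled += n;
            }
            if (filled != raw.length) {
                throw new IllegalArgumentException("Blob shorter than its recorded length");
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Blob is not valid zlib data", e);
        } finally {
            inflater.end();
        }
        float[] values = new float[raw.length / Float.BYTES];
        ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN).asFloatBuffer().get(values);
        return values;
    }
}
```

Mass arrays would follow the same pattern with 8-byte doubles in place of 4-byte floats. Checking that the inflated byte count matches the recorded length is a cheap way to catch a blob paired with the wrong length field.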
