Feature request: Write @PG flag in SAM output

Issue #54 resolved
Jakob Nissen created an issue

Only if it’s not too much trouble. With BWA, it’s nice that you can see the BWA version, and even the command that created the file, from within the file itself.

It would be nice if KMA could do this also.

Comments (4)

  1. ptlcc

    Hi Jakob

    That should be possible; we have added something similar in the map stat files. Do you have a snippet from BWA so we can see exactly how it should be formatted?

    Best,
    Philip

  2. Jakob Nissen reporter

    Yep, here is an extract from a BAM file. It’s tab-separated, but my terminal replaces it with spaces:

    @PG     ID:samtools     PN:samtools     VN:1.13 CL:samtools view -bS /tmp/jakobnissen/IRMAv1.0.2/0_25.rv.fq-cDVHy28nlJ18ffSlSyp2HpbDefm2qpos/A_MP.sam
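    For illustration, a @PG line like the one above is just `TAG:value` fields joined by tabs after the `@PG` record type. The minimal Python sketch below shows that layout; it is not KMA's actual implementation, and the helper name `format_pg_line` as well as the example ID, version and arguments are hypothetical. It quotes the command line with `shlex.join` so the CL value could be pasted back into a shell, and it rejects tabs and newlines, which would break the tab-separated header.

```python
import shlex


def format_pg_line(pg_id, name=None, version=None, argv=None, prev_id=None):
    """Build one tab-separated @PG header line.

    Per the SAM spec only ID is required; PN, PP, VN and CL are optional.
    Hypothetical helper for illustration, not part of KMA or samtools.
    """
    fields = [("ID", pg_id), ("PN", name), ("PP", prev_id), ("VN", version)]
    if argv is not None:
        # shlex.join quotes arguments containing spaces etc., so the CL value
        # can be pasted back into a POSIX shell.
        fields.append(("CL", shlex.join(argv)))

    parts = ["@PG"]
    for tag, value in fields:
        if value is None:
            continue  # optional tag not supplied
        value = str(value)
        # Header fields are tab-separated and lines are newline-terminated,
        # so neither character may appear inside a value.
        if "\t" in value or "\n" in value:
            raise ValueError(f"{tag} value contains a tab or newline: {value!r}")
        parts.append(f"{tag}:{value}")
    return "\t".join(parts)
```

    For example, `format_pg_line("kma", name="kma", version="1.0.0", argv=sys.argv)` would record how the running script was invoked; the version string there is only a placeholder.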
    

    Here is the description from the SAM specification. The asterisk means the ID tag is required; the others are optional. But VN (version) and CL (command line) are particularly useful!

    @PG Program.
    ID* Program record identifier. Each @PG line must have a unique ID. The value of ID is used in the alignment PG tag and PP tags of other @PG lines. PG IDs may be modified when merging SAM files in order to handle collisions.
    PN  Program name
    CL  Command line. UTF-8 encoding may be used.
    PP  Previous @PG-ID. Must match another @PG header’s ID tag. @PG records may be chained using PP tag, with the last record in the chain having no PP tag. This chain defines the order of programs that have been applied to the alignment. PP values may be modified when merging SAM files in order to handle collisions of PG IDs. The first PG record in a chain (i.e., the one referred to by the PG tag in a SAM record) describes the most recent program that operated on the SAM record. The next PG record in the chain describes the next most recent program that operated on the SAM record. The PG ID on a SAM record is not required to refer to the newest PG record in a chain. It may refer to any PG record in a chain, implying that the SAM record has been operated on by the program in that PG record, and the program(s) referred to via the PP tag.
    DS  Description. UTF-8 encoding may be used.
    VN  Program version
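    To make the PP chaining described in the spec concrete, here is a minimal Python sketch that parses @PG header lines and follows PP links from a given ID, listing programs from the most recent to the oldest. The function names `parse_pg_lines` and `program_chain` are hypothetical, as is the example scenario (KMA output later processed by samtools, with a placeholder KMA version); it is an illustration of the spec text, not code from either tool.

```python
def parse_pg_lines(header_lines):
    """Map each @PG ID to a dict of its tags (TAG -> value)."""
    records = {}
    for line in header_lines:
        if not line.startswith("@PG\t"):
            continue  # skip @HD, @SQ, @RG, @CO, ...
        tags = {}
        for field in line.rstrip("\n").split("\t")[1:]:
            # Split at the first colon only, so values such as CL may contain ':'.
            tag, _, value = field.partition(":")
            tags[tag] = value
        if "ID" not in tags:
            raise ValueError(f"@PG line without ID: {line!r}")
        if tags["ID"] in records:
            raise ValueError(f"duplicate @PG ID: {tags['ID']}")
        records[tags["ID"]] = tags
    return records


def program_chain(records, start_id):
    """Follow PP links from start_id; return IDs from most recent to oldest."""
    chain = []
    seen = set()
    current = start_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in @PG chain at {current!r}")
        if current not in records:
            raise KeyError(f"unknown @PG ID: {current!r}")
        seen.add(current)
        chain.append(current)
        current = records[current].get("PP")  # the last record has no PP tag
    return chain
```

    With a header containing `@PG ID:kma` followed by `@PG ID:samtools PP:kma`, starting from `samtools` yields `["samtools", "kma"]`: samtools is the most recent program, and KMA produced the alignments it operated on.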

  3. ptlcc

    Hi Jakob

    I have added the feature to the version on the nano-branch. If you can confirm that the update works appropriately, I will merge it into the main-branch.

    Best,
    Philip
