AssemblePairs.py align -1 ERR346600_2.fastq -2 ERR346600_1.fastq

Issue #79 resolved
Anton Kulaga created an issue

I am very confused by the AssemblePairs parameters. I see that you give:

AssemblePairs.py align -1 ERR346600_2.fastq -2 ERR346600_1.fastq

What intuition tells me is that -1 should be the first read and -2 should be the second one, while here you give the second read to -1 and the first read to -2. If this is not a bug but a feature, could you clarify it in the documentation?

Comments (3)

  1. Anton Kulaga reporter

    I checked, and probably I was not attentive enough, as you mention “During assembly we have defined read 2 (V-region) as the head of the sequence (-1) and read 1 as the tail of the sequence (-2).” in the docs. However, it is not clear whether I should always give read 2 as -1. In most other tools, -1 is always the first read.

  2. Jason Vander Heiden

    Greetings Anton,

    It actually shouldn’t matter which read is -1 and which is -2. For that particular example, read 2 just happens to start at the beginning of the V segment and be in the forward orientation (5' to 3' with respect to the V(D)J reading frame). See the read configuration diagram here:

    https://presto.readthedocs.io/en/stable/workflows/Greiff2014_Workflow.html#read-configuration

    By default, AssemblePairs will take the reverse complement of the -2 sequence prior to stitching the reads (see the --rc argument). Meaning, for this read configuration, the stitched sequence will end up being oriented in the forward direction (V to J). If you flip the -1 and -2 arguments, then it’ll come out in the reverse complement orientation (J to V), by virtue of the library prep design. When you feed the reverse complemented sequences to IgBLAST, it’ll figure out they are reversed and flip them, giving you the same result.
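    A toy Python sketch of the orientation argument above, assuming a simple longest-exact-overlap merge. This is not pRESTO's implementation (its aligner scores mismatches and quality); `reverse_complement`, `toy_assemble` and the example fragment are hypothetical, used only to show that swapping -1 and -2 yields the reverse complement of the same stitched sequence when only the -2 read is reverse complemented.

```python
# Toy illustration only: NOT pRESTO's alignment algorithm.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_assemble(head: str, tail: str, min_overlap: int = 3) -> str:
    """Stitch head (-1) to the reverse complement of tail (-2).

    Mimics the default described in the thread, where only the -2 read is
    reverse complemented; the overlap is the longest exact suffix/prefix match.
    """
    tail_rc = reverse_complement(tail)
    for length in range(min(len(head), len(tail_rc)), min_overlap - 1, -1):
        if head[-length:] == tail_rc[:length]:
            return head + tail_rc[length:]
    raise ValueError("no overlap of at least min_overlap bases")


# Hypothetical insert written V-to-J (forward orientation).
fragment = "GATTACAGGCTCCATG"
read2 = fragment[:10]                     # read 2: forward, starts at the V end
read1 = reverse_complement(fragment[6:])  # read 1: reverse strand, from the J end

print(toy_assemble(read2, read1))  # -1 read 2, -2 read 1: GATTACAGGCTCCATG (V to J)
print(toy_assemble(read1, read2))  # flipped: CATGGAGCCTGTAATC (J to V)
```

    The flipped result is exactly `reverse_complement(fragment)`, which is why a downstream tool that detects and corrects orientation ends up with the same sequence either way.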

    So the ERR “1” and “2” files were assigned to -2 and -1 simply to make it easier to look at the reads themselves, without having to deal with them being flipped. For a sequencing project with a different primer design, this might be set up differently.

    PS: A long time ago we actually had the arguments named head/tail instead of 1/2, and that’s still in the docs (run AssemblePairs.py align -h), but we changed them so that argument names were shorter and consistent across tools.
