Add unique identifier to start of read names for downstream analysis

Issue #87 resolved
notrando created an issue

Hi there,

Thanks for creating presto, great tool and amazing ecosystem.

I’m trying to analyse the output of presto using the IMGT servers, but unfortunately they truncate the read names, so information is lost, which affects downstream analysis.

presto provides a few options that partially solve the issue, namely the ParseHeaders.py subcommands add and rename. add appends to the end of the read name, so unfortunately it doesn’t help. rename adds to the start, with some minor issues (like adding NONE| for some odd reason). Neither really provides what I need: a short unique identifier that can be added to the start of the read name.

I think a simple solution is adding the record number to the start of each read name. For example, 100 reads would have SAMPLE_1, SAMPLE_2, …, SAMPLE_100 added to the start of their read names. The ideal solution would be a new subcommand that renames the headers this way and then writes a text file listing the new and old names, for renaming back or referencing.
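
To make the idea concrete, here is a rough sketch in plain Python. It does not use presto's own API. The function names (prefix_fasta_headers, write_mapping), the "|" separator, the mapping-file columns and the example headers are all assumptions for illustration only. It handles FASTA-style input where header lines start with ">".

```python
import csv
import io


def prefix_fasta_headers(lines, sample="SAMPLE"):
    """Prefix every FASTA header with <sample>_<n>| (n = 1-based record number).

    Returns the rewritten lines and a list of (new_id, old_header) pairs.
    Hypothetical helper, not part of presto.
    """
    out, mapping, n = [], [], 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            n += 1
            new_id = f"{sample}_{n}"
            old_header = line[1:]
            mapping.append((new_id, old_header))
            # Short unique ID goes first, so it survives truncation.
            out.append(f">{new_id}|{old_header}")
        else:
            out.append(line)  # sequence lines pass through unchanged
    return out, mapping


def write_mapping(mapping, handle):
    """Write a tab-separated new-name/old-name table for renaming back later."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["new_id", "old_header"])
    writer.writerows(mapping)


if __name__ == "__main__":
    # Made-up example records.
    fasta = [
        ">M01|PRCONS=IGHC|CONSCOUNT=5",
        "ACGT",
        ">M02|PRCONS=IGHC|CONSCOUNT=2",
        "TTGA",
    ]
    renamed, mapping = prefix_fasta_headers(fasta)
    print("\n".join(renamed))
    buf = io.StringIO()
    write_mapping(mapping, buf)
    print(buf.getvalue(), end="")
```

With the mapping file kept alongside the renamed reads, the truncated names coming back from IMGT could be joined to the original headers on the new_id column.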

On a slightly related note, it would be fantastic if the add subcommand could add to the start of the read name.

Thanks!
