Clean up Receptor and format parsing code

Issue #109 resolved
Jason Vander Heiden created an issue

Many changes were made to the Receptor and Parsers modules to accommodate the AIRR format, and the code needs substantial cleanup after them.

Also, we should probably convert the Receptor class to use AIRR naming and indexing rules, which will require us to change and test every other tool that uses the Receptor class.

Comments (3)

  1. Jason Vander Heiden reporter

    Looks like we will stick with changeo field names (except lowercase) for the Receptor class attributes.
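A minimal sketch of what "changeo field names, lowercased" could mean for Receptor attributes: each uppercase changeo column becomes a lowercase attribute on input and is mapped back to uppercase on output. `ReceptorSketch` and `to_changeo` are hypothetical names used only for illustration. They are not changeo's actual Receptor API.

```python
from typing import Dict, Optional


class ReceptorSketch:
    """Hypothetical stand-in for a Receptor using lowercase changeo field names."""

    def __init__(self, record: Dict[str, Optional[str]]):
        # Store each changeo column as a lowercase attribute,
        # e.g. SEQUENCE_ID -> self.sequence_id, FUNCTIONAL -> self.functional.
        for key, value in record.items():
            setattr(self, key.lower(), value)

    def to_changeo(self) -> Dict[str, Optional[str]]:
        # Map attributes back to uppercase changeo column names for output.
        return {key.upper(): value for key, value in vars(self).items()}
```

Round-tripping a record through the object returns the original uppercase columns, so the lowercase convention stays internal to the class.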

    Some decisions need to be made about the following changeo fields:

    • REV_COMP - should probably be a core (required) field, which means adding extraction of this info to the IMGT and iHMMune-align parsers, if available.
    • CDR3_IGBLAST_* - We are now using the IgBLAST CDR3 fields directly to determine JUNCTION, so we can probably drop these entirely.
    • *_VDJ - might make sense to change these to *_ALIGN for clarity between the changeo and airr formats.
    • FUNCTIONAL - Is technically the wrong name for this field. It should be PRODUCTIVE, but the field is used so regularly that breaking backwards compatibility might be unwise.
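The CDR3_IGBLAST_* bullet relies on deriving JUNCTION from the IgBLAST CDR3. Under the IMGT definition, the junction is the CDR3 plus the conserved 5' Cys codon and 3' Trp/Phe codon, i.e. 3 extra nucleotides on each side. A minimal sketch, assuming the CDR3 coordinates are 1-based and inclusive within the input sequence. `junction_from_cdr3` is a hypothetical helper, not a changeo function.

```python
def junction_from_cdr3(seq: str, cdr3_start: int, cdr3_end: int) -> str:
    """Return the junction nucleotide sequence given CDR3 coordinates.

    Assumes cdr3_start/cdr3_end are 1-based, inclusive positions in `seq`.
    """
    # Junction = CDR3 plus one conserved codon on each side (3 nt each).
    junction_start = cdr3_start - 3
    junction_end = cdr3_end + 3
    if junction_start < 1 or junction_end > len(seq):
        raise ValueError("junction would extend past the sequence bounds")
    # Convert 1-based inclusive coordinates to a 0-based half-open slice.
    return seq[junction_start - 1:junction_end]
```

For example, in `"AAATGTGCGAGATGGCCC"` with the CDR3 at positions 7 to 12 (`GCGAGA`), the junction is `TGTGCGAGATGG`, six nucleotides longer than the CDR3.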

    We should probably move the extra alignment field sets (regions, scores, junction, etc.) out of ChangeoSchema and AIRRSchema and into the parser classes IMGTReader, IgBLASTReader and iHMMuneReader, and make them Receptor attributes instead of output fields. That should be easier to understand, since the fields method of the schemas is only relevant to the aligner parsing task.
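One way the proposed move could look: each parser class declares its own optional field groups, and a shared class method returns the core fields plus whichever groups a caller asks for. This is a sketch under assumptions. `ReaderSketch`, `IgBLASTReaderSketch`, `core_fields`, `extra_fields` and the group contents are hypothetical and illustrative, not changeo's actual reader classes or field lists.

```python
from typing import Dict, List


class ReaderSketch:
    """Hypothetical base parser: core fields plus parser-specific field groups."""

    # Fields every parser is expected to populate (lowercase changeo names).
    core_fields: List[str] = ["sequence_id", "sequence_input", "functional",
                              "rev_comp", "v_call", "d_call", "j_call"]
    # Optional field groups (regions, scores, junction, ...); subclasses override.
    extra_fields: Dict[str, List[str]] = {}

    @classmethod
    def fields(cls, *groups: str) -> List[str]:
        """Return the core fields followed by the requested extra field groups."""
        out = list(cls.core_fields)  # copy so the class attribute is never mutated
        for group in groups:
            if group not in cls.extra_fields:
                raise ValueError(f"{cls.__name__} has no field group {group!r}")
            out.extend(cls.extra_fields[group])
        return out


class IgBLASTReaderSketch(ReaderSketch):
    # Illustrative groups only; not the real IgBLASTReader field lists.
    extra_fields = {
        "cdr3": ["cdr3_igblast", "cdr3_igblast_aa"],
        "score": ["v_score", "v_identity", "v_evalue"],
    }
```

The design choice here is that the schemas keep only naming and output concerns, while each parser states what its aligner can actually report and rejects groups it cannot supply.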

    Should probably still switch the index fields in Receptor to 0-based, since that's what Python uses.
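For reference, the difference between the two conventions: 1-based inclusive coordinates (common in aligner output and in AIRR) versus Python's 0-based half-open slices. A minimal sketch of the conversion, with hypothetical helper names `one_to_zero` and `zero_to_one`.

```python
from typing import Tuple


def one_to_zero(start: int, end: int) -> Tuple[int, int]:
    """Convert 1-based inclusive [start, end] to 0-based half-open [start, end)."""
    # Only the start shifts; the inclusive end equals the exclusive end.
    return start - 1, end


def zero_to_one(start: int, end: int) -> Tuple[int, int]:
    """Convert 0-based half-open [start, end) to 1-based inclusive [start, end]."""
    return start + 1, end
```

With 0-based half-open coordinates a region can be sliced directly as `seq[start:end]` and its length is `end - start`, whereas 1-based inclusive coordinates need `seq[start - 1:end]` and a length of `end - start + 1`.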

  2. Jason Vander Heiden reporter
    1. REV_COMP is now in all parsers.
    2. Keeping CDR3_IGBLAST_* for compatibility, except renaming CDR3_IGBLAST_NT to CDR3_IGBLAST for consistency.
    3. Started moving the parser-specific field lists into the parser classes.
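If files written before the CDR3_IGBLAST_NT to CDR3_IGBLAST rename still need to be read, one option is to normalize legacy column names on input. A minimal sketch, assuming tab-delimited changeo files. `LEGACY_ALIASES` and `read_changeo` are hypothetical names, not part of changeo.

```python
import csv
from typing import Dict, Iterable, Iterator

# Hypothetical alias table: legacy changeo column name -> current name.
LEGACY_ALIASES: Dict[str, str] = {"CDR3_IGBLAST_NT": "CDR3_IGBLAST"}


def read_changeo(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield rows from tab-delimited changeo data, renaming legacy columns."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        # Replace any legacy column name with its current equivalent.
        yield {LEGACY_ALIASES.get(key, key): value for key, value in row.items()}
```

`csv.DictReader` accepts any iterable of lines, so this works with an open file handle or a plain list of strings.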
  3. Jason Vander Heiden reporter
    • Leaving *_VDJ fields as is, since _ALIGN is already used by AlignRecords.
    • Leaving FUNCTIONAL as is for now as well.
    • Leaving indexing as is for the moment. Will revisit later.

    Lots of cleanup on parsing, schemas and Receptor in a31a721. It's as clean as it's going to get for now.
