DefineClones will group all sequences together if the junction column is missing

Issue #139 resolved
Jason Vander Heiden created an issue

len(Receptor.junction) is 0 if the JUNCTION column is missing from the input, forcing all records into the same junction length grouping during DefineClones.groupByGene.
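A minimal Python sketch of the failure mode. It assumes records are read from a tab-delimited file with `csv.DictReader` and that an absent junction column is treated as an empty string. This mimics the reported behavior and is not Change-O's actual parsing code. The helper `group_key` and the example data are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited input with no junction column.
data = (
    "sequence_id\tv_call\tj_call\tsequence_alignment\n"
    "seq1\tIGHV1-2*02\tIGHJ4*02\tCAGGTGCAGCTG\n"
    "seq2\tIGHV1-2*02\tIGHJ4*02\tCAGGTG\n"
    "seq3\tIGHV1-2*02\tIGHJ4*02\tCAGGTGCAGCTGGTGCAG\n"
)

def group_key(rec):
    # The junction key is absent, so get() returns None, which is mapped
    # to "" here; every record therefore gets a junction length of 0.
    junction = rec.get("junction") or ""
    return (rec["v_call"], rec["j_call"], len(junction))

groups = defaultdict(list)
for rec in csv.DictReader(io.StringIO(data), delimiter="\t"):
    groups[group_key(rec)].append(rec["sequence_id"])

print(dict(groups))
```

All three records land in a single V/J/length-0 group, so the length pre-clustering no longer separates anything.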

We should add a call to changeo.IO.checkFields() in the main function of DefineClones (see CreateGermlines for an example) to catch malformed input.
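A sketch of the fail-fast check this would add. It uses a hypothetical stand-in, `check_required_fields`, rather than the real `changeo.IO.checkFields`, whose exact signature is not reproduced here. The only shared assumption is the behavior: raise `LookupError` when a required column is missing from the header, and exit with an error instead of continuing.

```python
import sys

def check_required_fields(required, header):
    """Raise LookupError if any required field is absent from the header.

    Hypothetical stand-in for the kind of check changeo.IO.checkFields performs.
    """
    missing = [f for f in required if f not in header]
    if missing:
        raise LookupError("Missing required field(s): " + ", ".join(missing))
    return True

def main(header):
    # Stop on malformed input instead of silently grouping every record together.
    try:
        check_required_fields(["junction", "v_call", "j_call"], header)
    except LookupError as e:
        sys.exit("ERROR: %s" % e)

if __name__ == "__main__":
    main(["sequence_id", "v_call", "j_call", "junction"])
```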

We may also want to consider changing DefineClones.groupByGene to pre-clone based on the length of the sequence field specified by --sf, though this would be an algorithm change.
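A sketch of that alternative pre-clustering, assuming records are plain dicts. `seq_field` plays the role of the `--sf` option. `group_by_gene_and_length` is hypothetical and is not the actual `DefineClones.groupByGene`. Using `cdr3` as the field is only for illustration.

```python
from collections import defaultdict

def group_by_gene_and_length(records, seq_field="junction"):
    # Hypothetical sketch: bin by V call, J call, and the length of the
    # field actually used for distance calculation (the --sf field).
    groups = defaultdict(list)
    for rec in records:
        key = (rec["v_call"], rec["j_call"], len(rec[seq_field]))
        groups[key].append(rec["sequence_id"])
    return dict(groups)

records = [
    {"sequence_id": "seq1", "v_call": "IGHV1-2*02", "j_call": "IGHJ4*02", "cdr3": "GCGAGA"},
    {"sequence_id": "seq2", "v_call": "IGHV1-2*02", "j_call": "IGHJ4*02", "cdr3": "GCGAGAGAT"},
    {"sequence_id": "seq3", "v_call": "IGHV1-2*02", "j_call": "IGHJ4*02", "cdr3": "GCTAGA"},
]
print(group_by_gene_and_length(records, seq_field="cdr3"))
```

Indexing with `rec[seq_field]` instead of `rec.get(...)` means a missing column raises `KeyError` rather than quietly producing length 0.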

Comments (5)

  1. Jason Vander Heiden reporter

    Alternatively, make sure that a junction of None doesn't resolve to length 0, probably in DefineClones.filterMissing. I'm not sure that's the best place, just a hunch. Check that this bug still exists, because I may have already fixed it.
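A sketch of the filtering idea, assuming records are plain dicts. `filter_missing` here is a hypothetical helper illustrating the suggestion, not the existing `DefineClones.filterMissing`. Both `None` and the empty string count as missing, so neither ends up in a length-0 group.

```python
def filter_missing(records, field="junction"):
    """Split records into those with a usable value in field and those without.

    Hypothetical sketch: None and "" are both treated as missing.
    """
    passed, failed = [], []
    for rec in records:
        value = rec.get(field)
        if value is None or len(value) == 0:
            failed.append(rec)
        else:
            passed.append(rec)
    return passed, failed

records = [
    {"sequence_id": "seq1", "junction": "TGTGCGAGATGG"},
    {"sequence_id": "seq2", "junction": None},
    {"sequence_id": "seq3", "junction": ""},
    {"sequence_id": "seq4"},
]
passed, failed = filter_missing(records)
print([r["sequence_id"] for r in passed], [r["sequence_id"] for r in failed])
```

Only the records in `passed` would go on to gene and length grouping; the rest can be written to a fail file.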

    There is a checkFields example in CreateGermlines.createGermlines (line 97).

  2. Kenneth Hoehn

    The error is already handled, but I added this to defineClones (line 532) for good measure:

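    # Check for required columns
    # checkFields (changeo.IO) and printError (presto.IO) are module-level imports;
    # out_fields and schema are defined earlier in defineClones.
    try:
        required = ['junction']
        checkFields(required, out_fields, schema=schema)
    except LookupError as e:
        printError(e)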
    