basic, acidic and aromatic output fields from aminoAcidProperties() are always zero

Issue #91 resolved
Scott Christley created an issue

Using the same input file attached to #90 and this code:

db <- airr::read_rearrangement('vdjserver1.airr.tsv')
aa_db <- aminoAcidProperties(db)
airr::write_rearrangement(aa_db, 'aa_properties.airr.tsv')

The junction_aa_basic, junction_aa_acidic, and junction_aa_aromatic output fields are zero for all rearrangements.

Comments (14)

  1. Scott Christley reporter

    No rush. I ran across these issues during our hackathon, so it’s good to be tested! 😀

  2. Edel Aron

    Ok @Scott Christley , I did some investigating and I actually don’t think that this is a bug. In the code below, I read in your example file like you did and found the zeroes in the columns that you mentioned. However, aminoAcidProperties takes the “junction” column as default if you don’t specify it and your junction column has DNA in it, which is why you got zeroes in those columns (they search for [RHK], [DE] and [FWHY] respectively, none of which match any nucleotides) and not in the others. If you specify your “junction_aa” column instead or use nt=TRUE to translate your “junction” column, then those columns aren’t zeroes anymore. I tried to show a variety of ways to do this. You’ll also notice that the values in the other columns change. Hope this helps!

    db_airr <- airr::read_rearrangement(paste(path_data, "vdjserver1.airr.tsv", sep = "/"))
    db_airr_simple <- db_airr[1:10, c("sequence_id", "junction")] # fewer rows and columns for simplicity
    db_airr_simple$junction_aa <- translateDNA(db_airr_simple$junction) # checking the translation
    
    aminoAcidProperties(db_airr_simple) # zeroes for basic, acidic, aromatic columns
    aminoAcidProperties(db_airr_simple, seq="junction_aa", label="junction") # normal results
    aminoAcidProperties(db_airr_simple, seq="junction", nt=TRUE) # normal results
    aminoAcidProperties(head(db_airr, 10), nt=TRUE) # normal results
    

    P.S. I’m glad you mentioned this because some of your rows had NA’s in the junction so I was able to find a hidden bug which was throwing an error on those.

  3. Scott Christley reporter

    and your junction column has DNA in it

    Are you saying that the junction field should contain amino acids? The AIRR standard is for junction to contain nucleotides and junction_aa to contain amino acids, so I’m unclear why the semantics are different. It seems aminoAcidProperties should just look at junction_aa as default for AIRR.

  4. Scott Christley reporter

    a simple change might work, the defaults for aminoAcidPropertiesare changed from this:

    seq = "junction",
    nt = FALSE,
    

    to this:

    seq = "junction_aa",
    nt = FALSE,
    

    or this if you prefer nucleotides:

    seq = "junction",
    nt = TRUE,
    

  5. Jason Vander Heiden

    Yeah, I think nt=TRUE should be the default. Just because JUNCTION_AA isn’t required for the legacy changeo format, so we’ll retain better backwards compatibility.

    It looks like it has been “bugged” with incoherent default arguments for a long time.

  6. Edel Aron

    Agreed on changing the default values; I’ll update it to use nt=true by default instead. Thanks Scott (and Jason)!

  7. Log in to comment