basic, acidic and aromatic output fields from aminoAcidProperties() are always zero
Using the same input file attached to #90 and this code:
db <- airr::read_rearrangement('vdjserver1.airr.tsv')
aa_db <- aminoAcidProperties(db)
airr::write_rearrangement(aa_db, 'aa_properties.airr.tsv')
The junction_aa_basic, junction_aa_acidic, and junction_aa_aromatic output fields are zero for all rearrangements.
Comments (14)
-
-
-
assigned issue to
-
assigned issue to
-
reporter No rush. I ran across these issues during our hackathon, so it’s good to be tested!
-
-
assigned issue to
-
assigned issue to
-
- changed status to open
-
- changed milestone to next release
-
Ok @Scott Christley , I did some investigating and I actually don’t think that this is a bug. In the code below, I read in your example file like you did and found the zeroes in the columns that you mentioned. However,
aminoAcidProperties
takes the “junction” column as default if you don’t specify it and your junction column has DNA in it, which is why you got zeroes in those columns (they search for [RHK], [DE] and [FWHY] respectively, none of which match any nucleotides) and not in the others. If you specify your “junction_aa” column instead or usent=TRUE
to translate your “junction” column, then those columns aren’t zeroes anymore. I tried to show a variety of ways to do this. You’ll also notice that the values in the other columns change. Hope this helps!
db_airr <- airr::read_rearrangement(paste(path_data, "vdjserver1.airr.tsv", sep = "/")) db_airr_simple <- db_airr[1:10, c("sequence_id", "junction")] # fewer rows and columns for simplicity db_airr_simple$junction_aa <- translateDNA(db_airr_simple$junction) # checking the translation aminoAcidProperties(db_airr_simple) # zeroes for basic, acidic, aromatic columns aminoAcidProperties(db_airr_simple, seq="junction_aa", label="junction") # normal results aminoAcidProperties(db_airr_simple, seq="junction", nt=TRUE) # normal results aminoAcidProperties(head(db_airr, 10), nt=TRUE) # normal results
P.S. I’m glad you mentioned this because some of your rows had NA’s in the junction so I was able to find a hidden bug which was throwing an error on those.
-
- changed status to resolved
-
reporter and your junction column has DNA in it
Are you saying that the
junction
field should contain amino acids? The AIRR standard is forjunction
to contain nucleotides andjunction_aa
to contain amino acids, so I’m unclear why the semantics are different. It seemsaminoAcidProperties
should just look atjunction_aa
as default for AIRR. -
reporter a simple change might work, the defaults for
aminoAcidProperties
are changed from this:seq = "junction", nt = FALSE,
to this:
seq = "junction_aa", nt = FALSE,
or this if you prefer nucleotides:
seq = "junction", nt = TRUE,
-
reporter - changed status to open
I suggest a change to the default values for aminoAcidProperties
-
Yeah, I think
nt=TRUE
should be the default. Just becauseJUNCTION_AA
isn’t required for the legacy changeo format, so we’ll retain better backwards compatibility.It looks like it has been “bugged” with incoherent default arguments for a long time.
-
Agreed on changing the default values; I’ll update it to use
nt=true
by default instead. Thanks Scott (and Jason)! -
- changed status to resolved
- Log in to comment
Sorry for the slow. I’ll try to look at it this week.