readChangeoDb converts the sequence_id column to double, probably when the ID starts with a digit
Issue #104
resolved
Copied from https://github.com/cran/alakazam/issues/2
Hi.
Using the attached file and the following command, the sequence_id column is converted to a dbl and information is lost.
As a result, dowser::formatClones(text_fields = "sequence_id") cannot be used afterwards.
Thanks!
alakazam::readChangeoDb("57_renamed_seq2.txt") # to compare with str(read.table("57_renamed_seq2.txt", header = TRUE))
57_renamed_seq2.txt
Comments (2)
reporter -
reporter - changed status to resolved
Commit #25e872d
readChangeoDb expects the input data in the Change-O format. The user data is using the AIRR-C format. readChangeoDb assigns the data types here https://bitbucket.org/kleinstein/alakazam/src/21d62172734c5f1ab2040d6379f5615ce59e7236/R/Core.R#lines-65 following the Change-O specification (see the variable alakazam::CHANGEO, defined here https://bitbucket.org/kleinstein/alakazam/src/21d62172734c5f1ab2040d6379f5615ce59e7236/data-raw/GenerateSysData.R#lines-52:130). Because sequence_id is not a valid Change-O field (it should be SEQUENCE_ID), it is not typecast, and readr::read_tsv loads sequence_id as double. To read AIRR-C Standard formatted files, use the function read_rearrangement from the airr package (CRAN), which is maintained by the AIRR Community and updated whenever the standard is updated. We have added a warning to readChangeoDb that suggests using airr::read_rearrangement when the data is clearly not in the Change-O format and seems to be using the AIRR-C Standard.
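The type guessing described above can be reproduced with readr alone. The following is a minimal sketch, not the alakazam implementation: the two-row table and its numeric-looking IDs are made up for illustration, and it only assumes that the readr package (version 2.0 or later, for show_col_types) is installed.

```r
library(readr)

# Hypothetical two-row AIRR-style table; the IDs are invented for this example.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("sequence_id\tv_call",
             "1001\tIGHV1-2*01",
             "1002\tIGHV3-23*01"), tsv_path)

# Without a column specification, readr guesses each type from the values,
# so an all-digit sequence_id column is read as a number.
guessed <- read_tsv(tsv_path, show_col_types = FALSE)

# Declaring the column as character keeps the IDs as text.
typed <- read_tsv(tsv_path,
                  col_types = cols(sequence_id = col_character()),
                  show_col_types = FALSE)

class(guessed$sequence_id)
class(typed$sequence_id)
```

For real AIRR-C files, the route recommended in the comment is airr::read_rearrangement("57_renamed_seq2.txt") rather than a hand-written column specification.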