readChangeoDb converts the sequence_id column to double, probably when the ID starts with a digit

Issue #104 resolved
ssnn created an issue

Copied from https://github.com/cran/alakazam/issues/2

Hi.
Using the attached file and the following command, the sequence_id column is converted to a double (dbl) and information is lost.
As a result, dowser::formatClones(text_fields = "sequence_id") cannot be used afterwards.
Thanks !
alakazam::readChangeoDb("57_renamed_seq2.txt") # to compare with str(read.table("57_renamed_seq2.txt", header = TRUE))
57_renamed_seq2.txt

Comments (2)

  1. ssnn reporter

    readChangeoDb expects the input data in the Change-O format. The user data is in the AIRR-C format. readChangeoDb assigns the data types here https://bitbucket.org/kleinstein/alakazam/src/21d62172734c5f1ab2040d6379f5615ce59e7236/R/Core.R#lines-65 following the Change-O specification (see the variable alakazam::CHANGEO, defined here https://bitbucket.org/kleinstein/alakazam/src/21d62172734c5f1ab2040d6379f5615ce59e7236/data-raw/GenerateSysData.R#lines-52:130 ). Because sequence_id is not a valid Change-O field (it should be SEQUENCE_ID), it is not typecast, and readr::read_tsv loads sequence_id as double. To read files in the AIRR-C Standard format, use the function read_rearrangement from the airr package (CRAN), which is maintained by the AIRR Community and updated whenever the standard is updated. We have added a warning to readChangeoDb that suggests using airr::read_rearrangement when the data is clearly not in the Change-O format and appears to follow the AIRR-C Standard.

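The comment above explains that a column stays a string only when its type is declared before readr guesses it. A minimal sketch of that idea, assuming the readr package is installed: the file contents and IDs below are invented for illustration, and this is not how readChangeoDb or airr::read_rearrangement are implemented.

```r
library(readr)

# Hypothetical two-row AIRR-style file; the IDs are made up for illustration.
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "sequence_id\tjunction_length",
  "12345678901234567890\t45",
  "98765432109876543210\t48"
), path)

# Declare sequence_id as character up front; the other columns are still
# guessed by readr. Without this declaration, all-digit IDs may be guessed
# as numeric and lose precision.
db <- read_tsv(path, col_types = cols(sequence_id = col_character()))

str(db$sequence_id)
```

For real AIRR-C files, the route recommended in the comment is airr::read_rearrangement("57_renamed_seq2.txt"), which applies the types defined by the standard instead of a hand-written column specification.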