Changing variantIDs

Issue #52 resolved
Sander W. van der Laan
created an issue

Hi,

I'm trying to change my variantIDs. Currently they are of the following format:

SNPID RSID Chr BP A_allele B_allele
--- 1:10177:A:AC 01 10177 A AC
--- 1:10235:T:TA 01 10235 T TA
--- rs145072688:10352:T:TA 01 10352 T TA
--- 1:10505:A:T 01 10505 A T
--- 1:10506:C:G 01 10506 C G
--- 1:10511:G:A 01 10511 G A
--- rs62636508 01 10519 G C
--- chr1:10539 01 10539 C A
--- 1:10542:C:T 01 10542 C T

I would like a different format: only the rs-number if it is known, and otherwise chr:bp, so without the attached alleles or a leading "chr". I therefore made a file to change the IDs, following the instructions on the website:

SNPID RSID CHR BP AlleleA AlleleB SNPID_new RSID_new CHR_new BP_new AlleleA_new AlleleB_new
--- 1:10177:A:AC 1 10177 A AC --- 1:10177 1 10177 A AC
--- 1:10235:T:TA 1 10235 T TA --- 1:10235 1 10235 T TA
--- rs145072688:10352:T:TA 1 10352 T TA --- rs145072688 1 10352 T TA
--- 1:10505:A:T 1 10505 A T --- 1:10505 1 10505 A T
--- 1:10506:C:G 1 10506 C G --- 1:10506 1 10506 C G
--- 1:10511:G:A 1 10511 G A --- 1:10511 1 10511 G A
--- rs62636508 1 10519 G C --- rs62636508 1 10519 G C
--- chr1:10539 1 10539 C A --- 1:10539 1 10539 C A
--- 1:10542:C:T 1 10542 C T --- 1:10542 1 10542 C T
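
A mapping file like the one above could also be generated rather than edited by hand. The following is only a sketch under assumptions: `make_id_map` is a hypothetical helper name, the input is assumed to be a six-column, space-separated variant listing with one header line, and only the RSID column is rewritten (the rs-number is kept if present, otherwise chr:bp). All other columns, including the chromosome, are copied unchanged.

```shell
# Hypothetical helper: build a 12-column ID mapping from a 6-column variant list.
# $1: variant list with columns SNPID RSID CHR BP AlleleA AlleleB (header on line 1)
make_id_map() {
  awk '
    NR == 1 {
      # Replace the input header with the 12-column header used in the post
      print "SNPID RSID CHR BP AlleleA AlleleB SNPID_new RSID_new CHR_new BP_new AlleleA_new AlleleB_new"
      next
    }
    {
      id = $2
      sub(/^chr/, "", id)                # drop a leading "chr"
      n = split(id, part, ":")
      if (part[1] ~ /^rs[0-9]+$/)        # rs-number known: keep only that
        new = part[1]
      else if (n >= 2)                   # otherwise chr:bp, without alleles
        new = part[1] ":" part[2]
      else
        new = id
      print $1, $2, $3, $4, $5, $6, $1, new, $3, $4, $5, $6
    }
  ' "$1"
}
```

It would be called as `make_id_map variants.txt > data_chr1.newIDs.txt`, where `variants.txt` is a placeholder for a listing of the current variants.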

I then ran QCTOOL v2 with this command:

qctool_v2.0 -g olddata/data_chr1.bgen -og data_chr1.bgen -map-id-data data_chr1.newIDs.txt

Then I ran -snp-stats using the command:

qctool_v2.0 -g data_chr1.bgen -snp-stats -osnp data_chr1.variantstats.txt

And got this result:

alternate_ids rsid chromosome position alleleA alleleB comment HW_exact_p_value HW_lrt_p_value alleleA_count alleleB_count alleleA_frequency alleleB_frequency minor_allele_frequency minor_allele major_allele info impute_info missing_proportion A B AA AB BB NULL total
--- 1:10177:A:AC 01 10177 A AC NA 0.957417 0.94436 1840.48 1211.52 0.60304 0.39696 0.39696 AC A 0.350247 0.350247 -5.95999e-16 0 0 554.348 731.781 239.87 -9.09495e-13 1526
--- 1:10235:T:TA 01 10235 T TA NA 1 0.971111 3050.19 1.80555 0.999408 0.000591597 0.000591597 TA T 0.260777 0.260777 0 0 0 1524.19 1.80555 3.57353e-16 0 1526
--- rs145072688:10352:T:TA 01 10352 T TA NA 0.714858 0.686652 1737.62 1314.38 0.569339 0.430661 0.430661 TA T 0.344353 0.344353 -1.49e-15 0 0 490.889 755.846 279.265 -2.27374e-12 1526
--- 1:10505:A:T 01 10505 A T NA 1 1 3051.66 0.341985 0.999888 0.000112053 0.000112053 T A 0.253202 0.253202 0 0 0 1525.66 0.341985 1.73472e-17 0 1526
--- 1:10506:C:G 01 10506 C G NA 1 1 3051.66 0.341985 0.999888 0.000112053 0.000112053 G C 0.253202 0.253202 0 0 0 1525.66 0.341985 1.73472e-17 0 1526
--- 1:10511:G:A 01 10511 G A NA 1 0.971111 3049.54 2.45956 0.999194 0.000805883 0.000805883 A G 0.172929 0.172929 2.98e-16 0 0 1523.54 2.45956 6.00214e-16 4.54747e-13 1526
--- rs62636508 01 10519 G C NA 1 0.856125 3041.74 10.2621 0.996638 0.00336242 0.00336242 C G 0.260609 0.260609 -5.95999e-16 0 0 1515.78 10.1761 0.0429999 -9.09495e-13 1526
--- chr1:10539 01 10539 C A NA 1 0.927826 3047.16 4.83645 0.998415 0.00158468 0.00158468 A C 0.261495 0.261495 2.98e-16 0 0 1521.16 4.83645 6.52256e-16 4.54747e-13 1526
--- 1:10542:C:T 01 10542 C T NA 1 1 3051.9 0.0950942 0.999969 3.1158e-05 3.1158e-05 T C 0.0236177 0.0236177 0 0 0 1525.9 0.0950942 1.38778e-17 0 1526

Am I missing something here? How come the variantIDs are not changed?
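
To see exactly which variants kept their old ID, the mapping file can be compared against the -snp-stats output. This is a rough sketch, not part of QCTOOL: `unchanged_ids` is a hypothetical helper, and it assumes that both files are whitespace-separated, that the old and new rsids are in columns 2 and 8 of the mapping file, and that the rsid is in column 2 of the stats output, as in the listings above.

```shell
# Hypothetical helper: list variants that still carry their old rsid.
# $1: 12-column mapping file, $2: -snp-stats output
unchanged_ids() {
  awk '
    # First file: remember every old rsid that was supposed to change
    NR == FNR { if ($2 != $8) expected[$2] = $8; next }
    # Second file: skip comment lines, then report rsids that are still old
    /^#/ { next }
    ($2 in expected) { print $2, "->", expected[$2] }
  ' "$1" "$2"
}
```

Running `unchanged_ids data_chr1.newIDs.txt data_chr1.variantstats.txt` prints one line per variant that was not renamed, together with the ID it was expected to get; empty output means every mapped ID was applied.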

Thanks,

Sander

Comments (8)

  1. Sander W. van der Laan reporter

    OK, almost fixed. Should the file passed to -map-id-data have a header or not? Now every variant is changed except the first...

    alternate_ids rsid chromosome position alleleA alleleB comment HW_exact_p_value HW_lrt_p_value alleleA_count alleleB_count alleleA_frequency alleleB_frequency minor_allele_frequency minor_allele major_allele info impute_info missing_proportion A B AA AB BB NULL total
    **--- 1:10177:A:AC** 01 10177 A AC NA 0.957417 0.94436 1840.48 1211.52 0.60304 0.39696 0.39696 AC A 0.350247 0.350247 -5.95999e-16 0 0 554.348 731.781 239.87 -9.09495e-13 1526
    --- 1:10235 01 10235 T TA NA 1 0.971111 3050.19 1.80555 0.999408 0.000591597 0.000591597 TA T 0.260777 0.260777 0 0 0 1524.19 1.80555 3.57353e-16 0 1526
    --- rs145072688 01 10352 T TA NA 0.714858 0.686652 1737.62 1314.38 0.569339 0.430661 0.430661 TA T 0.344353 0.344353 -1.49e-15 0 0 490.889 755.846 279.265 -2.27374e-12 1526
    --- 1:10505 01 10505 A T NA 1 1 3051.66 0.341985 0.999888 0.000112053 0.000112053 T A 0.253202 0.253202 0 0 0 1525.66 0.341985 1.73472e-17 0 1526
    --- 1:10506 01 10506 C G NA 1 1 3051.66 0.341985 0.999888 0.000112053 0.000112053 G C 0.253202 0.253202 0 0 0 1525.66 0.341985 1.73472e-17 0 1526
    --- 1:10511 01 10511 G A NA 1 0.971111 3049.54 2.45956 0.999194 0.000805883 0.000805883 A G 0.172929 0.172929 2.98e-16 0 0 1523.54 2.45956 6.00214e-16 4.54747e-13 1526
    --- rs62636508 01 10519 G C NA 1 0.856125 3041.74 10.2621 0.996638 0.00336242 0.00336242 C G 0.260609 0.260609 -5.95999e-16 0 0 1515.78 10.1761 0.0429999 -9.09495e-13 1526
    --- 1:10539 01 10539 C A NA 1 0.927826 3047.16 4.83645 0.998415 0.00158468 0.00158468 A C 0.261495 0.261495 2.98e-16 0 0 1521.16 4.83645 6.52256e-16 4.54747e-13 1526
    --- 1:10542 01 10542 C T NA 1 1 3051.9 0.0950942 0.999969 3.1158e-05 3.1158e-05 T C 0.0236177 0.0236177 0 0 0 1525.9 0.0950942 1.38778e-17 0 1526
    
  2. Sander W. van der Laan reporter

    Oh, and somewhat related: I am getting the error below, which I don't understand. What am I not seeing here?

    Thanks!

    Sander

    qctool_v2.0 -g _data_old_variantids/aaags.1kgp3.chr14.gen -og aaags.1kgp3.chr14.bgen -map-id-data aaags.1kgp3.chr14.IDupdate.txt
    
    Welcome to qctool
    (version: 2.0, revision 370dd7c)
    
    (C) 2009-2017 University of Oxford
    
    Opening position translation dictionary                     :  (0/?,0.0s,0.0/s)
    
    Error (genfile::MalformedInputError): Source "aaags.1kgp3.chr14.IDupdate.txt" is malformed on line 2..
    
    Thank you for using qctool.
    [svanderlaan@hpcs03 AAAGS_EAGLE2_1000Gp3]$ cat aaags.1kgp3.chr14.IDupdate.txt | head
    SNPID RSID CHR POS REF ALT SNPID_new RSID_new CHR_new POS_new REF_new ALT_new
    14:20000898 14:20000898 14 20000898 G GT 14:20000898 rs532972399 14 20000898 G GT
    14:20001092 14:20001092 14 20001092 T C 14:20001092 rs201677133 14 20001092 T C
    14:20001297 14:20001297 14 20001297 C T 14:20001297 rs575496719 14 20001297 C T
    14:20001365 14:20001365 14 20001365 A T 14:20001365 rs542216792 14 20001365 A T
    14:20001709 14:20001709 14 20001709 G A 14:20001709 rs192659454 14 20001709 G A
    14:20001712 14:20001712 14 20001712 C T 14:20001712 rs572970234 14 20001712 C T
    14:20002012 14:20002012 14 20002012 T C 14:20002012 rs564991995 14 20002012 T C
    14:20002068 14:20002068 14 20002068 C A 14:20002068 rs533540953 14 20002068 C A
    14:20002082 14:20002082 14 20002082 A G 14:20002082 rs551955275 14 20002082 A G
    [svanderlaan@hpcs03 AAAGS_EAGLE2_1000Gp3]$ cat aaags.1kgp3.chr14.IDupdate.txt | awk '{ print NF}' | head
    12
    12
    12
    12
    12
    12
    12
    12
    12
    12
    [svanderlaan@hpcs03 AAAGS_EAGLE2_1000Gp3]$ cat aaags.1kgp3.chr14.IDupdate.txt | awk '{ print NF}' | sort -u
    12
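
The field counts above all agree, so whatever makes line 2 "malformed" does not show up in `NF`. One thing the `NF` check cannot reveal is invisible characters: awk's default field splitting treats tabs and spaces alike and leaves a trailing carriage return inside the last field. The sketch below reports these for a single line. `describe_line` is a hypothetical helper, and whether tabs or carriage returns are actually the cause of this error is not established here.

```shell
# Hypothetical helper: report field count, tabs, and carriage returns on one line.
# $1: file, $2: line number
describe_line() {
  awk -v n="$2" '
    NR == n {
      nf = NF
      line = $0
      tabs = gsub(/\t/, "", line)   # gsub returns the number of matches removed
      crs  = gsub(/\r/, "", line)
      printf "fields=%d tabs=%d cr=%d\n", nf, tabs, crs
    }
  ' "$1"
}
```

For the file in question this would be `describe_line aaags.1kgp3.chr14.IDupdate.txt 2`; comparing the result with line 1 shows whether the header and the data lines use the same separators and line endings.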
    