Clone wiki

ky_wbprojects / NBP_data_parsing

Parsing new NBP data into Postgres

.ace file is received by email, usually from Mary Ann:

1. Copy NBP phenotype .ace data into a folder on your desktop

  • Can break up file into managable sections, new alleles/homozygous viable, new alleles/lethal_sterile/ updated records (these take awhile too so, break these into pieces as well).
  • Or, since retrieving information through the OA can now be limited to a small number, it is possible to add in the all the data and retrieve the data by batches.

2. Move each .ace file onto tazendra:
/home/acedb/karen/phenote/nbp_maryann renaming file to 'NBP_latest'
>scp [filename] acedb@tazendra:/home/acedb/karen/phenote/nbp_maryann/NBP_latest

3. Log on to tazendra and go to parsing folder
> ssh acedb@tazendra
> psswd
> cd /home/acedb/karen/phenote/nbp_maryann

4. Run parsing script and create an output file
> ./parse_maryann.pl > <data>NBPout

The script is below.

Annotating NBP data in Phenote

Retrieve new/changed NBP characters

The easiest way to retrieve all the newly uploaded NBP data is to initially search for one of the alleles you know to be part of the new NBP phenotype batch. Once that record is retrieved in the table, you can retrieve the remaining records by NBP date.

NBP Comment of 'lethal or sterile'

Allele gets annotated for sterile and separately as lethal. In both phenotype remarks, put: -->" Classified as lethal OR sterile by the National Bioresource Project of Japan. "

NBP Comment of 'homozygous viable'

Allele gets annotated with NOT lethal -->" Classified as homozygous viable by the National Bioresource Project of Japan. "

NBP Comment changes to 'homozygous viable'

If an allele was originally classified as lethal or sterile and is now changed to homozygous viable, the new comment is only appended to one of the remarks.

You will need to retrieve the 'sterile' annotation line for that allele and check NOT for the sterile phenotype.

The phenotype remark should also be changed to "Homozygous viable; originally classified as lethal or sterile by the National Bioresource Project of Japan."

Source info

Use the NBP remark as the basis for phenotype annotations. If the comment is only from the NBP, annotate the Laboratory field with FX. If the comment is from a known lab, use the laboratory field from that lab. If there is no laboratory associated with the comment, use nothing, we do not use the Person as a reference, currently.

-cut and paste comment templates-

Comments to the National Bioresource Project of Japan:

Comment from Dr. R. Waterston to the National Bioresource Project of Japan: Dead eggs =

Originally Classified as 'homozygous viable' by the National Bioresource Project of Japan.

Classified as lethal OR sterile by the National Bioresource Project of Japan.

Classified as homozygous viable by the National Bioresource Project of Japan.

parse_maryann.pl script

#!/usr/bin/perl -w

# update NBP data if the allele already exists, otherwise create a new
# character.  2008 05 01
#
# update NBP timestamp when doing udpates.  grant access to acedb account on
# these tables.  2008 05 02
#
# add filereaddate for Jolene.
# add filereaddate updates if $found, for Jolene.  2010 04 07
#
# changed for DBI.pm  2010 08 11
#
# add curator as Mary Ann  2010 08 27


use strict;
use diagnostics;
use Jex;
use Encode qw( from_to is_utf8 );

use DBI;
use Encode qw( from_to is_utf8 );

my $dbh = DBI->connect ( "dbi:Pg:dbname=testdb", "", "") or die "Cannot connect to database!\n"; 


my $date = &getPgDate();

# my $infile = 'NBP_latest';
my $infile = 'NBP_latest';
open (IN, "<$infile") or die "Cannot open $infile : $!";
while (my $line = <IN>) {
  if ($line =~ m/Variation : (\w+) \"(.*?)\"/) { 
    my $allele = $1; my $remark = $2;
    &check($allele, $remark);
  }
} # while (my $line = <IN>)
close (IN) or die "Cannot close $infile : $!";

sub check {
  my ($allele, $remark) = @_;  my %joinkeys;  my $found = 0;
  unless (is_utf8($remark)) { from_to($remark, "iso-8859-1", "utf8"); }
  if ($remark =~ m/\'/) { $remark =~ s/\'/''/g; }
  my $result = $dbh->prepare( "SELECT * FROM app_tempname WHERE app_tempname = '$allele';" );
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n"; 
  while (my @row = $result->fetchrow) { if ($row[0]) { $joinkeys{$row[0]}++; $found++; } }
  my @commands;
  if ($found) { 
      my $command = "UPDATE app_nbp SET app_timestamp = CURRENT_TIMESTAMP WHERE joinkey IN (SELECT joinkey FROM app_tempname WHERE app_tempname
 = '$allele');";
      push @commands, $command;
      $command = "UPDATE app_nbp SET app_nbp = '$remark' WHERE joinkey IN (SELECT joinkey FROM app_tempname WHERE app_tempname = '$allele');";
      push @commands, $command;
      foreach my $joinkey (sort keys %joinkeys) {
        $command = "UPDATE app_filereaddate SET app_timestamp = CURRENT_TIMESTAMP WHERE joinkey = '$joinkey';";
        push @commands, $command;
        $command = "UPDATE app_filereaddate SET app_filereaddate = '$date' WHERE joinkey = '$joinkey';";
        push @commands, $command;
        $command = "INSERT INTO app_filereaddate_hst VALUES ('$joinkey', '$date');";
        push @commands, $command;
        $command = "INSERT INTO app_nbp_hst VALUES ('$joinkey', '$remark');";
        push @commands, $command; } }
    else {
      my $latest = 0;
      $result = $dbh->prepare( "SELECT joinkey FROM app_tempname;" );
      $result->execute() or die "Cannot prepare statement: $DBI::errstr\n"; 
      while (my @row = $result->fetchrow) { if ($row[0] > $latest) { $latest = $row[0]; } }
      $result = $dbh->prepare( "SELECT joinkey FROM app_type;" );
      $result->execute() or die "Cannot prepare statement: $DBI::errstr\n"; 
      while (my @row = $result->fetchrow) { if ($row[0] > $latest) { $latest = $row[0]; } }
      $latest++; 
      my $command = "INSERT INTO app_type_hst VALUES ('$latest', 'Allele');";
      push @commands, $command;
      $command = "INSERT INTO app_tempname_hst VALUES ('$latest', '$allele');";
      push @commands, $command;
      $command = "INSERT INTO app_nbp_hst VALUES ('$latest', '$remark');";
      push @commands, $command;
      $command = "INSERT INTO app_filereaddate_hst VALUES ('$latest', '$date');";
      push @commands, $command;
      $command = "INSERT INTO app_curator_hst VALUES ('$latest', 'WBPerson2970');";
      push @commands, $command;
      $command = "INSERT INTO app_type VALUES ('$latest', 'Allele');";
      push @commands, $command;
      $command = "INSERT INTO app_tempname VALUES ('$latest', '$allele');";
      push @commands, $command;
      $command = "INSERT INTO app_nbp VALUES ('$latest', '$remark');";
      push @commands, $command;
      $command = "INSERT INTO app_filereaddate VALUES ('$latest', '$date');";
      push @commands, $command;
      $command = "INSERT INTO app_curator VALUES ('$latest', 'WBPerson2970');";
      push @commands, $command; }
  foreach my $command (@commands) {
    print "$command\n";
    my $result2 = $dbh->prepare( $command );
    $result2->execute() or die "Cannot prepare statement: $DBI::errstr\n"; 
  } # foreach my $command (@commands)
} # sub check

__END__

}}}

Updated