Casey Dunn committed d177bad

added md

+# Metazoan SRA summary
+
+## Getting the SRA data
+
+Go to http://www.ncbi.nlm.nih.gov/sra and run the following query:
+
+```
+(("strategy rna seq"[Properties] OR transcriptomic[Text Word] OR cDNA[Text Word]) AND "platform illumina"[Properties] AND metazoa[Organism] NOT vertebrata[Organism] NOT insects[Organism]) AND ("2000/01/01"[Modification Date] : "3000"[Modification Date])
+```
+
+Then click "Send To" and select "File" and "Summary". This will download the file `sra_result.csv`.
+
+Then load the data:
+
+```r
+D <- read.csv("sra_result.csv")
+```
+
+
+This gives a total of 2296 samples.
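+
+Before going further, it can help to confirm that the export contains the columns used in the rest of this summary. The following is a minimal sketch and not part of the original analysis: `required_cols` and `missing_columns()` are hypothetical names, and the toy data frame stands in for `sra_result.csv`. By default `read.csv` converts spaces in header names to dots, so a header such as `Total Bases` would appear as `Total.Bases`.
+
+```r
+# Columns the later steps rely on (names as they appear after read.csv)
+required_cols <- c("Organism.Name", "Study.Accession", "Total.Spots", "Total.Bases")
+
+# Hypothetical helper: return the required columns that are absent from df
+missing_columns <- function(df, required = required_cols) {
+  setdiff(required, names(df))
+}
+
+# Toy stand-in for the downloaded table (values are made up)
+toy <- data.frame(
+  Organism.Name = c("Taxon A", "Taxon B"),
+  Total.Spots   = c(1000, 2000),
+  Total.Bases   = c(100000, 400000)
+)
+
+missing_columns(toy)   # lists "Study.Accession", which the toy table lacks
+nrow(toy)              # number of samples in the table
+```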
+
+
+## Filtering the data
+
+These data do not indicate whether the reads are single-end (SE) or paired-end (PE). We can, however, calculate the number of bases per spot, which is the read length for SE data and the sum of the forward and reverse read lengths for PE data.
+
+We will filter according to a minimum length that is reasonable for both:
+
+
+```r
+# Bases per spot: the read length for SE data, or the summed
+# forward and reverse read lengths for PE data
+D$Spot.Length <- D$Total.Bases / D$Total.Spots
+
+# Minimum bases per spot a sample must have to be retained
+min_length <- 70
+
+D <- D[D$Spot.Length >= min_length, ]
+```
+
+
+1184 samples pass the filtering criteria.
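+
+To make the threshold concrete, here is a small worked example on made-up numbers (the `toy` data frame is hypothetical, not taken from the SRA export). A single 100-base read and a pair of 35-base reads give 100 and 70 bases per spot respectively, so both pass, while a single 50-base read does not. The sketch also wraps the comparison in `which()`, a choice that differs from the filter above: a record with zero spots yields `NaN`, and plain logical indexing would keep it as a row of `NA`s.
+
+```r
+# Toy records: layout label, spot count and base count (all made up)
+toy <- data.frame(
+  Layout      = c("SE 1x100", "PE 2x35", "SE 1x50", "empty"),
+  Total.Spots = c(10, 10, 10, 0),
+  Total.Bases = c(1000, 700, 500, 0)
+)
+toy$Spot.Length <- toy$Total.Bases / toy$Total.Spots   # 100, 70, 50, NaN
+
+min_length <- 70
+
+# which() keeps only rows where the comparison is TRUE, dropping NA/NaN
+kept <- toy[which(toy$Spot.Length >= min_length), ]
+kept$Layout
+```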
+
+## Summarizing the available data
+
+
+```r
+
+Ds <- rowsum(D$Total.Spots, group = D$Organism.Name)
+Ds
+```
+
+```
+##                                      [,1]
+## Abacion magnum                   31791284
+## Abylopsis tetragona              21575176
+## Acropora hyacinthus             220937001
+## Adineta ricciae                  11685405
+## Adineta vaga                     17607592
+## Agalma elegans                   53998182
+## Aiptasia                        633153354
+## Alipes grandidieri               16166017
+## Amblyomma maculatum              25934570
+## Anguillicola crassus            213500758
+## Aphelenchoides fragariae         43364338
+## Aplysia californica             556263314
+## Apostichopus japonicus          149167965
+## Argulus siamensis                77759443
+## Ascaris suum                   1317718122
+## Biomphalaria glabrata           172317158
+## Botryllus schlosseri             35392793
+## Brachionus calyciflorus         226935793
+## Brachycybe lecontii              27445580
+## Brugia malayi                   136729359
+## Brugia pahangi                  179886381
+## Bursaphelenchus mucronatus       16351464
+## Bursaphelenchus xylophilus       17267877
+## Caenorhabditis angaria           53171790
+## Caenorhabditis brenneri         218405837
+## Caenorhabditis briggsae         213907295
+## Caenorhabditis briggsae AF16     24084934
+## Caenorhabditis elegans                 NA
+## Caenorhabditis japonica          99876693
+## Caenorhabditis remanei          292952001
+## Caenorhabditis sp. 11 MAF-2010   31091076
+## Caenorhabditis sp. 7 MAF-2007    23430085
+## Caenorhabditis sp. 9 MAF-2010    30802534
+## Cambala annulata                 29499905
+## Cephalothrix hongkongiensis      26112259
+## Cerebratulus marginatus          26688280
+## Cleidogona sp. MB-2013           23567005
+## Clonorchis sinensis              31965154
+## Coenobita                         6484645
+## Corbicula fluminea               33543565
+## Corticium candelabrum            76942600
+## Craseoa lathetica                38233199
+## Crassostrea gigas               379297659
+## Crassostrea virginica            26428921
+## Crella elegans                   62931417
+## Dendrocoelum lacteum            878220598
+## Dermacentor reticulatus             26563
+## Dictyocaulus filaria             14702570
+## Dictyocaulus viviparus          468332730
+## Dirofilaria immitis              46392164
+## Dugesia japonica               1095586399
+## Echinococcus                     18566482
+## Echinococcus granulosus         109092562
+## Echinococcus multilocularis     867335477
+## Elysia chlorotica                      56
+## Ennucula tenuis                  38724175
+## Ephydatia muelleri               78257690
+## Eriocheir sinensis              205743183
+## Eupolybothrus cavernicolus       10794305
+## Evechinus chloroticus           123814245
+## Fasciola gigantica               21874729
+## Fascioloides magna               27002004
+## Gadila tolmiei                   37971066
+## Gasteracantha hasselti           12564452
+## Globodera pallida               898906868
+## Globodera rostochiensis          60699691
+## Glomeridesmus sp. MB-2013        24546123
+## Gorgonia ventalina               71285339
+## Haemonchus contortus            402954318
+## Haliotis midae                    5399167
+## Haliotis rufescens               67755310
+## Hermodice carunculata           213277962
+## Holothuria glaberrima           331931211
+## Hormogaster elisae              104195139
+## Hormogaster samnitica            26978390
+## Hydra vulgaris                   55350004
+## Hymenolepis microstoma          183972963
+## Ixodes ricinus                  314283869
+## Latrodectus tredecimguttatus     27605467
+## Lernaea cyprinacea               32557314
+## Lithobius sp. MB-2013            48208463
+## Litopenaeus vannamei            248243153
+## Loa loa                           5738066
+## Lymnaea stagnalis                81851004
+## Macracantha arcuata              17523883
+## Macrobrachium rosenbergii        50798546
+## Metasiro americanus              12488783
+## Mizuhopecten yessoensis          56132648
+## Mytilus galloprovincialis        39878184
+## Nanomia bijuga                  118955454
+## Nasoonaria sinensis              19990955
+## Necator americanus               98645947
+## Nematostella vectensis          473946460
+## Neomenia megatrapezata           29291588
+## Neomeniomorpha sp. 1 SS-2011     35228112
+## Octopus vulgaris                  8250668
+## Oesophagostomum dentatum        452305211
+## Onchocerca ochengi              180003618
+## Onchocerca volvulus             205446807
+## Opisthorchis viverrini           39142047
+## Oscarella carmela                84543565
+## Oscarella sp. SN-2011            15811352
+## Panagrellus redivivus            45086997
+## Pandalus latirostris            110149263
+## Panonychus citri                103991798
+## Parastichopus parvimensis        87108446
+## Patiria miniata                 692739125
+## Penaeus monodon                  98174733
+## Perna viridis                    20794585
+## Petaserpes sp. MB-2013           34470090
+## Petrosia ficiformis              77406715
+## Physalia physalis                36481773
+## Platygyra carnosus               52308801
+## Pocillopora damicornis          158069623
+## Poecilosclerida                  13256767
+## Pomacea canaliculata             12861761
+## Pomatoceros lamarckii            36458077
+## Pontastacus leptodactylus       229819635
+## Portunus trituberculatus         92525401
+## Pristionchus pacificus           60615793
+## Procotyla fluviatilis           394780007
+## Prostemmiulus sp. MB-2013        31622533
+## Pseudopolydesmus sp. MB-2013     32233544
+## Pugilina cochlidium              33772545
+## Radix balthica                    8461925
+## Rhipicephalus pulchellus        120614564
+## Rhizoglyphus robini             154564765
+## Rhyssoplax olivacea              23189291
+## Rotylenchulus reniformis         51495594
+## Ruditapes philippinarum          45094211
+## Schistosoma mansoni             916955137
+## Schmidtea mediterranea         1502321629
+## Solemya velum                    33298527
+## Strongylocentrotus purpuratus  1157061246
+## Strongyloides ratti             283868744
+## Strongyloides stercoralis      1183320637
+## Sycon coactum                    78179795
+## Taenia multiceps                 13723885
+## Taenia pisiformis                13333334
+## Testacella                             25
+## Tetraclita japonica              26256871
+## Tetranychus cinnabarinus         26040173
+## Tetranychus urticae             145172414
+## Theridion californicum          166104219
+## Theridion grallator             254695482
+## Tigriopus californicus           20735302
+## Trichinella spiralis             41876935
+## Trichuris muris                 356078709
+## Varroa destructor                22920031
+## Villosa lienosa                  81583676
+```
+
+150 species have relevant data.
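+
+The table reports `NA` for Caenorhabditis elegans. The source does not say why; two plausible causes are a missing `Total.Spots` value, or a total that exceeds R's integer range (`.Machine$integer.max`, about 2.1 billion) if the column was read as integers. The following is a hedged sketch of a summation that guards against both, shown on made-up values rather than the real data:
+
+```r
+# Toy data (hypothetical): "Taxon A" sums past the integer range,
+# "Taxon B" contains a missing value
+toy <- data.frame(
+  Organism.Name = c("Taxon A", "Taxon A", "Taxon B", "Taxon B"),
+  Total.Spots   = c(2000000000L, 2000000000L, 5L, NA)
+)
+
+# Summing as doubles avoids integer overflow; na.rm = TRUE skips missing values
+totals <- rowsum(as.numeric(toy$Total.Spots),
+                 group = toy$Organism.Name,
+                 na.rm = TRUE)
+totals
+```
+
+Whether dropping missing values is appropriate depends on the analysis; it hides incomplete records rather than flagging them.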
+
+
+## Selecting focal taxa
+
+The following is a vector of the taxa we wish to consider further:
+
+
+```r
+foci <- c("Abylopsis tetragona", "Agalma elegans")
+```
+
+
+Here are the relevant samples:
+
+
+```r
+D[D$Organism.Name %in% foci, c("Organism.Name", "Study.Accession", "Total.Spots", "Spot.Length")]
+```
+
+```
+##            Organism.Name Study.Accession Total.Spots Spot.Length
+## 1325      Agalma elegans       SRP023468    53998182         200
+## 1326 Abylopsis tetragona       SRP023468    21575176         200
+```
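+
+A name in `foci` that is misspelled, or whose samples were removed by the filter, is silently dropped by `%in%`. As a possible safeguard, and only as a sketch on a made-up table (`toy` and the placeholder name `"Taxon X"` are hypothetical), `setdiff()` reports focal taxa with no matching sample:
+
+```r
+# Toy sample table (spot counts are made up)
+toy <- data.frame(
+  Organism.Name = c("Agalma elegans", "Abylopsis tetragona", "Nanomia bijuga"),
+  Total.Spots   = c(10, 20, 30)
+)
+
+foci <- c("Abylopsis tetragona", "Agalma elegans", "Taxon X")
+
+# Focal taxa that have no sample in the table
+absent <- setdiff(foci, as.character(toy$Organism.Name))
+absent
+
+# Number of samples for each focal taxon that is present
+table(as.character(toy$Organism.Name[toy$Organism.Name %in% foci]))
+```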
+