

Issue tracker item #7

Caltech documentation
First-pass to Curation
First-pass_flagging_pipelines | New Alleles

First-Pass Rotation

First-Pass Curator | Round 1 | Round 2
Karen | 2/16-3/1 | 7/6-7/19
Xiaodong | 3/2-3/15 | 7/20-8/1
Raymond | 3/16-3/29 | 8/3-8/16
Jolene | 3/30-4/12: New forms active | 8/17-8/30
Gary | 4/13-4/26: Author forms sent out | 8/31-9/13
Ranjana | 4/27-5/10: Author feedback coming in | 9/14-9/27
Erich | 5/11-5/24 | 9/28-10/11
Wen | 5/25-6/7 | 10/12-10/25
Kimberly | 6/8-6/21 | 10/26-11/8
WormBase re-evaluation | 6/22-7/5: IWM(?) |
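
Each slot in the rotation is a 14-day window that starts the day after the previous one ends. A minimal Python sketch that regenerates the Round 1 dates from the 2/16 start, assuming a non-leap year (the page gives no year, so 2015 is used only as a placeholder); `rotation` and `fmt` are hypothetical helpers:

```python
from datetime import date, timedelta

def rotation(start, names, days=14):
    """Return (name, first_day, last_day) for back-to-back windows of `days` days."""
    schedule = []
    current = start
    for name in names:
        last = current + timedelta(days=days - 1)  # inclusive end date
        schedule.append((name, current, last))
        current = last + timedelta(days=1)  # next window starts the following day
    return schedule

def fmt(d):
    """Format a date as month/day, matching the table above."""
    return f"{d.month}/{d.day}"

curators = ["Karen", "Xiaodong", "Raymond", "Jolene", "Gary",
            "Ranjana", "Erich", "Wen", "Kimberly", "WormBase re-evaluation"]

# Assumption: 2015 stands in for any non-leap year (February has 28 days).
for name, first, last in rotation(date(2015, 2, 16), curators):
    print(f"{name}: {fmt(first)}-{fmt(last)}")
```

Calling the same helper with a 7/6 start and the nine curators gives the Round 2 windows.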

Automation

Textpresso's Automation Explanation and Schedule http://goldturtle.caltech.edu/wcat/


Data types

Data type | Description | Current status | Projected
C. elegans (default checked) | Data is present for C. elegans N2 Bristol. |  | With the ability to scan specific sections of papers, the species data types should be prioritized for automation. We will need to develop training sets for SVM; however, we can try histogram and rule-based methods first.
C. elegans other than Bristol | Data is present for C. elegans isolates other than Bristol, e.g., Hawaiian, CB4855, etc. |  | See above.
Nematodes other than C. elegans | Data is presented for Caenorhabditis sister species, e.g., briggsae, remanei, and/or related nematodes, including parasitic nematodes. |  | See above.
Non-nematode species | Data is presented for non-nematode species, e.g., human, mouse, yeast, dog, plant, etc. |  | See above.
Genes studied in this paper | Gene(s) studied in the paper. | Identified and extracted by Textpresso. | Genes missed by Textpresso are currently caught by users and first-pass curators.
Genes cloned in the paper | Genes newly identified, named, cloned, reassigned, etc. that do not already exist in WormBase. |  | It is hoped that authors will report these objects to us directly. We can use rule-based methods to identify some of these objects.
New alleles | Alleles are reported that do not already exist in WormBase. | Identified and extracted by Textpresso through pattern matching that uses lab strain designations as part of the pattern. | Flagging of this data type is fully automated. Missed alleles will be picked up by data curators during manual curation and reported by users.
Genetic mapping data | The location of the gene was determined using genetic tools, e.g., 2-factor recombination, 3-factor interval linkage, Df breakpoints, etc. |  |
Phenotype analysis | The paper reports phenotypes of mutants or phenotypic analysis of "wild-type" nematode strains. | Phenotype analysis flagging has been automated by SVM methods. | Flagging of this data type is fully automated.
Overexpression phenotype | The paper reports phenotypes caused by the overexpression of transgenes. | Check with Jolene to see where this stands in relation to the general phenotype data type flag. |
Chemicals | Chemicals or drugs were used to analyze strain behavior, physiology, gene function, etc. of mutant or "normal" worms. | Not actively curated. Rule-based Textpresso identification of this data type is underway, as is the accumulation of papers for a training set. | Once a training set is established, this data type can undergo SVM testing.
Small-scale and large-scale RNAi experiments | Gene function was assayed by RNA interference. | These data types have been identified using both rule-based Textpresso methods and SVM. | Flagging of this data type is fully automated.
Mosaic analysis | Gene function was assayed in specific cells using lineage analysis. |  |
Tissue or cell site of action | Gene function was assayed in specific cells or tissues, e.g., where gene function was rescued by cell- or tissue-specific expression of the gene. |  |
Time of action | The timing of a gene's function was assayed, e.g., with temperature-shift experiments. | New data type, not actively curated. |
Molecular function of a gene product | A new/novel molecular function, or aspect of molecular function, was identified for a gene. |  |
Homolog of a human disease-associated gene | A gene studied in the paper is a homolog of a human gene that is directly associated with a disease. | Rule-based automation currently under development. |
Genetic interactions | Genes were assayed for an effect on the function of another gene. This is often made apparent by the analysis of double, triple, etc. mutants, or by experiments in which RNAi was used concurrently with other RNAi treatment or mutations. | Some category-based Textpresso automated flagging has been done for this data type. No one is currently in charge of this data type. |
Functional complementation | The paper reports functional redundancy between separate genes, e.g., the rescue of gen-A by overexpression of gen-B or any other extragenic sequence, or the rescue of gene function by a gene from another species. | Not actively curated. |
Gene product interactions | The paper reports data on protein-protein, RNA-protein, DNA-protein, or Y2H interactions, etc. |  |
New expression pattern for a gene | The paper reports new temporal or spatial (e.g., tissue, subcellular, etc.) data on the expression pattern of any gene in a wild-type background, including reporter gene analysis, antibody staining, in situ hybridization, RT-PCR, and Western or Northern blot data. | Some SVM results have been returned for this data type. |
Alterations in gene expression by genetic or other treatment | The paper reports changes, or lack of changes, in gene expression levels or patterns in response to genetic, chemical, temperature, or any other experimental treatment. | Some SVM results have been returned for this data type. |
Regulatory sequence features | The paper reports any gene expression regulatory elements, e.g., DNA/RNA elements required for gene expression, promoters, introns, UTRs, DNA binding sites, etc. |  |
Position frequency matrix (PFM) or position weight matrix (PWM) | Indicates that the paper reports PFMs or PWMs, which are typically used to define regulatory sites in genomic DNA (e.g., bound by transcription factors) or mRNA (e.g., bound by translational factors or miRNA). PFMs define simple nucleotide frequencies, while PWMs are scaled logarithmically against a background frequency. |  |
Microarray | The paper reports microarray data. |  |
Protein analysis in vitro | The paper reports any in vitro protein analysis, such as kinase assays, agonist pharmacological studies, reconstitution studies, etc. | Not actively curated. |
Domain analysis | Particular domains within a protein were targeted for genetic or molecular analysis. | Not actively curated. |
Covalent modification | Post-translational modifications of a gene product were assayed by mutagenesis or in vitro analysis. | Not actively curated. | This can be picked up by me for curation in WormCyc.
Structural information | Protein structure was assayed through NMR, X-ray crystallography, etc. | Not actively curated. |
Mass spectrometry | The paper reports mass spectrometry analysis (MS/MS, LC-MS, HRMS) using analysis programs such as MASCOT, SEQUEST, X!Tandem, OMSSA, or MassMatrix. | Only specific mass spec data is curated. |
C. elegans antibodies | Antibodies used in experiments that were generated in a noncommercial laboratory against a C. elegans gene product. | Fully automated, Textpresso PKC (patterns, key words, categories). |
Integrated transgenes | Integrated transgenes used to assay gene function that do not already exist in WormBase. | Flagging of this data type is fully automated. |
Transgenes used as tissue markers | Reporter constructs (integrated transgenes) used to mark certain tissues, subcellular structures, life stages, etc., as a reference to assay the site of action of gene function or location. | Identification of integrated transgenes has been fully automated; this data type is a subsection of that data type. |
Gene structure correction | Gene structures that differ from the ones in WormBase, e.g., different splice sites, 3' UTR, etc. |  |
Sequence of mutant alleles | Sequence data for any mutation. |  |
New SNPs | SNPs that do not already exist in WormBase. |  |
Ablation data | Cells or anatomical units were ablated by laser or other means (e.g., by expressing a cell-toxic protein) to determine gene function. |  |
Cell function | The paper reports a function for any anatomical part (e.g., cell, tissue, etc.) that has not been indicated elsewhere on this form. |  |
Phylogenetic data | The paper discusses evolutionary relationships between or among genes or gene products. | Not actively curated. |
Other bioinformatics analysis | The paper reports bioinformatic data not indicated anywhere else on this form. | Not actively curated. |
Supplemental materials | Supplementary materials are attached to the paper. |  |
NONE of the aforementioned data types are in this research article | This is used as a default category for any paper where the author checked "here" to indicate a review or non-primary research paper. In addition, key words are entered here for data types that are not currently flagged but may be flagged later on. |  |
Feedback | This is used to record thoughts, notes, comments about the form, etc. |  |
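
The New alleles row describes pattern matching that uses lab designations as part of the pattern. A minimal Python sketch of that idea, assuming the usual C. elegans convention of a lowercase lab designation followed by a number (e.g., e1370). The designation subset, the regular expression, and `new_allele_candidates` are illustrative assumptions, not Textpresso's actual patterns:

```python
import re

# Hypothetical subset of laboratory allele designations; a real pipeline
# would load the full list of registered designations.
LAB_DESIGNATIONS = {"e", "n", "ok", "tm", "gk"}

# Candidate allele: 1-3 lowercase letters immediately followed by digits,
# forming a whole word (e.g. "e1092" inside "unc-54(e1092)").
ALLELE_RE = re.compile(r"\b([a-z]{1,3})(\d+)\b")

def new_allele_candidates(text, known_alleles):
    """Return alleles in `text` whose prefix is a known lab designation and
    that are absent from `known_alleles` (e.g. a WormBase export), in order
    of first appearance."""
    found = []
    for m in ALLELE_RE.finditer(text):
        prefix, allele = m.group(1), m.group(0)
        if prefix in LAB_DESIGNATIONS and allele not in known_alleles and allele not in found:
            found.append(allele)
    return found

text = "We used unc-54(e1092) and a new deletion, lin-12(ok1234), plus tm9999 and xy42."
print(new_allele_candidates(text, known_alleles={"e1092"}))  # -> ['ok1234', 'tm9999']
```

Restricting matches to registered designations is what keeps tokens such as "xy42" from being flagged; anything the pattern misses falls back to manual curation, as the table notes.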

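The PFM/PWM row distinguishes raw nucleotide frequencies from log-scaled weights. A minimal Python sketch of the conversion, assuming log2-odds scores against a background distribution plus a pseudocount to avoid log(0); the uniform 0.25 background, the pseudocount of 1, and the helper names are illustrative choices, not values taken from any curated paper:

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, background=None, pseudocount=1.0):
    """Convert a PFM (dict: base -> list of counts per position) into a
    log2-odds PWM (dict: base -> list of weights per position)."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        column_total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            # Smoothed probability: the pseudocount is split by background
            # frequency so unobserved bases still get a finite weight.
            p = (pfm[b][i] + pseudocount * background[b]) / (column_total + pseudocount)
            # Log-odds relative to the background frequency.
            pwm[b].append(math.log2(p / background[b]))
    return pwm

def score(pwm, site):
    """Sum the per-position weights for a candidate site."""
    return sum(pwm[base][i] for i, base in enumerate(site))

# Toy 3-position motif "ACG" observed 8 times.
pfm = {"A": [8, 0, 0], "C": [0, 8, 0], "G": [0, 0, 8], "T": [0, 0, 0]}
pwm = pfm_to_pwm(pfm)
print(score(pwm, "ACG"), score(pwm, "TTT"))
```

A positive total score means the site resembles the motif more than background sequence; a negative total means the opposite.
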
First-pass forms


Instructions for curators

Overview of the new form http://www.wormbase.org/wiki/index.php/Instructions_for_curators


Postgres data for individual data types

