ExprTableReport error reading GTF

Issue #35 resolved
Alejandra Cervera created an issue

The error was triggered using the Ensembl GTF for genome GRCh38, version 80 (you can find it at /opt/share/annotation/human-ensembl38/Homo_sapiens.GRCh38.80.gtf).

I don't know if the error is related to my expression file having geneName and geneId as column names instead of gene_name and gene_id; it may be something else.

Log of the error is included below

[LOG exprStatsReport-myAnnotation0] Traceback (most recent call last):
[LOG exprStatsReport-myAnnotation0] File "/mnt/csc-gc5/home/cerverat/cloud/anduril2/anduril/bundles/sequencing/components/GTFParser/gtfparser.py", line 20, in <module>
[LOG exprStatsReport-myAnnotation0] out_row.append( in_row[field] )
[LOG exprStatsReport-myAnnotation0] File "/mnt/csc-gc5/home/cerverat/cloud/anduril2/anduril/bundles/sequencing/components/GTFParser/GTFTools.py", line 25, in getitem
[LOG exprStatsReport-myAnnotation0] raise KeyError("Line %d does not have an element %s.\nline:%s" %(self.lineNumber, k, self.gtfFile.lines[self.lineNumber]))
[LOG exprStatsReport-myAnnotation0] KeyError: 'Line 0 does not have an element geneId.\nline:1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";\n'
[ERROR exprStatsReport-myAnnotation0] ExprTableReport.scala:97, Component exprStatsReport-myAnnotation0: Component returned error status: 1 (Generic error)
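
The KeyError in the log can be reproduced outside the pipeline. The following is a minimal sketch, not the actual GTFParser code: the helper parse_attributes is hypothetical, and the only input is the GTF record quoted in the log. It shows that the record carries gene_id and gene_name, so a lookup of geneId cannot succeed.

```python
import re

# First GTF record, copied from the KeyError message in the log above.
GTF_LINE = (
    '1\thavana\tgene\t11869\t14409\t.\t+\t.\t'
    'gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; '
    'gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";\n'
)


def parse_attributes(line):
    """Return key/value pairs from the ninth (attribute) column of a GTF line.

    Hypothetical helper for illustration; GTFTools.py may parse differently.
    """
    attribute_column = line.rstrip("\n").split("\t")[8]
    return dict(re.findall(r'(\S+) "([^"]*)"', attribute_column))


attrs = parse_attributes(GTF_LINE)
print(sorted(attrs))      # attribute names present in the record
print("geneId" in attrs)  # False: the name requested by the component
print("gene_id" in attrs) # True: the name Ensembl actually uses
```

If this reading is right, the failure comes from the column names of the expression table being passed through as GTF attribute names, not from the GTF file itself.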

Comments (4)

  1. Julia Casado

    Update your clone of anduril2_case_study and check the code in stats.scala. It has already been run, and the results are at /mnt/storageBig7/OvCa_Anduril2_CaseStudy/Stats

    The function ExprTableReport requires the same column names in both the expression table and the annotation file:

    val annotation = INPUT(path="/opt/share/annotation/human-ensembl38/Homo_sapiens.GRCh38.80.gtf")
    val exprTable0 = INPUT(path="/mnt/storageBig7/OvCa_Anduril2_CaseStudy/Expression/output/exprTableG-log2-table.csv")
    // Rename gene_id and gene_name columns to match
    // those from the annotation (GTF) file
    val exprTable  = CSVCleaner(in=exprTable0, rename="geneId=gene_id,geneName=gene_name")
    // Call to ExprTableReport ...
    
  2. Alejandra Cervera reporter

    OK, I changed some things. The biotype name is already hardcoded, so I don't see a reason not to hardcode the gene name and gene ID fields as well, since they come from the Ensembl GTF, which is supposedly static. Otherwise I would have to rename my columns every time I run this function, because each quantifier names its columns differently. In the future we could consider using a parameter to define those names instead of having them hardcoded.

  3. Alejandra Cervera reporter

    It works now with the Ensembl column names hardcoded and with custom keyCol and nameCol values provided for the expression file.

  4. Julia Casado

    The hardcoded biotype was part of the ExpressionStats component. I didn't want to touch a working component at this stage without knowing which functions and pipelines use it. But it can be a new "enhancement" ticket if you want.
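
The approach discussed in comments 2 to 4 (Ensembl attribute names fixed, expression-table columns supplied by the caller) can be sketched roughly as follows. This is an illustration under assumptions, not the ExprTableReport implementation: normalise_header, its key_col and name_col arguments (standing in for keyCol and nameCol), and the one-row table are all hypothetical.

```python
import csv
import io

# Attribute names as they appear in the Ensembl GTF (see the log in the issue).
GTF_KEY = "gene_id"
GTF_NAME = "gene_name"


def normalise_header(header, key_col, name_col):
    """Map the expression table's own key/name columns onto the GTF names.

    Hypothetical helper: key_col and name_col play the role of the
    keyCol and nameCol parameters mentioned in comment 3.
    """
    mapping = {key_col: GTF_KEY, name_col: GTF_NAME}
    missing = [col for col in mapping if col not in header]
    if missing:
        raise KeyError("expression table lacks column(s): %s" % ", ".join(missing))
    return [mapping.get(col, col) for col in header]


# Made-up expression table with quantifier-specific column names.
table = io.StringIO("geneId\tgeneName\tsample1\nENSG00000223972\tDDX11L1\t0.0\n")
rows = list(csv.reader(table, delimiter="\t"))
rows[0] = normalise_header(rows[0], key_col="geneId", name_col="geneName")
print(rows[0])
```

With this shape, each quantifier's naming scheme is handled by passing different key_col and name_col values, so the table itself does not need to be renamed before every run.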
