ExprTableReport error reading GTF
The error was triggered using the Ensembl GTF for GRCh38, release 80 (you can find it in /opt/share/annotation/human-ensembl38/Homo_sapiens.GRCh38.80.gtf).
I don't know if the error is caused by my expression file having geneName and geneId as column names instead of gene_name and gene_id, or by something else.
The log of the error is included below.
[LOG exprStatsReport-myAnnotation0] Traceback (most recent call last):
[LOG exprStatsReport-myAnnotation0]   File "/mnt/csc-gc5/home/cerverat/cloud/anduril2/anduril/bundles/sequencing/components/GTFParser/gtfparser.py", line 20, in <module>
[LOG exprStatsReport-myAnnotation0]     out_row.append( in_row[field] )
[LOG exprStatsReport-myAnnotation0]   File "/mnt/csc-gc5/home/cerverat/cloud/anduril2/anduril/bundles/sequencing/components/GTFParser/GTFTools.py", line 25, in __getitem__
[LOG exprStatsReport-myAnnotation0]     raise KeyError("Line %d does not have an element %s.\nline:%s" %(self.lineNumber, k, self.gtfFile.lines[self.lineNumber]))
[LOG exprStatsReport-myAnnotation0] KeyError: 'Line 0 does not have an element geneId.\nline:1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";\n'
[ERROR exprStatsReport-myAnnotation0] ExprTableReport.scala:97, Component exprStatsReport-myAnnotation0: Component returned error status: 1 (Generic error)
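For anyone reproducing this, the KeyError can be illustrated with a minimal Python sketch that parses the attribute column of the GTF record quoted in the log. This is not the GTFParser component's actual code; the function name parse_attributes and the regular expression are assumptions made for illustration only.

```python
import re

# GTF record copied from the KeyError message in the log (9 tab-separated columns).
LINE = ('1\thavana\tgene\t11869\t14409\t.\t+\t.\t'
        'gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; '
        'gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";\n')

# Matches one `key "value";` pair in the attribute column.
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(gtf_line):
    """Return the key/value pairs of the 9th GTF column as a dict (hypothetical helper)."""
    attribute_column = gtf_line.rstrip("\n").split("\t")[8]
    return dict(ATTRIBUTE.findall(attribute_column))


attrs = parse_attributes(LINE)
print(sorted(attrs))       # attribute names present in the record
print("gene_id" in attrs)  # Ensembl-style name
print("geneId" in attrs)   # name taken from the expression table
```

The record only carries the Ensembl-style names (gene_id, gene_name, and so on), so a lookup with the expression table's geneId has nothing to match.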
Comments (4)
-
-
reporter OK, I changed some things. You have already hardcoded the biotype name, so I don't see a reason not to hardcode the gene name and gene ID fields as well; they come from the Ensembl GTF, which is supposedly static. Otherwise I would have to rename my columns every time I run this function, since each quantifier names its columns differently. In the future we could consider using a parameter to define those instead of hardcoding them.
-
reporter - changed status to resolved
It works now with the Ensembl column names hardcoded and with custom keyCol and nameCol provided from the expression file.
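The idea behind the fix can be sketched in Python as follows. This is only an illustration under assumptions, not the ExprTableReport code: the function annotate_expression, the tpm column and its value are made up for the example, while gene_id, gene_name and gene_biotype are the attribute names from the Ensembl GTF, and key_col and name_col mirror the keyCol and nameCol parameters.

```python
# Attribute names as they appear in the Ensembl GTF (kept fixed in this sketch).
GTF_KEY = "gene_id"
GTF_NAME = "gene_name"
GTF_BIOTYPE = "gene_biotype"


def annotate_expression(expr_rows, gtf_records, key_col, name_col):
    """Add the biotype to each expression row (hypothetical helper).

    key_col and name_col are the caller's column names in the expression
    table; the GTF side always uses the fixed Ensembl names above.
    """
    by_id = {rec[GTF_KEY]: rec for rec in gtf_records}
    annotated = []
    for row in expr_rows:
        rec = by_id.get(row[key_col], {})
        merged = dict(row)
        # Fill in the gene name from the annotation if the table lacks it.
        if not merged.get(name_col):
            merged[name_col] = rec.get(GTF_NAME, "")
        merged[GTF_BIOTYPE] = rec.get(GTF_BIOTYPE, "")
        annotated.append(merged)
    return annotated


# Annotation values taken from the GTF record in the log; the tpm value is made up.
gtf_records = [{"gene_id": "ENSG00000223972", "gene_name": "DDX11L1",
                "gene_biotype": "transcribed_unprocessed_pseudogene"}]
expr_rows = [{"geneId": "ENSG00000223972", "geneName": "", "tpm": 1.5}]

result = annotate_expression(expr_rows, gtf_records,
                             key_col="geneId", name_col="geneName")
print(result[0])
```

With this split, each quantifier's output can keep its own column names and only the two parameters change between runs.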
-
The hardcoded biotype was part of the ExpressionStats component. I didn't want to touch a working component at this stage without knowing which functions/pipelines use it, but it can be a new "enhancement" ticket if you want.
Update your clone of anduril2_case_study and check the code in stats.scala. It has already been run and the results are at /mnt/storageBig7/OvCa_Anduril2_CaseStudy/Stats
The function ExprTableReport required the same column names for both the expression table and the annotation file: