ConvertDb genbank should test the number of sequences in input and output match
Add a check to ConvertDb genbank to verify the number of sequences in the output matches the number of sequences in the input. From a warning in docs/examples/genbank.rst:
It is recommended that the number of sequences in the output sqn file be verified against the number of sequences in the input tab or tsv file. From the command line, this can be achieved via grep -c iupacna *.sqn. This step is not necessary if running tbl2asn outside ConvertDb.
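A rough sketch of what such a check could look like in Python follows. It assumes the input tab/tsv file has a single header line and one sequence per remaining row, and it mirrors the grep -c iupacna approach from the docs by counting matching lines in the sqn file. The function names are hypothetical and are not part of ConvertDb.

```python
def count_input_records(tab_path):
    """Count data rows in a tab-delimited file.

    Assumption: one header line, one sequence per non-empty row after it.
    """
    with open(tab_path) as handle:
        rows = [line for line in handle if line.strip()]
    return max(len(rows) - 1, 0)


def count_sqn_sequences(sqn_path, marker="iupacna"):
    """Count lines containing the marker, like `grep -c iupacna file.sqn`."""
    with open(sqn_path, errors="replace") as handle:
        return sum(1 for line in handle if marker in line)


def check_counts(tab_path, sqn_path):
    """Return (matches, expected, found) for one input/output pair."""
    expected = count_input_records(tab_path)
    found = count_sqn_sequences(sqn_path)
    return expected == found, expected, found
```

A caller could log a warning whenever the first element of the returned tuple is False, reporting the expected and found counts alongside the file names.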
Comments (3)
-
-
Wait… Unless this is referring specifically to the tbl2asn filters. Yes? (I.e., not what passes/fails ConvertDb-genbank, but what passes/fails tbl2asn from the ConvertDb-genbank output.)
If so, we should be able to check that. Not sure if there is a stdout/stderr message from tbl2asn we can capture, but we can search through the .gbf file for a record count (same sort of thing as @Julian Zhou is doing with grepping the .sqn files).
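One possible way to do the .gbf search, as a sketch only: GenBank flat-file records each start with a LOCUS line, and FASTA records each start with a ">" line, so the two counts can be compared directly. This assumes tbl2asn was asked to write a .gbf file and that the fsa file fed to it is available. The function names are hypothetical.

```python
def count_fasta_records(fsa_path):
    """Count FASTA records by their '>' header lines."""
    with open(fsa_path, errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_gbf_records(gbf_path):
    """Count GenBank flat-file records by their LOCUS lines."""
    with open(gbf_path, errors="replace") as handle:
        return sum(1 for line in handle if line.startswith("LOCUS"))


def missing_after_tbl2asn(fsa_path, gbf_path):
    """Return how many sequences went into tbl2asn but did not come out."""
    return count_fasta_records(fsa_path) - count_gbf_records(gbf_path)
```

A non-zero result would indicate that records were dropped between the fsa/tbl step and the tbl2asn output, which the console log alone may not reveal.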
-
My impression is that the console log, with the --asn flag on, only tells how many sequences passed the first step (ConvertDb generating fsa and tbl files), but not how many sequences came out of tbl2asn. I say this because in my most recent submission, ConvertDb's console log indicated that all sequences passed, but eventually I got a few to a dozen sequences missing (in a seemingly random fashion) for most of the final sqn files. I only noticed this because the GenBank curator noticed that my files between different re-submissions had different numbers of sequences. This problem went away when I stopped using the --asn flag and ran tbl2asn outside ConvertDb. We are not sure if this is a platform-specific issue or what. @Hailong Meng is helping to test this on Windows/Linux [issue#163] (I had this problem on Mac).
This should already be in the console log.