get.pdb() hang with gzip=TRUE and ncore=X
Issue #49
resolved
I have just observed the following potential bug when using both gzip and ncore options to speed up get.pdb().
# Find open and closed states for lysozyme...
library(bio3d)
pdb <- read.pdb("1hel")
blast <- blast.pdb( pdbseq(pdb) )
hits <- plot(blast, cutoff=160)
##- N.B. Hangs after initial download for unknown reason and needs to be killed...
raw.files <- get.pdb(hits$pdb.id, path ="lys_pdbs", gzip=TRUE, ncore=8)
##- However this works fine
unlink("lys_pdbs/*")
raw.files <- get.pdb(hits$pdb.id, path ="lys_pdbs", gzip=TRUE)
##- This also works fine...
unlink("test/*")
raw.files <- get.pdb(hits$pdb.id, path ="test", ncore=8)
## Also works fine
raw.files <- get.pdb(hits$pdb.id[1:100], path ="test", gzip=TRUE, ncore=8)
## This exits with missing downloads...
unlink("test/*")
raw.files <- get.pdb(hits$pdb.id[201:500], path ="test", gzip=TRUE, ncore=8)
## Warning messages:
## 1: In get.pdb(hits$pdb.id[201:500], path = "test", gzip = TRUE, ncore = 8) :
## ids should be standard 4 character PDB-IDs: trying first 4 characters...
## 2: In mclapply(1:length(pdb.files), function(k) { :
## scheduled cores 582 encountered errors in user code, all values of the jobs will be affected
all(file.exists(raw.files))
## FALSE
sum(!file.exists(raw.files))
##[1] 90 # lots of missing files...
Comments (3)
-
-
I think ncore=8 is still too large for the PDB server. I tried ncore=4 and it seems to work fine.
system.time(raw.files <- get.pdb(hits$pdb.id, path ="lys_pdbs", gzip=TRUE, ncore=4))
#    user  system elapsed
#   1.560   3.383  28.186
all(file.exists(raw.files))
# TRUE
unlink("lys_pdbs/*")
system.time(raw.files <- get.pdb(hits$pdb.id, path ="lys_pdbs", gzip=TRUE))
#    user  system elapsed
#   1.645   3.287  94.904
all(file.exists(raw.files))
# TRUE
So the maximum ncore is now set to 4. Let me know if there is still a problem, and then we can consider removing the parallel part...
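For illustration, a cap like this could be written as a small clamp on the requested core count. This is only a sketch: `cap_ncore()` and its `max_cores` argument are hypothetical names, not the actual bio3d code, which is not shown in this thread.

```r
# Hypothetical helper (not part of bio3d): clamp a requested core count
# to a conservative ceiling so the download server is not flooded.
cap_ncore <- function(ncore, max_cores = 4L) {
  ncore <- suppressWarnings(as.integer(ncore))
  if (length(ncore) != 1L || is.na(ncore) || ncore < 1L) {
    stop("ncore must be a single positive integer")
  }
  if (ncore > max_cores) {
    # Tell the user that the request was reduced rather than silently changing it.
    warning("ncore reduced from ", ncore, " to ", max_cores)
    ncore <- max_cores
  }
  ncore
}

cap_ncore(2)                    # within the limit, returned unchanged
suppressWarnings(cap_ncore(8))  # above the limit, clamped to max_cores
```

Warning instead of failing keeps existing calls with ncore=8 working while making the reduced parallelism visible.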
-
- changed status to resolved
Solved by reducing max core usage
I ran into a similar problem. Oddly, the error occurs quite randomly: sometimes all files are downloaded, but sometimes the job finishes with missing files (70 in my case). I suspect this is due to how the PDB server responds (it may have rules on parallel downloads), but I am not sure. I will look into it in more detail...
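Since the failures appear to be intermittent, one possible workaround is to re-fetch only the missing files one at a time after the parallel run. The sketch below assumes a hypothetical helper `retry_missing()`; `fetch` stands for any function that downloads a single ID, and none of these names come from bio3d.

```r
# Hypothetical helper (not part of bio3d): re-run a download function for
# files that are still missing, one ID at a time, for up to `max_tries` passes.
# `ids` and `files` must be parallel vectors (files[k] is the path for ids[k]).
retry_missing <- function(ids, files, fetch, max_tries = 3L, wait = 1) {
  stopifnot(length(ids) == length(files))
  for (attempt in seq_len(max_tries)) {
    missing <- which(!file.exists(files))
    if (length(missing) == 0L) break
    # Pause before repeat passes to go easy on the server.
    if (attempt > 1L) Sys.sleep(wait)
    for (k in missing) {
      # A failed download must not abort the loop; it is retried next pass.
      try(fetch(ids[k]), silent = TRUE)
    }
  }
  # Return the IDs that could not be fetched (empty if all files exist).
  ids[!file.exists(files)]
}
```

With the objects from the report, and assuming raw.files holds the expected file paths as it does there, this might be called as `retry_missing(hits$pdb.id[201:500], raw.files, function(id) get.pdb(id, path = "test", gzip = TRUE))`. Anything it returns is still missing after the sequential retries.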