get.pdb() hangs with gzip=TRUE and ncore=X

Issue #49 resolved
Barry Grant created an issue

I have just observed the following potential bug when using both the gzip and ncore options to speed up get.pdb().

# Find open and closed states for lysozyme...
pdb <- read.pdb("1hel")

blast <- blast.pdb( pdbseq(pdb) )
hits <- plot(blast, cutoff=160)

##- N.B. Hangs after initial download for unknown reason and needs to be killed...
raw.files <- get.pdb(hits$, path ="lys_pdbs", gzip=TRUE, ncore=8)

##- However this works fine
raw.files <- get.pdb(hits$, path ="lys_pdbs", gzip=TRUE)

##- This also works fine...
raw.files <- get.pdb(hits$, path ="test", ncore=8)

## Also works fine
raw.files <- get.pdb(hits$[1:100], path ="test", gzip=TRUE, ncore=8)

## This exits with missing downloads...
raw.files <- get.pdb(hits$[201:500], path ="test", gzip=TRUE, ncore=8)
## Warning messages:
## 1: In get.pdb(hits$[201:500], path = "test", gzip = TRUE, ncore = 8) :
##  ids should be standard 4 character PDB-IDs: trying first 4 characters...
## 2: In mclapply(1:length(pdb.files), function(k) { :
##  scheduled cores 582 encountered errors in user code, all values of the jobs will be affected


##[1] 90 # lots of missing files...
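
A minimal sketch for listing which IDs are missing after a run like the ones above. `find_missing_ids` is a hypothetical helper, not part of bio3d, and it assumes one file per ID named `<id><ext>` inside the download directory; the extension that get.pdb() actually leaves on disk should be checked and passed in. The second ID in the demo is a placeholder used only for illustration.

```r
# Hypothetical helper (not part of bio3d): report which IDs have no file on disk.
# Assumes one file per ID named "<id><ext>" inside `path`; adjust `ext` to match
# the files that the download step actually leaves behind.
find_missing_ids <- function(ids, path, ext = ".pdb") {
  expected <- file.path(path, paste0(ids, ext))
  ids[!file.exists(expected)]
}

# Self-contained demo using a temporary directory instead of a real download.
demo_dir <- file.path(tempdir(), "pdb_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "1hel.pdb"))

# "2lzt" is a placeholder ID for the demo; no file was created for it.
missing_ids <- find_missing_ids(c("1hel", "2lzt"), demo_dir)
print(missing_ids)
```

The returned vector could then be passed to a second, serial download attempt, so that only the failed IDs are requested again.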

  1. Xinqiu Yao

    I ran into a similar problem, but oddly the error occurs quite randomly: sometimes all files are downloaded, and sometimes the job finishes with missing files (70 in my case). I guess this is due to how the PDB server responds (it may have rules limiting multithreaded downloads), but I am not sure. I will check it in more detail...

  2. Xinqiu Yao

    I think ncore=8 is still too large for the PDB server. I tried ncore=4 and it seems to work fine.

    system.time(raw.files <- get.pdb(hits$, path ="lys_pdbs", gzip=TRUE, ncore=4))
    #   user  system elapsed 
    # 1.560   3.383  28.186 
    # TRUE
    system.time(raw.files <- get.pdb(hits$, path ="lys_pdbs", gzip=TRUE))
    #   user  system elapsed 
    #  1.645   3.287  94.904 
    # TRUE

    So, the maximum ncore is set to 4. Let me know if there is still a problem, and then we can think about removing the parallel part...
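
A minimal sketch of how a cap like this could be enforced before the parallel call. `cap_ncore` and `max_cores` are hypothetical names chosen for illustration and are not taken from the bio3d source, so the actual change may look different.

```r
# Hypothetical helper (not part of bio3d): clamp a requested core count so that
# no more than `max_cores` downloads run in parallel.
cap_ncore <- function(ncore, max_cores = 4L) {
  ncore <- as.integer(ncore)
  # Fall back to a single core for missing or non-positive requests.
  if (is.na(ncore) || ncore < 1L) {
    ncore <- 1L
  }
  # Reduce oversized requests and tell the user about it.
  if (ncore > max_cores) {
    warning("ncore reduced to ", max_cores, " to limit parallel downloads")
    ncore <- max_cores
  }
  ncore
}

# A request for 8 cores is reduced to the cap (with a warning);
# a request for 2 cores is passed through unchanged.
capped <- suppressWarnings(cap_ncore(8))
unchanged <- cap_ncore(2)
```

Clamping with a warning, rather than stopping with an error, keeps existing calls such as ncore=8 working while still limiting the load on the server.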

