pdbsplit() output filenames always use a forced 4-character basename

Issue #124 resolved
Barry Grant created an issue

I just received a bug report by email about pdbsplit(). Currently pdbsplit() writes PDB files with names that use ONLY the first 4 characters of the input filename. So if a user has multiple input files whose names share the same first 4 characters, the output files overwrite one another and confusion follows. E.g.

# Make an example set of input files
> x <- get.pdb("1JFF")
trying URL 'http://www.rcsb.org/pdb/files/1JFF.pdb'
downloaded 586 Kb
> infile <- paste0(x,c("_ver1", "_ver2"), ".pdb")
> infile
[1] "./1JFF.pdb_ver1.pdb" "./1JFF.pdb_ver2.pdb"
>
> file.copy(x, infile)
[1] TRUE TRUE

# The expected output here should be files for ver1 and ver2 
> outfile <- pdbsplit(infile)
> outfile
[1] "split_chain/1JFF_A.pdb" "split_chain/1JFF_B.pdb" "split_chain/1JFF_A.pdb"
[4] "split_chain/1JFF_B.pdb"
## note: no "_ver1" or "_ver2" in the names above...
> list.files(path="split_chain", patt="1JFF")
[1] "1JFF_A.pdb" "1JFF_B.pdb"

From a quick look, there are a number of places in pdbsplit() that use splitstr(). I propose adding a mk4=FALSE input argument to pdbsplit() and using basename.pdb() from model.R in place of all these calls.
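
To make the proposal concrete, here is a minimal base-R sketch of the two naming behaviours. The helper out_basename() is hypothetical and is not the bio3d implementation; it only assumes that mk4=TRUE would keep the current 4-character truncation and that mk4=FALSE would use the full basename with a trailing ".pdb" removed.

```r
# Hypothetical helper (not part of bio3d): derive the output basename
# used for naming split-chain files.
out_basename <- function(files, mk4 = FALSE) {
  b <- basename(files)                 # drop any leading directory
  if (mk4) {
    substr(b, 1, 4)                    # current behaviour: first 4 characters only
  } else {
    sub("\\.pdb$", "", b, ignore.case = TRUE)  # strip only a trailing ".pdb"
  }
}

infile <- c("./1JFF.pdb_ver1.pdb", "./1JFF.pdb_ver2.pdb")

short <- out_basename(infile, mk4 = TRUE)  # both inputs map to the same name
full  <- out_basename(infile)              # names stay distinct

# Chain "A" output paths under each scheme
file.path("split_chain", paste0(short, "_A.pdb"))
file.path("split_chain", paste0(full, "_A.pdb"))

# A simple clash check that could warn before anything is overwritten
anyDuplicated(short) > 0
anyDuplicated(full) > 0
```

With mk4=FALSE as the default, the two example inputs would no longer collide, while users who rely on the 4-character PDB ID names could still opt in with mk4=TRUE.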

Let me know if there was/is a good reason for enforcing 4 char only basenames for output files here. Otherwise I will make the change.

Comments (3)

  1. Xinqiu Yao

    I guess the original reason for using the first 4 letters was simply that we assumed the file came from the PDB database. I agree with the change to use the basename of the PDB file (without the '.pdb' extension).
