Reading large ensembles

Issue #499 resolved
Former user created an issue

Hi, I really love your package, but analyzing large ensembles of structures is very slow. In particular, bio3d requires pre-processing the PDB files with pdbaln(), which invokes MUSCLE. However, when I have a large ensemble of the same structure in different conformations, the sequence (and residue correspondence) is the same. Could you implement a way to skip the alignment step if all PDBs have the same sequence?

Cheers
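The request depends on one condition: every structure has the same residue sequence in the same order. That condition is cheap to check before deciding whether any alignment is needed. The Python sketch below is not part of bio3d. `residue_sequence` and `all_same_sequence` are hypothetical helper names. It assumes standard fixed-column PDB text, compares only C-alpha atoms of the first model, and does not handle alternate locations.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers, not part of bio3d; assume fixed-column PDB format.

def residue_sequence(pdb_text: str) -> List[Tuple[str, str, str]]:
    """(chain ID, residue number + insertion code, residue name) for each
    C-alpha atom in the first model of a PDB-format string."""
    residues = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # ignore any further models
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA":
            residues.append((line[21:22], line[22:27].strip(), line[17:20].strip()))
    return residues


def all_same_sequence(pdb_texts: Sequence[str]) -> bool:
    """True if every structure lists the same residues in the same order."""
    sequences = [residue_sequence(text) for text in pdb_texts]
    return all(seq == sequences[0] for seq in sequences[1:])
```

If the check passes, the residue correspondence is simply position by position, so no alignment is needed to relate the structures.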

Comments (3)

  1. Xinqiu Yao

    Hi,

    pdbaln() is intended for analyzing heterogeneous sequences. If you have identical sequences in different conformations, you should not use pdbaln(). In particular, if your data were converted from simulation trajectories, we strongly recommend reading the trajectory file directly with read.ncdf() or read.dcd() instead of converting the trajectory to PDB files and reading them with pdbaln().

    Let me know if it helps.

  2. Barry Grant

    Glad you like the package. If you don't need to do any alignment, then the current pdbaln() is indeed not the right way to go.

    If your structures all have the same composition (and atom order), then you could just cat them together into one multi-model PDB file and read it with read.pdb() using multi=TRUE. Using a trajectory format will be the quickest input method, as Xinqiu states. Both will give you a coordinate matrix to work with.

    Let us know if these options prove tricky and we can perhaps add a shortcut option to pdbaln() to skip the alignment pre-processing step.
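The multi-model route can be illustrated with the Python sketch below. It first checks that all input structures list the same atoms in the same order, then wraps each one in MODEL/ENDMDL records. It is not part of bio3d, and `to_multimodel`, `atom_records` and `atom_key` are hypothetical names. It assumes standard fixed-column PDB text and keeps only ATOM/HETATM lines, dropping TER, CONECT and header records.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers, not part of bio3d; assume fixed-column PDB format.

def atom_records(pdb_text: str) -> List[str]:
    """Return the ATOM/HETATM lines of the first model in a PDB-format string."""
    records = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # ignore any further models
        if line.startswith(("ATOM  ", "HETATM")):
            records.append(line)
    return records


def atom_key(line: str) -> Tuple[str, str, str, str]:
    """Atom name, residue name, chain ID, residue number + insertion code."""
    return (line[12:16].strip(), line[17:20].strip(),
            line[21:22], line[22:27].strip())


def to_multimodel(pdb_texts: Sequence[str]) -> str:
    """Concatenate single-model PDB texts into one multi-model PDB text.

    Raises ValueError if any structure differs from the first in atom
    composition or atom order.
    """
    if not pdb_texts:
        raise ValueError("no structures given")
    models = [atom_records(text) for text in pdb_texts]
    reference = [atom_key(line) for line in models[0]]
    out = []
    for i, records in enumerate(models, start=1):
        if [atom_key(line) for line in records] != reference:
            raise ValueError(f"structure {i} differs in atom composition or order")
        out.append(f"MODEL     {i:4d}")  # model serial number in columns 11-14
        out.extend(records)
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"
```

The returned text can be written to a single file and then read in one call with read.pdb() using multi=TRUE.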
