Issue with core.find()
Hello,
I am trying to go through the PCA tutorial and I was hoping for some help. I have ran the following code where 'XXXX' is a list of all my PDB IDs.
ids <- c("XXXX")
raw.files <- get.pdb(ids)
files <- pdbsplit(raw.files, ids)
pdbs <- pdbaln(files)
pdbs$id <- substr(basename(pdbs$id),1,6)
#seqidentity(pdbs)
#rmsd(pdbs, fit=TRUE)
head(pdbs$id)
anno <- pdb.annotate(pdbs$id)
#head(anno[, c("resolution", "ligandId", "citation")])
core <- core.find(pdbs)
However, I get the following error on the core <- core.find(pubs) command: Error in rep(NA, len[1]) : invalid 'times' argument
I assume this is a problem with an NA being found post-alignment, but I am unsure of how to fix this as I don't want to get rid of NAs due to gaps if I don't have to. Is there a simple fix for this or do I need to be more selective with the PDBs used. As a note, there are some NMR structures included in the list - would those multi-state pdbs be causing this issue?
Thanks for any help you can provide!
Comments (6)
-
-
The core.find() function should work with gaps in your aligned structures. Can you tell us please what the dimensions of your 'pdbs$xyz' object is? E.g.
dim(pdbs$xyz) # Or just type 'pdbs$xyz'
Without providing us with a minimal reproducible example (e.g. a small set of your PDB ids that result in this error) it hard to see what could be wrong here beyond a dimension problem in your pdbs object. Thanks!
-
Adjust the stop.at argument down to below 11. do e.g.
core.find(pdbs, stop.at=5)
.Looking at your alignment it seems you have only 12 non-gap columns (open the
aln.fa
file generated bypdbaln()
to see (e.g with seaview)). You can also check which non-gap columns you have by using functiongap.inspect
:> gaps <- gap.inspect(pdbs$ali) > gaps$f.inds [1] 510 511 512 544 627 645 646 647 648 649 650 654
(12 positions).
You can certainly align all your PDB files on these 12 residues (or even the core identified), but doing a PCA here will omit all gap containing columns. which means that the PCA will be performed on 12 residues.
The cryptic error message you got certainly calls for some internal checking and relevant messaging in the
core.find
function. (prior to line 130 to be precise).while(length(res.still.in) > stop.at) {
Thanks ! -
Indeed the error message should be improved upon here - thanks for reporting!
On a a more fundamental level for you I think your current alignment is likely not what you want. Note that you have at least 10 very different protein sequence groups indicative of distinct proteins (i.e. different protein families) in there that come form your combining of all the chains of you your chosen PDB files. Examining your alignment with a viewer like seaview can help you here as well as examining their pairwise sequence identity values. E.g.
i <- seqidentity(pdbs) hc <- hclust(as.dist(1-i)) plot(hc, hang=-1)
You might want to focus on only the chains representative of the family you are most interested in here.
-
- changed status to resolved
Resolved but with a note for future core.find() error/warning message improvements
-
- changed version to v2.2
- Log in to comment
Hi, Thanks for reporting this. Could you save your pdbs object and send it to me via email? That would make it much easier to debug. You can do
and send the my_pdbs.RData file. (email: larsss [at] gmail dot com)