Issue with core.find()

Issue #219 resolved
Former user created an issue

Hello,

I am trying to go through the PCA tutorial and I was hoping for some help. I have ran the following code where 'XXXX' is a list of all my PDB IDs.

ids <- c("XXXX")
raw.files <- get.pdb(ids)
files <- pdbsplit(raw.files, ids)
pdbs <- pdbaln(files)
pdbs$id <- substr(basename(pdbs$id),1,6)
#seqidentity(pdbs)
#rmsd(pdbs, fit=TRUE)
head(pdbs$id)
anno <- pdb.annotate(pdbs$id)
#head(anno[, c("resolution", "ligandId", "citation")])
core <- core.find(pdbs)

However, I get the following error on the core <- core.find(pubs) command: Error in rep(NA, len[1]) : invalid 'times' argument

I assume this is a problem with an NA being found post-alignment, but I am unsure of how to fix this as I don't want to get rid of NAs due to gaps if I don't have to. Is there a simple fix for this or do I need to be more selective with the PDBs used. As a note, there are some NMR structures included in the list - would those multi-state pdbs be causing this issue?

Thanks for any help you can provide!

Comments (6)

  1. Lars Skjærven

    Hi, Thanks for reporting this. Could you save your pdbs object and send it to me via email? That would make it much easier to debug. You can do

    save(pdbs, file="my_pdbs.RData")
    

    and send the my_pdbs.RData file. (email: larsss [at] gmail dot com)

  2. Barry Grant

    The core.find() function should work with gaps in your aligned structures. Can you tell us please what the dimensions of your 'pdbs$xyz' object is? E.g.

    dim(pdbs$xyz)
    # Or just type 'pdbs$xyz'
    

    Without providing us with a minimal reproducible example (e.g. a small set of your PDB ids that result in this error) it hard to see what could be wrong here beyond a dimension problem in your pdbs object. Thanks!

  3. Lars Skjærven

    Adjust the stop.at argument down to below 11. do e.g. core.find(pdbs, stop.at=5).

    Looking at your alignment it seems you have only 12 non-gap columns (open the aln.fa file generated by pdbaln() to see (e.g with seaview)). You can also check which non-gap columns you have by using function gap.inspect:

    > gaps <- gap.inspect(pdbs$ali)
    > gaps$f.inds
     [1] 510 511 512 544 627 645 646 647 648 649 650 654
    

    (12 positions).

    You can certainly align all your PDB files on these 12 residues (or even the core identified), but doing a PCA here will omit all gap containing columns. which means that the PCA will be performed on 12 residues.

    The cryptic error message you got certainly calls for some internal checking and relevant messaging in the core.find function. (prior to line 130 to be precise). while(length(res.still.in) > stop.at) { Thanks !

  4. Barry Grant

    Indeed the error message should be improved upon here - thanks for reporting!

    On a a more fundamental level for you I think your current alignment is likely not what you want. Note that you have at least 10 very different protein sequence groups indicative of distinct proteins (i.e. different protein families) in there that come form your combining of all the chains of you your chosen PDB files. Examining your alignment with a viewer like seaview can help you here as well as examining their pairwise sequence identity values. E.g.

    i <- seqidentity(pdbs)
    hc <- hclust(as.dist(1-i))
    plot(hc, hang=-1)
    

    You might want to focus on only the chains representative of the family you are most interested in here.

  5. Log in to comment