How to extract a subset of a protein structure (PDB format) file based on a subsequence of the protein

Issue #823 resolved
Edward Wijaya created an issue

I am looking at a particular protein structure called 2LY4, accessible from the RCSB PDB website.
The corresponding fasta sequence for that structure is this:

>2LY4_1|Chain A|High mobility group protein B1|Homo sapiens (9606)
GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGE
>2LY4_2|Chain B|Cellular tumor antigen p53|Homo sapiens (9606)
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPL

The PDB format file can be downloaded here (1.7 MB). The entire file is too large to be pasted here.

What I want to do is extract the subset of the PDB file that corresponds to a subsequence of the FASTA sequence above,
namely Chain A from the 1st residue to the 30th residue:

GKGDPKKPRGKMSSYAFFVQTCREEHKKKH

How can I do that with Bio3D?

Comments (2)

  1. Xinqiu Yao

    If you know the residue numbers of the matched sequence, this is easy to do with the atom.select() function.

    For example, to create a subset containing residues 20-80:

    spdb <- atom.select(pdb, resno=c(20:80), value=TRUE)
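
    If the residue numbers are not known in advance, one possible approach is to look them up from the chain's sequence first. The following is only a sketch: find_subseq_resno() and extract_subseq() are hypothetical helper names, not Bio3D functions. It assumes the chain has one alpha-carbon per residue and no insertion codes, and it uses only the first match of the subsequence.

    ```r
    # Hypothetical helper (not part of Bio3D): given the one-letter residues of a
    # chain and the residue number of each position, return the residue numbers
    # covered by the first occurrence of `target`.
    find_subseq_resno <- function(aa, resno, target) {
      stopifnot(length(aa) == length(resno))
      start <- as.integer(regexpr(target, paste(aa, collapse = ""), fixed = TRUE))
      if (start < 0) stop("target subsequence not found in chain")
      resno[start + seq_len(nchar(target)) - 1L]
    }

    # Sketch of the Bio3D side; assumes `pdb` was read with bio3d::read.pdb().
    extract_subseq <- function(pdb, chain, target) {
      # Alpha-carbons give one atom per residue of the chosen chain
      ca <- bio3d::atom.select(pdb, "calpha", chain = chain)
      aa <- bio3d::aa321(pdb$atom$resid[ca$atom])
      resno <- find_subseq_resno(aa, pdb$atom$resno[ca$atom], target)
      # Same call as in the example above, restricted to the matched residues
      bio3d::atom.select(pdb, chain = chain, resno = resno, value = TRUE)
    }

    # Example usage (requires the bio3d package and network access):
    # pdb  <- bio3d::read.pdb("2ly4")
    # spdb <- extract_subseq(pdb, "A", "GKGDPKKPRGKMSSYAFFVQTCREEHKKKH")
    # bio3d::write.pdb(spdb, file = "2ly4_A_subset.pdb")
    ```

    Matching on the sequence rather than hard-coding 1:30 is meant to guard against residue numbering in the PDB file that does not start at 1. The output file name above is arbitrary.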
