How to extract a subset of a protein structure (PDB format) file based on a subsequence of the protein
Issue #823
resolved
I am looking at a particular protein structure, 2LY4, available from the RCSB PDB website.
The corresponding FASTA sequences for that structure are:
>2LY4_1|Chain A|High mobility group protein B1|Homo sapiens (9606)
GKGDPKKPRGKMSSYAFFVQTCREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGE
>2LY4_2|Chain B|Cellular tumor antigen p53|Homo sapiens (9606)
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPL
The PDB-format file can be downloaded here (1.7 MB); the entire file is too large to paste here.
What I want to do is extract the part of the PDB file that corresponds to a subsequence of the FASTA above, namely Chain A from the 1st to the 30th residue:
GKGDPKKPRGKMSSYAFFVQTCREEHKKKH
How can I do that with Bio3D?
Comments (2)
reporter - changed status to resolved
Thanks. This works.
If you know the residue numbers of the matched sequence, this is easy to do with Bio3D's atom.select() function. For example, to create a subset containing residues 20-80:
pdb  <- read.pdb("2ly4")
spdb <- atom.select(pdb, resno = 20:80, value = TRUE)
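The answer assumes the residue numbers are already known. For the case in the question, a minimal sketch (assuming network access to RCSB and that all 30 requested residues are actually modeled in the coordinates) can find those numbers from the sequence itself: build chain A's one-letter sequence from its C-alpha atoms, locate the subsequence as a literal substring, map the matched positions back to PDB residue numbers, then trim and write the structure. The variable names and the output file name 2LY4_A_1-30.pdb are illustrative choices, not part of the original answer.

```r
library(bio3d)

# Download entry 2LY4 from RCSB (network access required). By default
# read.pdb() keeps only the first NMR model; add multi = TRUE to keep all.
pdb <- read.pdb("2ly4")

# Subsequence from the question: the first 30 residues of chain A in the FASTA.
target <- "GKGDPKKPRGKMSSYAFFVQTCREEHKKKH"

# Build chain A's one-letter sequence from its C-alpha atoms, keeping the
# residue number of each position so sequence indices can be mapped back.
ca     <- atom.select(pdb, chain = "A", elety = "CA")
resnos <- pdb$atom$resno[ca$atom]
seqA   <- paste(aa321(pdb$atom$resid[ca$atom]), collapse = "")

# Find the target as a literal substring (fixed = TRUE disables regex syntax).
start <- as.integer(regexpr(target, seqA, fixed = TRUE))
if (start < 0) stop("target subsequence not found in the modeled chain A sequence")
idx <- start:(start + nchar(target) - 1)

# Select every atom of the matched residues and write them to a new file
# (the file name is just an example).
sel <- atom.select(pdb, chain = "A", resno = resnos[idx])
sub <- trim.pdb(pdb, inds = sel)
write.pdb(sub, file = "2LY4_A_1-30.pdb")
```

Matching on the sequence rather than assuming resno 1:30 avoids depending on how the coordinate file numbers its residues, which need not match FASTA positions. If some of the requested residues are missing from the model, the literal match fails and a sequence alignment would be needed instead.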