3d Pose Grammar

This is the code for the paper

Hao-Shu Fang*, Yuanlu Xu*, Wenguan Wang, Xiaobai Liu and Song-Chun Zhu, Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation, In Proc. AAAI, 2018.

Our implementation is based on the code of 3d-pose-baseline, Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little, A Simple yet Effective Baseline for 3d Human Pose Estimation, In Proc. ICCV, 2017.

In this work, we propose a deep network that encodes a 3d human pose grammar and learns a 2d-to-3d mapping function. The proposed model consists of a base network, which efficiently captures pose-aligned features, and a hierarchy of Bi-directional RNNs (BRNNs) on top, which explicitly incorporates knowledge about human body configuration. During learning, we use a pose sample simulator to augment training samples with virtual camera views, which further improves model generalization. In the experiments, we propose a new evaluation protocol (#3) to validate the model's generalization capability.
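As a rough illustration of the virtual-camera augmentation idea, the sketch below rotates a 3d pose about the vertical axis, places it in front of a camera, and projects it with a pinhole model to obtain paired 2d/3d samples. This is not the simulator used in the paper: the function names, the orbiting camera placement, the focal length, the principal point, and the subject distance are all assumptions chosen for illustration.

```python
import math

def to_virtual_camera(pose_3d, azimuth_deg, distance):
    """Express a pose in the frame of a (hypothetical) camera orbiting the subject.

    pose_3d: list of (x, y, z) joints centered near the origin, with y vertical.
    The camera is rotated by azimuth_deg about the vertical axis and sits
    `distance` units away from the origin (a simplified, assumed camera model).
    """
    a = math.radians(azimuth_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    camera_pose = []
    for x, y, z in pose_3d:
        # Rotate about the y axis, then push the subject in front of the camera.
        xc = cos_a * x + sin_a * z
        zc = -sin_a * x + cos_a * z + distance
        camera_pose.append((xc, y, zc))
    return camera_pose

def project(camera_pose, focal=1000.0, center=(500.0, 500.0)):
    """Pinhole projection of camera-frame joints to pixel coordinates."""
    return [(focal * x / z + center[0], focal * y / z + center[1])
            for x, y, z in camera_pose]

def simulate_views(pose_3d, azimuths_deg, distance=4000.0):
    """Return one (pose_2d, pose_3d_in_camera_frame) pair per virtual view."""
    pairs = []
    for azimuth in azimuths_deg:
        cam = to_virtual_camera(pose_3d, azimuth, distance)
        pairs.append((project(cam), cam))
    return pairs
```

Each returned pair could serve as an extra training sample: the projected 2d pose as input and the camera-frame 3d pose as target.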


We have received bug reports related to different versions of the dependencies. Please note that we cannot guarantee that our code will produce consistent results across different versions.

Download H36M Dataset

Clone this repository and install the required dependencies.

Fetch the dataset provided by J. Martinez et al. It contains the Human3.6M dataset as 3d points, the camera parameters used to produce ground-truth 2d detections, and 2d pose estimation results generated by the Stacked Hourglass detector. Note that we fine-tuned this detector on the Human3.6M dataset for better performance.

git clone
cd grammar_3dpose
cd data
cd ..


A 3D human body is represented by 16 joints, which are defined as follows:

| Index   | Joint Name     |
| ------- | -------------- |
| 0       | Right Hip      |
| 1       | Right Knee     |
| 2       | Right Foot     |
| 3       | Left Hip       |
| 4       | Left Knee      |
| 5       | Left Foot      |
| 6       | Spine          |
| 7       | Thorax         |
| 8       | Neck/Nose      |
| 9       | Head           |
| 10      | Left Shoulder  |
| 11      | Left Elbow     |
| 12      | Left Wrist     |
| 13      | Right Shoulder |
| 14      | Right Elbow    |
| 15      | Right Wrist    |

We visualize the human body skeleton by drawing lines between connected joints.
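The joint table and the line-drawing idea can be sketched in a few lines of Python. The joint names are copied from the table above. The bone list is an assumption made for illustration and is not taken from this repository; in particular, since the table has no pelvis joint, the sketch connects both hips directly to the spine.

```python
# Joint names indexed as in the table above.
JOINT_NAMES = [
    "Right Hip", "Right Knee", "Right Foot",
    "Left Hip", "Left Knee", "Left Foot",
    "Spine", "Thorax", "Neck/Nose", "Head",
    "Left Shoulder", "Left Elbow", "Left Wrist",
    "Right Shoulder", "Right Elbow", "Right Wrist",
]

# ASSUMED bone list (hypothetical, not from the repository):
# each pair holds the indices of two joints joined by a line.
BONES = [
    (0, 1), (1, 2),                # right leg
    (3, 4), (4, 5),                # left leg
    (0, 6), (3, 6),                # hips to spine (no explicit pelvis joint)
    (6, 7), (7, 8), (8, 9),        # spine, thorax, neck/nose, head
    (7, 10), (10, 11), (11, 12),   # left arm
    (7, 13), (13, 14), (14, 15),   # right arm
]

def bone_segments(pose):
    """Return one (start_point, end_point) line segment per bone.

    pose: sequence of 16 joint coordinates (2d or 3d tuples), indexed as in the table.
    """
    if len(pose) != len(JOINT_NAMES):
        raise ValueError("expected %d joints, got %d" % (len(JOINT_NAMES), len(pose)))
    return [(pose[a], pose[b]) for a, b in BONES]
```

The resulting segments can be handed to any plotting library to draw the skeleton.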


Train base network:

python --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_protocol3

Fine-tune grammar network:

python --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --load <batch_number> --add_grammar --learning_rate 1e-5 --use_protocol3

If you need to switch to protocol 1, please use the following commands:

python --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_det_only
python --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_det_only --load <batch_number> --add_grammar --learning_rate 1e-5 
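The two-stage workflow (train the base network, then fine-tune with the grammar network) uses the same flags under both protocols, apart from the protocol switch. A minimal sketch of a helper that assembles both commands is shown below. The helper name and its parameters are hypothetical, the `script` argument is a placeholder for the actual training script, and the sketch assumes that the order of flags on the command line does not matter.

```python
# Flags shared by every training command listed above.
COMMON_FLAGS = [
    "--camera_frame", "--residual", "--batch_norm",
    "--dropout", "0.5", "--max_norm", "--evaluateActionWise",
]

# Protocol switch used in the commands above.
PROTOCOL_FLAGS = {1: ["--use_det_only"], 3: ["--use_protocol3"]}

def training_commands(script, protocol, batch_number):
    """Return (base_cmd, finetune_cmd) as argument lists.

    script: path to the training script (placeholder; supply the real one).
    protocol: 1 or 3, selecting the protocol flag.
    batch_number: checkpoint batch number of the trained base network.
    """
    base = ["python", script] + COMMON_FLAGS + PROTOCOL_FLAGS[protocol]
    finetune = base + [
        "--load", str(batch_number),
        "--add_grammar", "--learning_rate", "1e-5",
    ]
    return base, finetune
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to launch the corresponding stage.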

To generate atomic poses and a GMM for each atomic pose, run python src/

To run on your own data, first generate 2d pose detection results using Stacked Hourglass or RMPE, link the .h5 output under data/h36m/, and then train the base network and the grammar network as described in the training steps above.
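The linking step can be sketched as follows. The helper name and the example file name are hypothetical. The sketch keeps the original file name because the name and layout the training code expects under data/h36m/ are not specified here, so adjust them as needed.

```python
import os

def link_detections(h5_path, repo_root="."):
    """Symlink a detections file into <repo_root>/data/h36m/ and return the link path."""
    target_dir = os.path.join(repo_root, "data", "h36m")
    os.makedirs(target_dir, exist_ok=True)
    link_path = os.path.join(target_dir, os.path.basename(h5_path))
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link left by a previous run
    # Raises FileExistsError if a regular file already occupies link_path.
    os.symlink(os.path.abspath(h5_path), link_path)
    return link_path
```

For example, `link_detections("my_detections.h5", "grammar_3dpose")` would create `grammar_3dpose/data/h36m/my_detections.h5` as a symbolic link to the original file.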

Pre-trained Models

We keep our pre-trained full model in the pre-trained_models directory. To use it, unzip the file and put it under experiments/All/. You can then load the model and validate it on the test set using the following command:

python --phase test --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --load 3180431 --add_grammar --use_protocol3

To visualize the predicted results, use the following command:

python --phase vis --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --load 3180431 --add_grammar --use_protocol3

Note that you can also validate your own trained model or visualize its predictions by changing the checkpoint file and the batch number passed to --load.


If you find our code useful, please see our paper for more details and cite our work:


title={Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation},
author={Hao-Shu Fang* and Yuanlu Xu* and Wenguan Wang and Xiaobai Liu and Song-Chun Zhu},
booktitle={AAAI Conference on Artificial Intelligence},