A tool for labelling the pose of objects in 3D monocular videos

Important note: This software rests on the idea that the depth of an object is difficult to judge when labelling poses in monocular videos, and that it is sufficient to test depth estimation performance against that of a human. As such, although the labeller does ask you to specify the depth, the error measure of your tracker or motion estimator would be calculated in the 2D image. 3D model-based tracking methods are then evaluated only on performance they can reasonably be expected to achieve, and you only need to specify the ground truth as far as the human eye can discern it from a monocular video.
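As an illustration of a 2D error measure of this kind, here is a minimal sketch that projects model points under an estimated pose and a labelled pose, then compares them in pixels. It assumes a simple pinhole camera without lens distortion; the function names and the pose representation (a rotation matrix and translation vector) are hypothetical and are not taken from this software.

```python
import numpy as np

def project_points(points_obj, R, t, K):
    """Project Nx3 object-frame points to pixels with a pinhole model (assumed)."""
    points_cam = np.dot(points_obj, R.T) + t   # object frame -> camera frame
    uvw = np.dot(points_cam, K.T)              # apply the 3x3 intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]            # perspective divide -> pixel coords

def mean_pixel_error(points_obj, pose_est, pose_gt, K):
    """Mean 2D distance in pixels between estimated and labelled projections."""
    R_est, t_est = pose_est
    R_gt, t_gt = pose_gt
    uv_est = project_points(points_obj, R_est, t_est, K)
    uv_gt = project_points(points_obj, R_gt, t_gt, K)
    return float(np.mean(np.linalg.norm(uv_est - uv_gt, axis=1)))
```

With this measure, a point on the optical axis that is moved only in depth projects to the same pixel, so a depth error there costs nothing, while a sideways error shows up directly in pixels.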

This argument is developed in sections 3.7 and 7.2 of the following thesis (please cite it if you use this software in an academic work):

Duff, D. J. (2011). Visual motion estimation and tracking of rigid bodies by physical simulation (PhD Thesis). University of Birmingham.


  • Specify the 3D pose of a mesh model in a monocular video for ground-truth.
  • Automatic interpolation between labelled frames.
  • Loading and saving of labelled tracks in CSV format.
  • Move through the video in a random access fashion, specifying poses.
  • Contains a module for loading calibration (.cal) files into Python.
  • Copy and paste poses between frames.
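
To give a feel for what interpolation between labelled frames can look like, here is a minimal sketch that blends translations linearly and rotations by spherical linear interpolation (slerp). The interpolation scheme this software actually uses is not described here, so treat this as an assumption; the function names and the (w, x, y, z) quaternion ordering are likewise hypothetical.

```python
import numpy as np

def slerp(q0, q1, u):
    """Spherical linear interpolation between two quaternions, u in [0, 1]."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:            # q and -q are the same rotation: take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly identical: fall back to normalised lerp
        q = q0 + u * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - u) * theta) * q0 + np.sin(u * theta) * q1) / np.sin(theta)

def interpolate_pose(frame, frame_a, pose_a, frame_b, pose_b):
    """Pose at `frame`, lying between labelled frames `frame_a` and `frame_b`."""
    u = (frame - frame_a) / float(frame_b - frame_a)
    t_a, q_a = pose_a
    t_b, q_b = pose_b
    t = (1.0 - u) * np.asarray(t_a, dtype=float) + u * np.asarray(t_b, dtype=float)
    q = slerp(np.asarray(q_a, dtype=float), np.asarray(q_b, dtype=float), u)
    return t, q
```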

Image of program


See the "dependencies" section below before running.

To run this labelling program, you need:

  • A mesh model of the object that you will label the video for.

  • The video file that you will label.

  • An intrinsic parameter .cal file containing intrinsic calibration parameters for the camera used in the video.

  • An extrinsic parameter .cal file for the pose of the camera (the camera is assumed not to move...).

  • OPTIONAL: A CSV file containing labelled frames (of the type saved by this software).


Either run from the program's directory or add its directory to your python path (no is provided): [model_file] [video_file] [intrinsics_file] [extrinsics_file] [labelled_data]

See the shell file for a non-working example (video not supplied).

To translate, drag with the left mouse button to move the object in the image XY directions. Drag up/down with the right mouse button to zoom.

To rotate, hold down the shift key and drag with the left button - a trackball will appear.
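
For readers unfamiliar with trackball rotation, the following is a minimal sketch of the general technique: both ends of a mouse drag are mapped onto a virtual unit sphere, and the rotation taking one point to the other is returned as a quaternion. This is not this program's code; the function names, the window-size parameters and the (w, x, y, z) quaternion ordering are assumptions made for illustration.

```python
import numpy as np

def to_sphere(x, y, width, height):
    """Map a pixel position to a point on a unit sphere centred in the window."""
    px = (2.0 * x - width) / width       # -1 at the left edge, +1 at the right
    py = (height - 2.0 * y) / height     # +1 at the top edge (image y points down)
    d2 = px * px + py * py
    if d2 <= 1.0:
        return np.array([px, py, np.sqrt(1.0 - d2)])
    return np.array([px, py, 0.0]) / np.sqrt(d2)   # outside: clamp to the rim

def trackball_quaternion(p_start, p_end, width, height):
    """Quaternion (w, x, y, z) for a drag from p_start to p_end, in pixels."""
    a = to_sphere(p_start[0], p_start[1], width, height)
    b = to_sphere(p_end[0], p_end[1], width, height)
    axis = np.cross(a, b)
    norm = np.linalg.norm(axis)
    if norm < 1e-12:                     # no movement: identity rotation
        return np.array([1.0, 0.0, 0.0, 0.0])
    angle = np.arctan2(norm, np.dot(a, b))
    axis = axis / norm
    return np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))
```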

Holding the control key gives some other interpretations of mouse movement; in particular, control plus a drag with the left button will produce a translation in world XY coordinates.

The CSV file that it produces has output like this:

3D Monocularly Labelled Data Version 1

To edit a pose numerically, save a CSV output file, edit it, and then reload it.


The folder sampledata contains a working bash script and all the data needed to run it (video, camera and pose calibration files, an object model, and a default labelled data file). Try running:

cd sampledata

Other notes

May be buggy for videos more than 2000 frames long. If you have trouble, send me your data and I'll fix the bug.

If you alter the pose of an object in a frame, the status of that frame goes from "interpolated" to "labelled". Use the "Frame Delete Labelling" button to switch the frame back to interpolated.

You probably won't need the following, as the basic rotation approach uses a rotation sphere. The rotation order only matters for the alternative (control+shift+mouse) rotations, which are the old rotation approach. It is specified like "sxyz" or "ryzy", where the "s" and "r" refer to either spatial ("s"tatic) or body ("r"otating) frames, and the remaining letters (e.g. "xyz") give the order of the axes of rotation (Euler). Internally everything is a quaternion, naturally...
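
To make the static/rotating distinction concrete, here is a small sketch that builds a rotation matrix from an order string such as "sxyz" or "ryzy". It is written from the description above rather than from the bundled code: the function names are hypothetical, and it assumes the angles are given in the same order as the axes in the string.

```python
import numpy as np

def axis_rotation(axis, angle):
    """3x3 rotation matrix about a single coordinate axis, angle in radians."""
    c, s = np.cos(angle), np.sin(angle)
    if axis == "x":
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    if axis == "y":
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    if axis == "z":
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    raise ValueError("unknown axis: %r" % axis)

def euler_to_matrix(order, angles):
    """Compose Euler rotations for an order string like "sxyz" or "ryzy"."""
    frame, axes = order[0], order[1:]
    R = np.eye(3)
    for axis, angle in zip(axes, angles):
        step = axis_rotation(axis, angle)
        if frame == "s":
            R = np.dot(step, R)   # static frame: rotate about the fixed world axis
        else:
            R = np.dot(R, step)   # rotating frame: rotate about the body's own axis
    return R
```

Under these assumptions, a static "sxyz" rotation with angles (a, b, c) gives the same matrix as a rotating "rzyx" rotation with angles (c, b, a), whereas "sxyz" and "rxyz" with the same angles generally differ.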


numpy, pyqt4, pyopencv, PIL, pyassimp, pyopengl

To get these on Ubuntu 12.04 (and probably other versions):

sudo apt-get install python-opencv python-numpy python-qt4-gl python-imaging python-pyassimp python-opengl


Makes use of from ROS (bundled).

Based loosely on the PyQt4 port of the opengl/hellogl example from Qt v4.x.


Please do not use this software if you are good. If you are good, it will try and eat your soul. Only use this software if you are slightly unhinged. Do not blame me if this software tries to eat your soul.