Multimodal Pixel Annotator / Home


Multimodal Pixel Annotator is a tool for pixel-level annotation of masks in up to three different image modalities (RGB, thermal, depth).

The annotator is built in C++ with Qt and OpenCV and can be compiled on Windows, macOS, and Linux. Pre-compiled binaries for these platforms are available in the downloads section, but they may not always include the latest features and bug fixes.

The annotator is free to use under the MIT License.

For bounding-box level annotations, please visit our Bounding Box Annotator.

Getting started with sample dataset

If you do not have your own dataset, or just want to try the program on a small dataset, you can download a sample dataset here. The sample dataset contains both RGB and thermal images, as well as a don't care mask, a calibration file, and sample annotations of the vehicles in the scene.

  1. Download the application binaries for your platform or compile the application.
  2. Download the sample annotations.
  3. Launch Pixel Annotator.
  4. In the menu, go to 'File -> Settings'.
    1. Make sure that 'Enable Thermal' is checked and 'Enable Depth' is unchecked.
    2. In 'Registration', make sure that 'Use planar homography' is selected.
    3. Select the 'File patterns' tab, and make sure that the 'Image file patterns' are configured like this:
      1. RGB images | *cam1*.png
      2. Thermal images | *cam2*.png
  5. Go to 'File -> Open folder' and open the sampleAnnotations folder.
  6. If prompted, select 'mask.png' as the don't care mask.
  7. Feel free to explore the program.
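
To illustrate how the file patterns in step 4 separate the two modalities, here is a minimal sketch of wildcard matching in plain C++, where '*' stands for any run of characters. The helper names `matchesPattern` and `selectFiles` and the example file names are hypothetical and chosen for illustration only. The annotator's own matching logic is not shown here and may differ, for example in case sensitivity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the annotator's code base):
// returns true if `name` matches `pattern`, where '*' matches any
// run of characters, including an empty one. All other characters
// must match exactly (case-sensitive).
bool matchesPattern(const std::string& pattern, const std::string& name) {
    std::size_t p = 0;                        // position in pattern
    std::size_t n = 0;                        // position in name
    std::size_t starPos = std::string::npos;  // last '*' seen in pattern
    std::size_t resumeAt = 0;                 // where to retry in name

    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            // Remember the wildcard and first try matching it to nothing.
            starPos = p++;
            resumeAt = n;
        } else if (p < pattern.size() && pattern[p] == name[n]) {
            ++p;
            ++n;
        } else if (starPos != std::string::npos) {
            // Mismatch: let the last '*' absorb one more character.
            p = starPos + 1;
            n = ++resumeAt;
        } else {
            return false;
        }
    }
    // Only trailing wildcards may remain in the pattern.
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}

// Hypothetical helper: keeps only the file names that match `pattern`.
std::vector<std::string> selectFiles(const std::string& pattern,
                                     const std::vector<std::string>& names) {
    std::vector<std::string> selected;
    for (const std::string& name : names) {
        if (matchesPattern(pattern, name)) {
            selected.push_back(name);
        }
    }
    return selected;
}
```

Under these assumptions, a file named `00001_cam1_rgb.png` would be picked up by the RGB pattern `*cam1*.png` but not by the thermal pattern `*cam2*.png`, and `mask.png` would match neither. If your own dataset uses a different naming scheme, the patterns in the 'File patterns' tab need to be adjusted so that each modality matches only its own images.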

Table of Contents

The Wiki contains information on:

Contribution guidelines

Please test the tool and see whether it fits your purpose. If it does not, please open an issue.


Please cite the following paper if you use the Multimodal Pixel Annotator for your research project:

Bahnsen, Chris H., et al. "The AAU Multimodal Annotation Toolboxes: Annotating Objects in Images and Videos" arXiv preprint arXiv:1809.03171 (2018).

  title={The AAU Multimodal Annotation Toolboxes: Annotating Objects in Images and Videos},
  author={Bahnsen, Chris H. and M{\o}gelmose, Andreas and Moeslund, Thomas B.},
  journal={arXiv preprint arXiv:1809.03171},

Who do I talk to?

  • Chris H. Bahnsen at