Introduction guide

This guide walks you through setting up the AAU VAP Multimodal Pixel Annotation tool, explains the GUI and its functionality, and gives a short example of the program in use.

If you do not have your own dataset, or just want to try the program on a small dataset, you can download a sample dataset here. The sample dataset consists of 3 RGB and thermal images, as well as a don't care mask for each modality, a calibration file, and sample annotation masks of the vehicles in the scene.

This guide is based on the version of the program from commit #54c822e and uses the sample dataset, which means only RGB and thermal videos are used. Adding a depth video would, however, only change the layout of the working space.

Getting started

First of all, you should acquire the program. This can be done by downloading the binaries for your platform, or by compiling the program yourself, as described here. The binaries may not always reflect the latest features and bug fixes, however.

Start the program. You should be met with the interface shown below. (Screenshot: Startpage2.PNG)

Before the program can be used, you have to make sure the settings are correct, and the video(s) are correctly prepared.

The program takes the videos as a series of still frame images. The video therefore has to be decomposed into a series of .png files, with the name consisting of the frame number, and a suffix which identifies what camera it is from, such as "0001-cam1.png". In the sample dataset the name consists of the date, a camera identifier and frame number, such as "6-16-13-cam1-00134.png".
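The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the two naming styles mentioned in this guide; the helper names `frame_name` and `camera_of` and the zero-padding width are assumptions, not part of the program.

```python
import re

def frame_name(frame_number, camera, digits=4):
    """Build a name such as '0001-cam1.png' (frame number, then camera suffix)."""
    return f"{frame_number:0{digits}d}-{camera}.png"

# Finds the camera identifier in both '0001-cam1.png' and
# '6-16-13-cam1-00134.png' style names (hypothetical helper).
CAMERA_RE = re.compile(r"-(cam\d+)(?:-\d+)?\.png$")

def camera_of(filename):
    """Return the camera identifier in a frame file name, or None."""
    match = CAMERA_RE.search(filename)
    return match.group(1) if match else None

print(frame_name(1, "cam1"))
print(camera_of("6-16-13-cam1-00134.png"))
```

A check like `camera_of` can be run over a folder before loading it, to spot frames that were exported without a camera identifier.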

The program settings are adjusted by clicking "File -> Settings". This opens the menu shown below, which consists of 3 tabs: "General", "Annotations" and "File patterns".

The general tab:

  1. If you have more than one concurrent video, make sure that "Enable Thermal" and/or "Enable Depth" are checked in "Modalities" in the "General" tab.
    • The brightness and contrast of the "Thermal" video can be adjusted by clicking the "Adjust thermal image" button and changing the respective sliders.
    • When you have loaded the video, the current frame of the thermal video will be shown below and reflect the changes made to brightness and contrast.
  2. It is possible to select between two different homographies between the modality videos:
    • Planar homography: The traditional and commonly used approach
    • Multiple homographies: An alternative approach using several homographies to associate scenes. Proposed by Palmero et al. (2016)
  3. If you have a single global mask indicating the region of interest, make sure "Use global don't care mask" is checked in "Don't care mask". If not, the program will use separate masks for each video.
    • The color and opacity of the mask can be adjusted using the respective GUI elements.
    • The mask should be a binary image with white pixels indicating the region of interest.
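To make the planar homography option concrete: a planar homography is a 3x3 matrix that maps a pixel in one modality to the corresponding pixel in another. The sketch below shows that mapping only; the matrix values are made up for illustration, and how the program reads its calibration file is not covered here.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    # Divide by the homogeneous coordinate to get back to pixel coordinates.
    return xh / w, yh / w

# Hypothetical RGB-to-thermal homography: scale by 0.5, shift by (10, 20).
H_RGB_TO_THERMAL = [[0.5, 0.0, 10.0],
                    [0.0, 0.5, 20.0],
                    [0.0, 0.0, 1.0]]

print(apply_homography(H_RGB_TO_THERMAL, 100, 40))  # (60.0, 40.0)
```

A single matrix like this is only exact for points lying on one plane in the scene, which is the motivation for the multiple-homographies alternative.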

The Annotations tab:

  1. If you want to save annotations from thermal and depth videos, remember to check the "Save annotations for depth and thermal" box.
  2. A border can be added to the annotations automatically while drawing the annotation mask and/or when switching between annotations, by checking the respective boxes.
    • The width of the automatically added border can be adjusted as needed.
  3. The initial annotation mask overlay color and the opacity can be adjusted.
  4. The maximum size of holes in the mask which can be automatically filled, and the maximum size of "noisy" BLOBs which can be automatically removed can be adjusted.
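The two size limits in item 4 can be illustrated with a small connected-component sketch. This is not the program's implementation; it assumes a mask stored as nested lists of 0/1, 4-connectivity, and sizes measured in pixels.

```python
from collections import deque

def _regions(mask, value):
    """Yield each 4-connected region of pixels equal to `value` as a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != value or seen[r][c]:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and mask[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield region

def remove_small_blobs(mask, max_size):
    """Clear foreground regions of at most `max_size` pixels."""
    out = [row[:] for row in mask]
    for region in _regions(mask, 1):
        if len(region) <= max_size:
            for y, x in region:
                out[y][x] = 0
    return out

def fill_small_holes(mask, max_size):
    """Fill background regions of at most `max_size` pixels that do not touch the frame edge."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for region in _regions(mask, 0):
        on_edge = any(y in (0, rows - 1) or x in (0, cols - 1) for y, x in region)
        if not on_edge and len(region) <= max_size:
            for y, x in region:
                out[y][x] = 1
    return out
```

Note that a limit set too high removes small but genuine parts of a target, or fills holes that belong to the object, which is why these settings are worth tuning per dataset.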

The File patterns tab:

  1. Make sure that the suffix of each video is designated. This is the suffix that all images of that video should have. It is recommended to do as follows for clarity:
    • RGB images | *-cam1.png
    • Depth images | *-cam3.png (Only needed if “Enable Depth” is checked)
    • Thermal images | *-cam2.png (Only needed if “Enable Thermal” is checked)
  2. Designate the name of the output annotation file as well as the folders which hold the annotation masks. It is recommended to do as follows for clarity:
    • Folder for RGB masks | rgbMasks
    • Folder for depth masks | depthMasks
    • Folder for thermal masks | thermalMasks
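Before loading a folder, it can be useful to check which frames the recommended patterns would pick up. The sketch below uses Python's `fnmatch` to group file names by modality; whether the program applies exactly these wildcard semantics is an assumption, and `group_by_modality` is a hypothetical helper.

```python
from fnmatch import fnmatch

# The patterns recommended above, keyed by modality.
PATTERNS = {
    "rgb": "*-cam1.png",
    "thermal": "*-cam2.png",
    "depth": "*-cam3.png",
}

def group_by_modality(filenames, patterns=PATTERNS):
    """Sort file names into one list per modality; unmatched names are ignored."""
    groups = {modality: [] for modality in patterns}
    for filename in sorted(filenames):
        for modality, pattern in patterns.items():
            if fnmatch(filename, pattern):
                groups[modality].append(filename)
                break
    return groups

files = ["0002-cam1.png", "0001-cam1.png", "0001-cam2.png", "notes.txt"]
print(group_by_modality(files))
```

If the lists have different lengths, some frames are missing for one of the modalities.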


After adjusting the settings, the possible tags used during the annotation process have to be provided. In the menu, go to 'Annotations -> Edit suggested tags', and write the allowed tags, one tag per line. When 'Limit annotation tags to suggested list' is checked under the aforementioned settings, only these tags are allowed as 'categories' for the bounding boxes. For traffic annotations, commonly used categories are:

  • Pedestrian
  • Cyclist
  • Moped
  • Motorcycle
  • Car
  • Van
  • Lorry
  • Bus
  • Other vehicle

When adding an annotation, you will then be prompted with the following screen:


Each annotation mask may be accompanied by meta data. Before opening an annotation folder for the first time, set the meta data tags in 'File -> Edit meta data fields...'

The video(s) can now be loaded:

  1. To load the video(s), go to 'File -> Open folder' or press 'CTRL + O'.
  2. Go to the directory with all the frames from the videos and press Enter.

To make this phase quicker you can save your data "configuration", by clicking Save Configuration under the File tab, which saves a .yml file with the filepaths for the images, calibration file and don't care mask. When starting the program up at a later time, or when needing to switch video, this .yml file can be loaded by selecting Load configuration under the File tab, which loads the saved paths in one go.

You are now able to start annotating your data.

GUI elements

The GUI consists of 4 elements:

  1. The video window(s)
  2. The 'Taskbar'
  3. The 'Annotations' box
  4. The 'Overlay color' box

Each of the two different boxes can be scaled within the main program window, as well as placed in different areas, both within and outside the main program window.

The video window(s)

The window(s) show the supplied video(s). Each window can be rescaled and moved as needed, but the area they can be within is limited by the two other GUI boxes.

The mask window shows the current annotation mask for the ID(s) currently selected. The remaining windows show the selected modalities with the annotation mask and don't care mask overlaid. The number of windows is automatically adjusted to the number of modalities used.
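An overlay with an adjustable color and opacity is commonly rendered by alpha blending. The following is a minimal sketch of that idea, not the program's rendering code; it assumes RGB pixels as tuples, a mask of 0/1 values, and an opacity between 0 and 1.

```python
def blend(pixel, color, opacity):
    """Alpha-blend an overlay color onto one RGB pixel; opacity is in [0, 1]."""
    return tuple(round((1 - opacity) * p + opacity * c)
                 for p, c in zip(pixel, color))

def overlay(image, mask, color, opacity):
    """Blend `color` onto every pixel of `image` where `mask` is non-zero."""
    return [[blend(px, color, opacity) if m else px
             for px, m in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]

image = [[(100, 100, 100), (100, 100, 100)]]
mask = [[1, 0]]
print(overlay(image, mask, color=(255, 0, 0), opacity=0.2))
```

With a low opacity the underlying image stays visible through the mask, which helps when judging whether the mask edge follows the target.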


Tools and shortcuts

The taskbar consists of several icons, as shown below, which represent some functionality related to adjusting the annotations. Most of the taskbar functionalities can also be activated through a keyboard shortcut.


# | Action | Description | Shortcut
1 | New annotation | Starts a new annotation in the current frame | CTRL + N
2 | Save | Saves the current annotations | CTRL + S
3 | Undo | Undoes the actions taken within the current frame | CTRL + Z
4 | Delete | Deletes the currently selected annotation | DEL
5 | Add border to mask | Adds a border to the currently selected annotation | CTRL + R
6 | Remove noise from mask | Removes noisy BLOBs belonging to the currently selected annotation | CTRL + B
7 | Fill holes in mask | Fills holes in the currently selected annotation | CTRL + F
8 | Select annotations | Selects annotations from the Annotations box | Y
9 | GrabCut | Applies the GrabCut algorithm on the selected area | CTRL + G
10 | Add to GrabCut | Enables a brush to add pixels to the GrabCut mask | H
11 | Subtract from GrabCut | Enables a brush to subtract pixels from the GrabCut mask | J
12 | Add to mask | Enables a brush to add pixels to the annotation mask | B
13 | Subtract from mask | Enables a brush to subtract pixels from the annotation mask | N
14 | Add point to polygon | Allows the user to add an extra point to the polygon | U
15 | Subtract point from polygon | Allows the user to remove a point from the polygon | I
16 | Move point in polygon | Allows the user to move a point in the polygon | O
17 | Move mask up | Moves the annotation mask 1 pixel up | W
18 | Move mask down | Moves the annotation mask 1 pixel down | S
19 | Move mask left | Moves the annotation mask 1 pixel left | A
20 | Move mask right | Moves the annotation mask 1 pixel right | D
21 | Retain annotation boxes when loading previous | Copies annotations when stepping to the previous frame (does not overwrite existing) | F3
22 | Retain annotation boxes when loading next | Copies annotations when stepping to the next frame (does not overwrite existing) | F4
23 | Zoom in | Zooms in on the image | CTRL + +
24 | Zoom out | Zooms out from the image | CTRL + -
25 | Zoom fit | Zooms so that the frame fits the window exactly | CTRL + 9
26 | Zoom 1:1 | Zooms so that the frame is shown at its original scale | CTRL + 0
27 | Center on the current annotation(s) | Centers the image window on the currently selected annotation(s) | CTRL + 8
28 | Previous image | Steps 1 (or 5) frames backwards from the current frame and saves | CTRL + Left Arrow
29 | Next frame | Steps 1 (or 5) frames forward from the current frame and saves | CTRL + Right Arrow
30 | Jump to image | Jumps to the designated frame ID | CTRL + J
31 | Show don't care mask | Shows the area outside of the region of interest, if it was defined |
32 | Display tag | Displays the tag assigned to each annotation on the image | F1

Not all functionalities are shown on the taskbar. The remaining shortcuts are shown below.

Operation | Shortcut
Open folder | CTRL + O
Export to bounding box annotations | CTRL + E
Quit the program | CTRL + Q
Move the annotation mask 10 pixels up | Shift + W
Move the annotation mask 10 pixels down | Shift + S
Move the annotation mask 10 pixels left | Shift + A
Move the annotation mask 10 pixels right | Shift + D
Select previous point on polygon | Home
Select next point on polygon | End
Move a single point on the polygon up | Alt + W
Move a single point on the polygon down | Alt + S
Move a single point on the polygon left | Alt + A
Move a single point on the polygon right | Alt + D
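The move-mask shortcuts (W, A, S, D for 1 pixel, with Shift for 10 pixels) amount to translating the mask within the frame. A minimal sketch of that operation follows; how the program itself treats pixels pushed past the frame edge is not documented here, so dropping them is an assumption.

```python
def shift_mask(mask, dx, dy):
    """Return a copy of `mask` moved dx pixels right and dy pixels down.

    Pixels shifted outside the frame are dropped (assumption); vacated
    pixels become 0.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if mask[r][c] and 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = mask[r][c]
    return out

# Key to (dx, dy) direction; image rows grow downwards, so "up" is dy = -1.
STEPS = {"W": (0, -1), "S": (0, 1), "A": (-1, 0), "D": (1, 0)}

def move(mask, key, shift_held=False):
    """Apply one key press: 1 pixel normally, 10 pixels with Shift held."""
    dx, dy = STEPS[key]
    factor = 10 if shift_held else 1
    return shift_mask(mask, dx * factor, dy * factor)
```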

Annotations box

The top box of the Annotations box lists the annotations in the current frame by their ID. The bottom box shows the information about the currently selected annotation. Each annotation has, at a minimum, an ID plus a field for each meta data element that was added before importing the videos, if any.


Overlay color box

The overlay color box controls the color of the annotation mask if a single annotation is selected. If several are selected, it designates the initial color, from which the colors of the other annotations are then derived.


Using the program

This section will go through a short example of how to use the program, and a few tips on what to do and not to do.

The goal of the annotations is to provide a ground truth for testing algorithms, and therefore the annotations should be as precise as possible. However, the annotations should also avoid jittery or sporadic movements and instead provide a smooth movement throughout the tracking of a target.

Setting up your workspace

In most cases when using several modalities, it is beneficial to only focus on one video at a time, and only use the other views if the first video does not provide clear enough footage. Therefore, it is recommended that only one window is active at a time.


Drawing the annotation

To draw a new annotation mask, press New annotation on the taskbar (Ctrl+N). To edit an existing annotation, select the corresponding ID in the Annotations window.

General workflow

An example of how the annotations are created is described in the following steps.

  1. Create a new annotation
  2. Set the tag (annotation properties window)
  3. Select the tool of choice (e.g. "GrabCut" or "Add point to polygon")
  4. Draw the annotation.
  5. Turn on "Retain annotation boxes when loading next" (F4)
  6. Step to next frame (Ctrl + Right)
  7. Adjust the annotation to fit the object (often only with WASD keys)
  8. Repeat from step 6

As can be seen from these steps, it is recommended to work on a single object at a time; that is, create the annotations for a single object in all frames of the image sequence, and only then start working on a different object.

In some cases it can be hard to make out which pixels are part of the target and which are part of the background. In these cases it is recommended that you zoom into the area around the target and adjust the annotation. However, make sure to zoom out regularly and verify that the entire target is included and that the annotation is correct.

In case a Don't care border is desired, a border can be added to the annotation by selecting the Add border to mask function on the taskbar. The border can be visualized by ticking the Show border box. If you update the mask or adjust the border width, it is necessary to draw a new border by applying the Add border to mask function again.
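One common way to obtain such a border is to dilate the mask and keep only the pixels that the dilation added. The sketch below illustrates this under assumptions: a 0/1 mask as nested lists, a square neighbourhood, and a border lying outside the mask. The program's own border may be computed differently.

```python
def dilate(mask, width):
    """Grow a 0/1 mask by `width` pixels using a square neighbourhood."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for nr in range(max(0, r - width), min(rows, r + width + 1)):
                    for nc in range(max(0, c - width), min(cols, c + width + 1)):
                        out[nr][nc] = 1
    return out

def border_of(mask, width):
    """Return only the ring of pixels that dilation adds around the mask."""
    grown = dilate(mask, width)
    return [[int(g and not m) for g, m in zip(grown_row, mask_row)]
            for grown_row, mask_row in zip(grown, mask)]
```

Because the ring is derived from the mask, it becomes stale as soon as the mask or the width changes, which matches the need to apply Add border to mask again after an update.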

The paint tools

The method described in this section is useful for objects that have holes in their contours, for example a tube or a ring.

For a description of each tool and its keyboard shortcut, see the Tools and shortcuts section.

If you are drawing a new annotation, start by using the GrabCut tool. Draw a box around the object you want to mask out to get a rough initial draft.

This GrabCut mask can be corrected using the GrabCut brushes to select positive and negative pixels for the GrabCut algorithm, which subsequently updates the GrabCut mask.

The GrabCut mask only persists for the current frame, and will be converted to a standard mask when switching frames.

To correct existing masks, it is recommended to use the mask brushes, unless large changes have to be made, in which case the GrabCut tool may be preferable.

If the targets are moving very little, it can be beneficial to tick the Keep annotations box, so that the masks are kept for the next frame. These masks then only have to be corrected and/or moved a slight amount. This is, however, not recommended for large movements, as there will be minimal overlap, and therefore little to gain from this approach.

If there are holes in the mask or noisy pixels around the mask while drawing, the Fill holes in mask and Remove noise from mask tools on the taskbar can be used to fix this easily. Be careful, however, not to accidentally reduce the quality of the mask when using these tools.

The polygon tools

Using polygons is recommended for objects that do not have holes in their contours, or whose holes can be included in the annotation.

For a description of each tool and its keyboard shortcut, see the Tools and shortcuts section.

Polygon Editing

The polygon-based workflow can increase the efficiency of the annotation process if the target object is rigid. A good example is a bus taking a turn. Let's say that the top of the bus is straight. In this case the top will be bounded by a straight line that only has two points in the polygon. When the bus takes a turn, the user will have to adjust at least one of these two points (along with some other points of the polygon), whereas the paint method would require the user to carefully follow the pixels on the top of the bus, which is generally more work.

Both the keyboard and the mouse can be used to move the points on the polygon around. The mouse gives a way to quickly move points over long distances, and the keyboard gives pixel-level precision. However, the mouse can also be used to achieve very high precision given an adequate zoom. Deciding between the two is mostly a matter of personal preference.

When objects move into the frame over a period of multiple images, it is usually easier to create the polygon on an image where the object is completely in frame and step backwards from there. Note that it is allowed to have parts of a polygon be outside of the frame.
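To see why a polygon may extend beyond the frame, consider how a polygon can be turned into a mask: only pixels inside the frame are tested, so the part outside simply produces no pixels. The sketch below uses the even-odd (ray casting) rule on pixel centres; it is an illustration under these assumptions, not the program's rasterizer.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd test: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def polygon_to_mask(polygon, width, height):
    """Rasterize a polygon into a 0/1 mask of the frame, sampling pixel centres."""
    return [[int(point_in_polygon(c + 0.5, r + 0.5, polygon))
             for c in range(width)]
            for r in range(height)]

# A rectangle that starts 2 pixels left of a 4x4 frame.
partly_outside = [(-2, 1), (2, 1), (2, 3), (-2, 3)]
for row in polygon_to_mask(partly_outside, 4, 4):
    print(row)
```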

The Annotation file

The annotated masks are stored as .png files per modality, and a .csv file is stored to associate the masks with their specific IDs. The RGB masks are saved in three versions: with the Don't care border, without the border, and just the border. The Thermal and Depth masks are only saved without the border. The annotation .csv file should not be manually adjusted unless the user is 100% sure of what they are doing.


Metadata can be added to your annotations, if you have some specific information you want attached per frame such as if the target is occluded or not.

Metadata tags can be added by opening the annotation .csv file in a text editor, e.g. Notepad++, and adding a column by typing e.g. Occluded; in the top row. This adds a metadata tag called Occluded, which can be set by pressing "CTRL + F5". Additional metadata tags can be set by using "CTRL + F6-F12".

Meta data fields can also be added before you start annotating a new file, through the Edit meta data fields... option under the File tab.
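If the column is added with a script rather than by hand, Python's `csv` module can do it. The sketch below is hypothetical: the column names `Frame` and `ID`, the semicolon delimiter, and the absence of a trailing delimiter are assumptions about the file layout, so inspect the real file first and work on a copy.

```python
import csv
import io

def add_metadata_column(csv_text, tag, default="", delimiter=";"):
    """Append a column named `tag` to semicolon-separated text.

    The layout (header in the first row, no trailing delimiter) is an
    assumption; the text is returned unchanged if the tag already exists.
    """
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if not rows or tag in rows[0]:
        return csv_text
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(rows[0] + [tag])       # header gets the new tag name
    for row in rows[1:]:
        writer.writerow(row + [default])   # data rows get an empty value
    return out.getvalue()

example = "Frame;ID\n1;1\n2;1\n"           # hypothetical file contents
print(add_metadata_column(example, "Occluded"))
```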


Currently the program assigns an ID that is one higher than the largest ID currently in the scene. Therefore, if there is a lull in the dataset with no vehicles, the next vehicle entering will automatically be assigned the ID 1. If this is not desirable, then change the ID in the Annotations box while the specific annotation is selected.

If you accidentally assigned an incorrect ID to a target and want to change all of its IDs at the same time, you can manually edit the annotation file.
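The ID rule described above is easy to reproduce, which makes the lull problem visible. The function name `next_id` is illustrative only; the sketch simply restates the rule "one higher than the largest ID given".

```python
def next_id(ids):
    """One higher than the largest ID in `ids`; 1 if `ids` is empty."""
    return max(ids, default=0) + 1

ids_in_scene = set()       # a lull: no targets in the current frame
ids_used_so_far = {1, 2, 3}  # IDs assigned earlier in the sequence

print(next_id(ids_in_scene))     # 1, which collides with an earlier target
print(next_id(ids_used_so_far))  # 4, the ID to enter manually instead
```

Keeping a note of the highest ID used so far is therefore enough to pick the correct ID by hand after a lull.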


It is possible to export the bounding boxes around the annotation masks in a format that is supported by the AAU Bounding Box Annotator. The annotations are exported by selecting Export to bounding box annotations under the File tab. The annotations are saved in a .csv file called convertedBBannotations. When exporting, it is necessary to assign a default tag, which will be the tag of the bounding boxes for all annotations.
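Conceptually, the bounding box of a mask is the smallest axis-aligned rectangle that contains all of its pixels. The sketch below computes it for a 0/1 mask; the return format (inclusive x_min, y_min, x_max, y_max) is an assumption and says nothing about the column layout of the exported .csv file.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max), inclusive, or None for an empty mask."""
    coords = [(r, c)
              for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(cols), min(rows), max(cols), max(rows)

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
print(bounding_box(mask))  # (1, 1, 2, 2)
```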

Tips and tricks

Some general tips and tricks to using the program:

  • Learn, use and get comfortable with the shortcuts. They will speed up your workflow
  • Use the "Keep annotations" box, but know when to just redraw the mask from scratch
  • Remember to use different brush sizes. Each has its place and time, and will speed up your workflow
  • The annotation should move with the target. If the target does not move, then the annotation should not either
  • Track one target through the video at a time. This way you concentrate on just that target, and won't get overwhelmed if there are several moving objects in the scene.
    • Also, it is useful to go through the entire tracking again after setting the last annotation, to see if there are any jittery/sporadic movements or incorrect annotations.
  • Track the targets in the order they enter the scene. This way the same ID will not accidentally be assigned to two different targets, and it is easier to keep an overview of how far you are with your annotations.
  • Remember to use the noise removal and hole filling operations, but keep in mind that they might affect your annotation mask in an undesirable way. Try to adjust their settings so that you find them most useful.
  • If you are annotating several different videos, then it can be helpful to save the configuration for each video, name it in a meaningful way, and keep it in the same folder. This will allow for quick and easy switching between videos.
  • Take your time! It is better to be annotating the data correctly at a slower pace, than to quickly annotate data incorrectly, which later on has to be corrected.

Furthermore, at AAU we use the following guidelines for annotation:

  • A target should be tracked until the entirety of the target is outside of the scene or the region of interest
  • If a target is partially outside the frame, then the annotation is fitted as well as possible, while remaining inside the frame
  • If a target is more than 80% occluded, then the metadata tag "Occluded" is set.
    • However, if a target is fully occluded, no annotation is set until it is visible again. When this happens, remember to set the ID of the new annotation in the Annotation properties box, so that it matches the ID before the target got occluded.
  • Each target is required to have a unique ID. This means that if there is a time period with no targets, you should manually assign the correct ID to the next target that enters.