Introduction guide

This guide walks you through setting up the AAU VAP Bounding Box Annotation tool, explains the GUI and its functionality, and gives a short example of the program in use.

If you do not have your own dataset, or just want to try the program on a small dataset, you can download a sample dataset here. The sample dataset consists of 30 RGB and thermal images, as well as a don't care mask, a calibration file, and sample annotations of the vehicles in the scene.

This guide is based on the version of the program from commit #b073202.

Getting started

First, acquire the program, either by downloading the binaries for your platform or by compiling the program yourself, as described here. Note, however, that the binaries may not always reflect the latest features and bug fixes.

Start the program. You should be met with the interface shown below.

Before the program can be used, you have to make sure the settings are correct, and the video(s) are correctly prepared.

The program takes the videos as a series of still frame images. Each video therefore has to be decomposed into a series of .png files, where each file name consists of the frame number and a suffix identifying the camera it comes from, such as "0001-cam1.png".
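As a small illustration of this naming scheme, the sketch below builds file names from a frame number and a camera suffix. The helper name `frame_filename`, the four-digit zero padding and the `cam1`/`cam2` suffixes are assumptions derived from the "0001-cam1.png" example, not requirements confirmed by the program.

```python
def frame_filename(frame_number: int, camera_suffix: str, digits: int = 4) -> str:
    """Build a frame file name such as '0001-cam1.png'.

    The zero-padding width and the '<frame>-<suffix>.png' layout are
    assumptions based on the example name given in this guide.
    """
    if frame_number < 0:
        raise ValueError("frame_number must be non-negative")
    return f"{frame_number:0{digits}d}-{camera_suffix}.png"


# Names for the first three frames of an RGB (cam1) and a thermal (cam2) video.
rgb_names = [frame_filename(i, "cam1") for i in range(1, 4)]
thermal_names = [frame_filename(i, "cam2") for i in range(1, 4)]
print(rgb_names)
print(thermal_names)
```

Using the same frame number for both cameras keeps concurrent RGB and thermal frames easy to pair by name.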

The program settings are adjusted by:

In the menu (top left corner), go to "Annotations -> Settings".
- This opens the settings menu shown below.
If you have two concurrent videos, make sure that "Enable Thermal" is checked in "Modalities".
- The brightness and contrast of the thermal video can be adjusted using the respective sliders.
- Once you have loaded the video, the current frame of the thermal video is shown below and reflects the changes made to brightness and contrast.
In "Annotation properties", the available tags can be restricted by checking "Limit annotation tags to suggested list".
In "File patterns", make sure that the suffix of each video is designated. This is the suffix that all images from that video should have. For clarity, the following is recommended:
- RGB images | *-cam1.png
- Thermal images | *-cam2.png (only needed if "Enable Thermal" is checked)
If you have a mask indicating the region of interest, make sure "Use don't care mask" is checked in "Mask".
- The color and opacity of the mask can be adjusted using the respective GUI elements.
- The mask should be a binary image with white pixels indicating the region of interest.
You can choose whether the last annotation of an object should be shown in red, or in green like any other annotation.
Lastly, under "Saving", designate the name of the .csv file where the annotation data is stored. It is recommended to call it "annotations.csv".
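Before loading a folder, it can be useful to check that its file names agree with the file patterns set above. The sketch below tries the two recommended patterns on a list of names using Python's standard `fnmatch` module. The function name `split_by_pattern` is hypothetical, and it is an assumption that the program matches its patterns with the same glob rules.

```python
from fnmatch import fnmatchcase

# The two patterns recommended in this guide.
RGB_PATTERN = "*-cam1.png"
THERMAL_PATTERN = "*-cam2.png"


def split_by_pattern(file_names):
    """Split file names into RGB frames, thermal frames and everything else."""
    rgb = sorted(n for n in file_names if fnmatchcase(n, RGB_PATTERN))
    thermal = sorted(n for n in file_names if fnmatchcase(n, THERMAL_PATTERN))
    matched = set(rgb) | set(thermal)
    unmatched = sorted(n for n in file_names if n not in matched)
    return rgb, thermal, unmatched


# Example folder content: two RGB frames, one thermal frame and two support files.
files = ["0001-cam1.png", "0001-cam2.png", "0002-cam1.png", "mask.png", "calibVars.yml"]
rgb, thermal, unmatched = split_by_pattern(files)
print(rgb, thermal, unmatched)
```

In this example the second thermal frame is missing, which shows up as a difference in length between the two frame lists.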

After adjusting the settings, the tags that can be used during the annotation process have to be provided. In the menu, go to 'Annotations -> Edit suggested tags' and write the allowed tags, one tag per line. When 'Limit annotation tags to suggested list' is checked under the aforementioned settings, only these tags are allowed as 'categories' for the bounding boxes. For traffic annotations, commonly used categories are:

Pedestrian
Cyclist
Moped
Motorcycle
Car
Van
Lorry
Bus
Other vehicle

The video(s) can now be loaded:

To load the video(s), go to ‘Annotations -> Open folder’ or press ‘CTRL + o’.
Go to the directory with all the frames from the videos and press Enter.
If ‘Enable Thermal’ is checked, you will be prompted to provide a calibration file between the two cameras, called ‘calibVars.yml’.
If ‘Use don’t care mask’ is checked, you will be prompted to provide a mask file called ‘mask.png’, containing the designated region of interest.

To make this phase quicker, you can save your data "configuration" by clicking Save Configuration under the Annotations tab. This saves a .yml file with the file paths for the images, the calibration file and the don't care mask. When starting the program at a later time, or when switching to another video, this .yml file can be loaded by selecting Load configuration under the Annotations tab, which loads the saved paths in one go.

You are now able to start annotating your data.

GUI elements

The GUI consists of 6 elements:

The video window(s)
The ‘Taskbar’
The ‘Annotation properties’ box
The ‘Current annotations’ box
The 'Annotation transparency' box
The ‘Annotation history’ box

Each of the four different boxes can be scaled within the main program window, as well as placed in different areas, both within and outside the main program window.

### The video window(s)

The window(s) show the supplied video(s). Each window can be rescaled and moved as needed, but the area in which it can be placed is limited by the other GUI boxes.

### The taskbar

The taskbar consists of several icons, as shown below, each representing a function for adjusting the annotations or navigating the video. Most of these functions can also be activated through a keyboard shortcut.

#	Action	Description	Shortcut	Modified shortcut
1	Save	Saves the current annotations	CTRL + s
2	Undo	Undoes the actions taken within the current frame	CTRL + z
3	Delete	Deletes the current selected annotation	DEL
4	Delete future	Deletes the current selected annotation in the current frame and all future adjacent frames	Shift + DEL
5	Merge annotations	Merge the current selected annotation with another annotation. Affects the current and all future adjacent frames	Shift + Ins
6	Move up	Moves the selected annotation 1 pixel (or 10 pixels) up	w	Shift + w
7	Move down	Moves the selected annotation 1 pixel (or 10 pixels) down	s	Shift + s
8	Move left	Moves the selected annotation 1 pixel (or 10 pixels) to the left	a	Shift + a
9	Move right	Moves the selected annotation 1 pixel (or 10 pixels) to the right	d	Shift + d
10	Expand horizontally	Expands the annotation 1 pixel (or 10 pixels) horizontally	l	Shift + l
11	Shrink horizontally	Shrinks the annotation 1 pixel (or 10 pixels) horizontally	j	Shift + j
12	Shrink vertically	Shrinks the annotation 1 pixel (or 10 pixels) vertically	k	Shift + k
13	Expand vertically	Expands the annotation 1 pixel (or 10 pixels) vertically	i	Shift + i
14	Retain previous	Retains the annotations from the current frame, when loading the previous frame	F3
15	Retain next	Retains the annotations from the current frame, when loading the next frame	F4
16	Interpolate	Interpolates between annotations when stepping more than 1 frame at a time	CTRL + i
17	Track object	Tracks the annotations in the frame
18	Reset tracking	Resets the tracking
19	Zoom in	Zooms in on the image	CTRL + +
20	Zoom out	Zooms out from the image	CTRL + -
21	Zoom fit	Zooms so that the frame fits the window exactly	CTRL + 9
22	Zoom 1:1	Zooms so that the frame is shown at its original scale	CTRL + 0
23	Previous frame	Steps 1 (or 5) frames backwards from the current frame and saves	Left Arrow	Shift + Left Arrow
24	Next frame	Steps 1 (or 5) frames forward from the current frame and saves	Right Arrow / Enter	Shift + Right Arrow
25	Jump to image	Jumps to the designated frame ID	CTRL + j
26	Play/Pause playback	Plays or pauses the playback of the video	F5
27	Show don't care mask	Toggles the "Don't care mask"
28	Display tag	Toggles whether the tag of each annotation should be shown	F1

Not all functionalities are shown on the taskbar. The remaining shortcuts are shown below.

Operation	Shortcut
Select next annotation	e
Select previous annotation	q
Set tag	F2
Close video window in focus	CTRL + F4
Toggle meta data field 1-8	CTRL + F5 - F12
Open folder	CTRL + o

### Annotation properties box

The annotation properties box shows information about the currently selected annotation. Each annotation has, at a minimum, an ID, a tag and a status flag indicating whether the annotation is active or is the last frame in a sequence.

Furthermore, if any metadata is used, the state of each metadata flag is shown.

An example is shown below, where the different properties of the selected annotation are shown, as well as a metadata field called “Occluded” and its state.

### Current annotations box

The current annotations box shows which annotations are present in the current frame, with their ID and tag. The currently selected annotation is shaded slightly grey. Annotations can be deleted by selecting an annotation and pressing 'DEL'.

### Annotation transparency box

The annotation transparency box controls the transparency of all non-selected annotation boxes, that is, how visible the annotation boxes that are not currently in focus are.

### Annotation history box

The annotation history box shows how the annotation was placed in the five previous frames, in the current frame (marked with a line on each side) and in the next five frames, in a zoomed-in view focusing only on the area around the annotation. Like the other boxes, the annotation history box can be moved around within the main program window, as well as moved outside of it. The images in the box are aligned along the dominant axis of the box.

Using the program

This section will go through a short example of how to use the program, and a few tips on what to do and not to do.

The goal of the annotations is to provide a ground truth for testing algorithms, so the annotations should be as precise as possible. They should also avoid jittery or sporadic movements and instead follow the target smoothly throughout the tracking.

Setting up your workspace

When using two videos, it is in most cases beneficial to focus on one video at a time, and only use the other if the first does not provide clear enough footage. It is therefore recommended that only one window is active at a time. Furthermore, to be able to view the annotation history clearly, it is recommended that you either move it outside the main program window (e.g. onto a second screen) or attach it vertically on the left (as shown below). This allows the video window to be as large as possible (e.g. by using the “Zoom fit” functionality) while still giving a clear view of the annotation history box. To attach the box, click the shaded bar saying "Annotation history" and drag it onto the left part of the main program window. It should then "pop" into place. The setup is retained when starting the program at a later time.

Placing the annotation

To draw the initial bounding box for the annotation, click and hold the left mouse button and drag the bounding box around the target so that it fits reasonably well. Release the mouse button and select a tag for the annotation in the window prompt. If it is difficult to determine the category of the target at first, you can change the tag of the annotation later by pressing 'F2' and selecting a new tag. This changes the tag in all previous and future frames until the annotation is no longer present. This may take some time if the tag has to be adjusted in a large number of frames.

After drawing the initial annotation, use the w, a, s and d keys to move it so that the top left corner is placed correctly, i.e. so that the top and left sides of the annotation are as close to the target as possible. It is nearly always preferable for the annotation to be as close a fit as possible. After positioning the top left corner, use the i, j, k and l keys to expand or shrink the annotation until the entire target fits neatly within it. This workflow is illustrated below.

To avoid having to use the mouse all the time, it is possible to have the program retain all the annotations in the frame, when moving to either the previous or next frame, by pressing F3 or F4, respectively. This way it is only necessary to make slight adjustments to the previous annotation, and not redraw it all the time, and the entire process can be done using mostly just a keyboard.

If the target is moving in a predictable way, e.g. with a constant velocity, it can be useful to turn on one of the retain annotation options as well as the “Interpolate” option. This allows you to skip more than one frame ahead and adjust the annotation, after which the program automatically interpolates between the last set annotation and the current annotation. However, it is still necessary to go through the interpolated annotations and correct any slight errors that may occur due to small accelerations or changes in path, as shown below.
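To give an idea of what interpolation between two annotations means, the sketch below fills in the frames between two set bounding boxes by linear interpolation. This is an illustration only: the guide does not state which interpolation method the program uses, and the function name `interpolate_boxes`, the `(x, y, width, height)` box layout and the rounding to whole pixels are assumptions.

```python
from typing import List, Tuple

# A box as (x, y, width, height) in pixels, with (x, y) the top left corner.
Box = Tuple[int, int, int, int]


def interpolate_boxes(start: Box, end: Box, frames_between: int) -> List[Box]:
    """Linearly interpolate the boxes for the frames between two annotations.

    Returns one box per skipped frame, excluding the start and end frames.
    """
    if frames_between < 0:
        raise ValueError("frames_between must be non-negative")
    steps = frames_between + 1
    boxes = []
    for k in range(1, steps):
        # Move each coordinate the fraction k/steps of the way from start to end.
        box = tuple(round(a + (b - a) * k / steps) for a, b in zip(start, end))
        boxes.append(box)
    return boxes


# A target annotated in frame 10 and again in frame 15 (4 frames skipped),
# moving 20 pixels to the right without changing size.
skipped = interpolate_boxes((100, 50, 40, 20), (120, 50, 40, 20), 4)
for box in skipped:
    print(box)
```

A target that accelerates or turns between the two set frames will not follow this straight line, which is why the interpolated annotations still need to be checked by hand.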

In some cases it can be hard to make out which pixels are part of the target and which are part of the background. In these cases it is recommended that you zoom in on the area around the target and adjust the annotation. However, make sure to zoom out regularly and verify that the entire target is included and that the annotation is correct.

This is repeated until the target is no longer in the frame or the region of interest. In the last frame where the target is present, it is recommended that you change the annotation status from Active to Last frame reached. This indicates that the tracking sequence is over and ensures that the annotation is not retained when going to the next frame. The status is changed by double-clicking the Status field in the Annotation properties box.

The annotation can have three different colors, depending on its status. It is Green if the annotation is Active, Red if the status is Last frame reached (and selected under the Settings), and, independent of its status, the annotation is Yellow if it is the annotation currently in focus.
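The color rules above can be summarized as a small decision function. This is a sketch of the described behavior, not the program's own code; the function name `annotation_color` and the parameter `highlight_last` (standing in for the red/green choice under Settings) are hypothetical.

```python
def annotation_color(status: str, selected: bool, highlight_last: bool = True) -> str:
    """Return the display color of an annotation as described in this guide.

    status         -- 'Active' or 'Last frame reached'
    selected       -- True if the annotation is currently in focus
    highlight_last -- True if the last annotation is set to be shown in red
    """
    if selected:
        # The annotation in focus is yellow regardless of its status.
        return "yellow"
    if status == "Last frame reached" and highlight_last:
        return "red"
    return "green"


print(annotation_color("Active", selected=False))
print(annotation_color("Last frame reached", selected=False))
print(annotation_color("Last frame reached", selected=True))
```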

The Annotation file

Caution should be taken when editing the annotation file. Only edit it after closing the program, as the program will otherwise undo all your changes. In case you need to roll back your annotation file, the program creates a backup of the annotation file on startup. The backups are saved in a folder called Backup, located in the folder where the images are stored.

Metadata

Metadata can be added to your annotations if you have specific information you want attached per frame, such as whether the target is occluded.

Metadata tags can be added by opening the annotation .csv file in a text editor (e.g. Notepad++) and adding a column by typing e.g. Occluded; in the top row. This adds a metadata tag called Occluded, which can be set by pressing "CTRL + F5". Additional metadata tags can be set using "CTRL + F6-F12".

Metadata can also be added through Edit meta data fields under the Annotations tab, but only before starting to annotate a new video.

IDs

Currently the program assigns IDs according to an internal ID counter. The program therefore always sets the ID to one above the previously set value. If this is not desirable, change the ID in the Annotation properties box.

If you accidentally assigned an incorrect ID to a target and want to change all IDs at the same time, you can manually edit the annotation file, or press "F2" and select the new category. Note, however, that this only affects adjacent frames, so if there is a gap, only some of the frames will be changed.

Export

It is possible to export the annotations of the video so that they follow a specific file format and set of object categories. This is done by clicking the Annotations tab and selecting Export annotations. This opens a window where you can select the file format and object categories, and which has a Start conversion button. The exported annotations are saved in a folder called Export, located in the folder with your images.

Currently only the Darknet/YOLO file format and the MSCOCO and VOC object categories are supported by default. It is however possible to add custom object categories by placing a .csv file in the subfolder categoryLists, which can be found in the folder with the bounding box executable.

The object categories map each annotation tag to an integer, referred to as the class number. For example, the tag "car" is mapped to the integer 3 by the MSCOCO object categories. If an annotation's tag is not found in the selected object categories, it is labeled as -1 and a pop-up is shown.
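The mapping described above behaves like a dictionary lookup with -1 as the fallback. The sketch below shows this with a single entry taken from the guide ("car" mapped to 3); the function name `class_number` is hypothetical, and the lower-casing of tags before the lookup is an assumption, not confirmed behavior of the program.

```python
UNKNOWN_CLASS = -1  # Label used when a tag is not found in the category list.

# Only the "car" entry is taken from this guide; a real category list is larger.
example_categories = {"car": 3}


def class_number(tag: str, categories: dict) -> int:
    """Map an annotation tag to its class number, or -1 if it is unknown."""
    # Assumption: tags are compared case-insensitively.
    return categories.get(tag.strip().lower(), UNKNOWN_CLASS)


print(class_number("Car", example_categories))
print(class_number("Lorry", example_categories))
```

Checking your tags against the selected category list before exporting makes it easier to spot annotations that would end up labeled as -1.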

The YOLO file format saves the annotations in one .txt file per image in the video, with one annotation per line. Each annotation is represented by the center, width and height of the bounding box, all given as ratios of the image size (horizontal values relative to the image width, vertical values relative to the image height). Each line is therefore structured as follows: class_number box_center_x box_center_y box_width box_height
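As a worked example of this line format, the sketch below converts a bounding box given in pixels into a YOLO line. It assumes the box is described by its top left corner, width and height; the function name `to_yolo_line` and the six-decimal formatting are illustrative choices, not taken from the program.

```python
def to_yolo_line(class_number: int, x: float, y: float, width: float, height: float,
                 image_width: int, image_height: int) -> str:
    """Format one annotation as 'class cx cy w h' with values relative to the image.

    (x, y) is assumed to be the top left corner of the box in pixels.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    center_x = (x + width / 2) / image_width
    center_y = (y + height / 2) / image_height
    rel_width = width / image_width
    rel_height = height / image_height
    return f"{class_number} {center_x:.6f} {center_y:.6f} {rel_width:.6f} {rel_height:.6f}"


# A 320x240 pixel box with its top left corner at (160, 120) in a 640x480 image,
# tagged with class number 3.
line = to_yolo_line(3, 160, 120, 320, 240, 640, 480)
print(line)
```

Because all four values are ratios, the same line stays valid if the image is later resized without changing its aspect ratio.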

Tips and tricks

Some general tips and tricks to using the program:

Learn, use and get comfortable with the shortcuts. They will speed up your workflow
Use the "Retain annotation" functionalities, so you don't have to redraw the annotation for each frame
Use the "Interpolate" functionality when possible, but remember to check and correct the interpolated annotations
The annotation should move with the target. If the target does not move, then the annotation should not either
Remember to change the status to Last frame reached when you are done tracking a target. This way you won't get a lot of extra annotations when moving through the video.
Track one target through the video at a time. This way you concentrate on just that target, and won't get overwhelmed if there are several moving objects in the scene.
- Also, it is useful to go through the entire tracking again after setting the last annotation, to see if there are any jittery/sporadic movements or incorrect annotations. For this you can use the playback functionality.
- If the tracked objects are grouped together, it can be beneficial to turn down the transparency of the non-selected annotation boxes. Just remember to turn it back up again, so you don't accidentally forget a target.
Track the targets in the order they enter the scene. This way the same ID will not accidentally be assigned to two different targets, and it is easier to keep an overview of how far you are with your annotations.
If it is hard to see the target, then use the zoomed in view in the Annotation history box, or the "Zoom in" functionality
If you are annotating several different videos, it can be helpful to save the configuration for each video, name it in a meaningful way, and keep it in the same folder as the video's frames. This allows for quick and easy switching between videos.
Take your time! It is better to be annotating the data correctly at a slower pace, than to quickly annotate data incorrectly, which later on has to be corrected.

Furthermore, at AAU we use the following guidelines for annotation:

A target should be tracked until the entirety of the target is outside of the scene or the region of interest
If a target is partially outside the frame, then the annotation is fitted as well as possible, while remaining inside the frame
If a target is more than 80% occluded, then the metadata tag "Occluded" is set.
- However, if a target is fully occluded, no annotation is set until the target is visible again. When this happens, remember to set the ID of the new annotation in the Annotation properties box so that it matches the ID the target had before it was occluded.
Each target is required to have a unique ID. This means that if there is a time period with no targets, you should manually assign the correct ID to the next target that enters.
