Code for generating spatio-temporal proposals from videos. For more details, check out our paper.

The algorithm consists of two steps:

  • The clustering algorithm, in the svx directory.
  • The proposal generation algorithm, in the rp directory.

Dependencies

Installing the dependencies:

# Python packages for Ubuntu or Debian.
sudo apt-get install python-numpy python-scipy python-matplotlib
sudo apt-get install swig

# Python packages for Fedora.
sudo yum install numpy scipy python-matplotlib
sudo yum install swig

# Boost library for Ubuntu or Debian.
sudo apt-get install libboost-all-dev

# Boost library for Fedora.
sudo yum install boost-devel

# Structured edge detection and Piotr's toolbox.
cd svx
wget http://pascal.inrialpes.fr/data2/oneata/data/sed.zip
unzip sed.zip

# Large displacement optical flow.
wget http://lmb.informatik.uni-freiburg.de/resources/binaries/pami2010Linux64.zip
unzip pami2010Linux64.zip

To compile the SWIG code, run the following command (again from the svx directory):

make all

Getting started

This is a short tutorial on how to use the code. As example data, you can use the frames of the first video in the UCF Sports collection:

wget http://pascal.inrialpes.fr/data2/oneata/data/ucf_sports/001.zip
unzip 001.zip

Dataset class

First you need to define a class with the following methods:

  • get_images_path(video) returns the path to the video frames.
  • get_edges_path(video) returns the path to the edges (SED features).
  • get_flow_path(video, direction) returns the path to the flow (direction can be forward or backward).
  • get_segmentation_directory(video) returns the path to where the segmentation will be stored (as images).

Add an instance of the class to the DATASETS dictionary in datasets.py. An example is already provided for the UCF Sports dataset.
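
For illustration, here is a minimal sketch of such a class; the base directory and the path layout below are assumptions, not part of the provided code:

import os

class MyDataset:

    def __init__(self, base_dir):
        # Hypothetical layout: all data for the dataset lives under `base_dir`.
        self.base_dir = base_dir

    def get_images_path(self, video):
        # Directory containing the extracted frames of `video`.
        return os.path.join(self.base_dir, 'frames', video)

    def get_edges_path(self, video):
        # Directory holding the SED edge features.
        return os.path.join(self.base_dir, 'edges', video)

    def get_flow_path(self, video, direction):
        # `direction` is either 'forward' or 'backward'.
        return os.path.join(self.base_dir, 'flow', direction, video)

    def get_segmentation_directory(self, video):
        # Directory where the segmentation will be stored as images.
        return os.path.join(self.base_dir, 'segmentation', video)

# In datasets.py (hypothetical key and path):
# DATASETS['my_dataset'] = MyDataset('../data/my_dataset')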

Extracting features for segmentation

The extract_features.py script generates a list of commands for extracting the features. You can either execute the commands one by one or launch them in parallel (for example, using GNU parallel):

python extract_features.py --video 001 -d ucf_sports -f edges flow-forward flow-backward | while read c; do eval $c; done
python extract_features.py --video 001 -d ucf_sports -f edges flow-forward flow-backward | parallel 'eval {}'

Hierarchical clustering

The supervoxels.py script computes the segmentation of the video. For testing and debugging purposes, you can process only a subset of frames (by specifying the --start and --end arguments) and visualize various steps of the algorithm (by supplying arguments to the --viz option). Examples:

python supervoxels.py --video 001 -d ucf_sports --start 0 --end 5 -vv --viz mb edges
python supervoxels.py --video 001 -d ucf_sports

Randomized merging algorithm

The folder rp (standing for Random Prim) contains the scripts needed to generate spatio-temporal object proposals. Before generating the proposals, we first compute weights between super-voxels based on color, flow and geometric features (these features are explained in the paper). The script graph_weights.py computes, for each feature, a distance between each pair of neighbouring super-voxels. The distances for the eight different features are then combined using a learnt weight combination; you can use ours: download it from here and copy it into the data folder.
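
For intuition, here is a minimal sketch of the combination step, assuming a linear combination; the distance values and the weight vector are dummies for illustration:

import numpy as np

# Dummy distances for one pair of neighbouring super-voxels
# (eight values, one per feature; for illustration only).
distances = np.array([0.2, 0.5, 0.1, 0.3, 0.4, 0.2, 0.6, 0.1])

# Dummy learnt weights; in practice these come from the downloaded file.
weights = np.full(8, 1.0 / 8)

# The edge weight between the two super-voxels is the weighted
# combination of the per-feature distances (assumed linear here).
edge_weight = float(np.dot(weights, distances))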

Here are examples of computing the weights between super-voxels and then generating the proposals:

python graph_weights.py --video 001 -d ucf_sports -l 100
python proposals.py --video 001 -d ucf_sports -l 100 -n 100

The proposals are stored as a three-dimensional matrix:

  • the first axis corresponds to the proposal;
  • the second axis corresponds to the frame number;
  • the third axis corresponds to the bounding box (hence it has size four: the first two entries give the bottom corner and the last two the top corner of the bounding box).

The proposals are by default stored in NumPy format, but they can be stored in MATLAB format as well, by specifying the option --format mat to the proposals.py script.
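
For example, the proposals can be inspected in Python as follows (a minimal sketch; the file name is hypothetical and we assume the NumPy-format file can be read back with np.load):

import numpy as np

proposals = np.load('proposals_level_100.dat')  # hypothetical file name

n_proposals, n_frames, _ = proposals.shape

# Box of the first proposal in the first frame: four values, the first
# two for the bottom corner, the last two for the top corner.
box = proposals[0, 0]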

Visualizing. If you wish to visualize the generated proposals, you can use the show_proposals.py script as in the following example:

python show_proposals.py --video 001 -d ucf_sports -n 5 -p ../data/ucf_sports/proposals/001/proposals_level_100_features_color_flow_size_fill_size_static_fill_static_size_time_fill_time_no_temp_constraint_False.dat

Evaluation. If you wish to evaluate the quality of the proposals for a given video in terms of best average overlap (BAO) or correct localization (CorLoc20, CorLoc50), you can use the evaluate.py script. The evaluation script needs access to the groundtruth tube. The parse_groundtruth method in the svx/datasets.py file loads the groundtruth: a bounding box for each frame that contains an annotation. More precisely, the function parse_groundtruth should return the following (see the sketch after the list):

  • frame number;
  • x and y coordinates of the bottom left corner;
  • x and y coordinates of the top right corner.
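
A minimal sketch of such a function, assuming one whitespace-separated annotation per line in a hypothetical text file (the actual loader in svx/datasets.py may read a different format):

def parse_groundtruth(video):
    # Hypothetical file location and layout; adapt to your dataset.
    path = '../data/ucf_sports/groundtruth/%s.txt' % video
    annotations = []
    with open(path) as ff:
        for line in ff:
            frame, x0, y0, x1, y1 = line.split()
            # Frame number, bottom-left corner, top-right corner.
            annotations.append((int(frame), float(x0), float(y0),
                                float(x1), float(y1)))
    return annotations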

Here is an example of how to call the evaluate.py script to compute the BAO for the first video (video 001) of the UCF Sports dataset:

python evaluate.py -d ucf_sports --video 001 -p ../data/ucf_sports/proposals/001/proposals_level_100_features_color_flow_size_fill_size_static_fill_static_size_time_fill_time_no_temp_constraint_False.dat -m mbao
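
For reference, the best average overlap is the maximum, over all proposals, of the mean intersection-over-union with the groundtruth boxes. A minimal sketch, assuming boxes given as (x0, y0, x1, y1) tuples (the actual implementation in evaluate.py may differ):

def iou(a, b):
    # Intersection-over-union of two boxes (x0, y0, x1, y1).
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def best_average_overlap(proposals, groundtruth):
    # `groundtruth` maps frame number to box; each proposal tube is
    # indexed by frame number. A proposal's score is its mean IoU over
    # the annotated frames; BAO is the best such score.
    def score(tube):
        overlaps = [iou(tube[t], box) for t, box in groundtruth.items()]
        return sum(overlaps) / len(overlaps)
    return max(score(tube) for tube in proposals)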

You can download either the original annotations (groundtruth_old.zip) or ours (groundtruth_new.zip) as follows (make sure you are in the top-level directory, not in the rp directory):

wget http://pascal.inrialpes.fr/data2/oneata/data/ucf_sports/groundtruth_new.zip
unzip groundtruth_new.zip