B-CNN: Bilinear CNNs for fine-grained visual recognition

Created by Tsung-Yu Lin, Aruni RoyChowdhury and Subhransu Maji at UMass Amherst


This repository contains the code for reproducing the results in the ICCV 2015 paper:

    Author = {Tsung-Yu Lin, Aruni RoyChowdhury, and Subhransu Maji},
    Title = {Bilinear CNNs for Fine-grained Visual Recognition},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2015}

The code was tested on Ubuntu 14.04 with an NVIDIA Titan X GPU and MATLAB R2014b. It has since been upgraded to support the DagNN implementation, and the bilinear pooling layers and our other customized layers are now wrapped into a separate bcnn-package.

Link to the project page.

Fine-grained classification results

Method         Birds    Birds + box    Aircrafts    Cars
B-CNN [M,M]    78.1%    80.4%          77.9%        86.5%
B-CNN [D,M]    84.1%    85.1%          83.9%        91.3%
B-CNN [D,D]    84.0%    84.8%          84.1%        90.6%


This code depends on VLFEAT, MatConvNet, and bcnn-package, which are defined as git submodules of this project. To download them, type:

>> git submodule init
>> git submodule update

Follow the instructions on the VLFEAT and MatConvNet project pages to install them first. Our code is built on MatConvNet version 1.0-beta18. To retrieve a particular version of MatConvNet using git, cd to the MatConvNet folder and type:

>> git fetch --tags
>> git checkout tags/v1.0-beta18

Once these are installed, edit setup.m to run the corresponding setup scripts.

The implementation of the bilinear combination layer for symmetric and asymmetric CNNs is included in the bcnn-package. This repository contains scripts to fine-tune models and run experiments on several fine-grained recognition datasets. We also provide pre-trained models.

Pre-trained models

ImageNet LSVRC 2012 pre-trained models: We use vgg-m and vgg-verydeep-16 as our base models. The format of pre-trained MatConvNet models has changed across releases. In this project, we use the models from version beta18. Please download the models from the MatConvNet pre-trained models page.

Fine-tuned models: We provide three fine-tuned B-CNN models ([M,M], [D,M], and [D,D]) and SVM models trained on the respective B-CNN features for each of the CUB-200-2011, FGVC Aircraft, and Cars datasets. Note that for [M,M] and [D,D] we run the symmetric model, so you can simply use the same network for both streams. These can be downloaded individually here.

You can also download all the model files as a tar.gz here.

Fine-grained datasets

To run experiments, download the datasets from their respective sources and edit model_setup.m to point to the location of each dataset. For instance, you can point to the birds dataset directory by setting opts.cubDir = 'data/cub'.

Classification demo

The script bird_demo takes an image, runs our pre-trained fine-grained bird classifier to predict the top five species, and shows some example images of the class with the highest score. If you haven't already done so, download our pre-trained B-CNN [D,M] and SVM models for this demo and place them in data/models. In addition, download the CUB-200-2011 dataset to data/cub. You can follow our default setting or edit opts in the script to point it to the models and dataset. If you have a GPU installed on your machine, set opts.useGpu=true to speed up the computation. You should see the following output when you run bird_demo():

>> bird_demo();
0.09s to load imdb.
1.63s to load models into memory.
Top 5 prediction for test_image.jpg:
3.80s to make predictions [GPU=0]

To run it on your own images, run bird_demo('imgPath', 'favorite-bird.jpg');. Classification takes roughly 4s per image on a laptop CPU. On an NVIDIA K40 GPU with bigger batch sizes you should get a throughput of roughly 8 images/second with the B-CNN [D,M] model.

Using B-CNN models

run_experiments.m extracts B-CNN features and trains an SVM classifier on fine-grained categories. The following shows how to set up B-CNN models:

  1. Symmetric B-CNN: extracts the self outer-product of features at 'layera'.

    bcnn.opts = {...
       'type', 'bcnn', ...
       'modela', PRETRAINMODEL, ...
       'layera', 14,...
       'modelb', [], ...
       'layerb', [],...
    } ;
  2. Cross layer B-CNN: extracts the outer-product between features at 'layera' and 'layerb' using the same CNN.

    bcnn.opts = {...
       'type', 'bcnn', ...
       'modela', PRETRAINMODEL, ...
       'layera', 14,...
       'modelb', [], ...
       'layerb', 12,...
    } ;
  3. Asymmetric B-CNN: extracts the outer-product between features from CNN 'modela' at 'layera' and CNN 'modelb' at 'layerb'.

    bcnn.opts = {...
       'type', 'bcnn', ...
       'modela', PRETRAINMODEL_A, ...
       'layera', 30,...
       'modelb', PRETRAINMODEL_B, ...
       'layerb', 14,...
    } ;
  4. Fine-tuned B-CNN: If you fine-tune a B-CNN network (see next section), you can evaluate the model using:

    bcnn.opts = {...
       'type', 'bcnn', ...
       'modela', FINETUNED_MODEL, ...
       'layera', [],...
       'modelb', [], ...
       'layerb', [],...
    } ;
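
The "outer-product between features" in the configurations above can be illustrated outside of MatConvNet. The following is a minimal sketch in plain MATLAB, assuming two stand-in feature maps with the same spatial size; the array sizes and variable names (fa, fb, P) are hypothetical and are not taken from the bcnn-package code.

```matlab
% Sketch only: stand-in feature maps, not real network outputs.
rng(0) ;
fa = randn(6, 6, 8) ;     % features at 'layera': H x W x Ca (hypothetical sizes)
fb = randn(6, 6, 5) ;     % features at 'layerb': same H x W, Cb channels

A = reshape(fa, [], size(fa, 3)) ;   % (H*W) x Ca, one row per location
B = reshape(fb, [], size(fb, 3)) ;   % (H*W) x Cb

% Outer products pooled over all spatial locations, as one matrix product.
P = A' * B ;                          % Ca x Cb

% The same quantity as an explicit sum of per-location outer products.
Ploop = zeros(size(P)) ;
for i = 1:size(A, 1)
    Ploop = Ploop + A(i, :)' * B(i, :) ;
end

fprintf('max abs difference: %g\n', max(abs(P(:) - Ploop(:)))) ;
```

The symmetric case corresponds to fb = fa, which gives a Ca x Ca matrix. Whether the package sums or averages over locations is not stated here; the sketch sums.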

Fine-tuning B-CNN models

See run_experiments_bcnn_train.m for fine-tuning a B-CNN model. Note that this code caches all intermediate results during fine-tuning, which takes about 200GB of disk space.

Here are the steps to fine-tune a B-CNN [M,M] model on the CUB dataset:

  1. Download the CUB-200-2011 dataset (see link above)
  2. Set opts.cubDir=CUBROOT in model_setup.m, where CUBROOT is the location of the CUB dataset.
  3. Download the imagenet-vgg-m model (see link above)
  4. Set the path of the model in run_experiments_bcnn_train.m. For example, set PRETRAINMODEL='data/model/imagenet-vgg-m.mat' to use Oxford's VGG-M model trained on the ImageNet LSVRC 2012 dataset. You also have to set bcnnmm.opts to:

    bcnnmm.opts = {...
       'type', 'bcnn', ...
       'modela', PRETRAINMODEL, ...
       'layera', 14,...
       'modelb', PRETRAINMODEL, ...
       'layerb', 14,...
       'shareWeight', true,...
    } ;

    The option shareWeight=true means that the bilinear model uses the same CNN to extract both features, resulting in a symmetric model. For asymmetric models, set shareWeight=false. Note that setting shareWeight=false roughly doubles the GPU memory requirement. The cnn_train() function provided by MatConvNet requires a validation set, so you need to prepare one for datasets that do not come with a pre-defined validation set.

  5. Once the fine-tuning is complete, you can train a linear SVM on the extracted features to evaluate the model. See run_experiments.m for training/testing using SVMs. Set MODELPATH to the location of the fine-tuned model, for example MODELPATH='data/ft-models/bcnn-cub-mm.mat', and set bcnnmm.opts to:

    bcnnmm.opts = {...
       'type', 'bcnn', ...
       'modela', MODELPATH, ...
       'layera', [],...
       'modelb', [], ...
       'layerb', [],...
    } ;
  6. Finally, type >> run_experiments() on the MATLAB command line. The results will be saved in opts.resultPath.
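
Step 4 notes that a validation set is needed when the dataset does not define one. One possible way to carve it out of the training images is sketched below on a toy imdb. It assumes that imdb.images.set marks images with 1 = train, 2 = val, 3 = test (the usual MatConvNet convention; check the dataset loader you use), and valFraction is a hypothetical parameter, not an option of this code.

```matlab
% Sketch only: toy imdb with no validation images.
% Assumed set codes: 1 = train, 2 = val, 3 = test.
rng(0) ;
imdb.images.label = repmat(1:4, 1, 10) ;              % 40 images, 4 classes
imdb.images.set   = [ones(1, 30), 3 * ones(1, 10)] ;  % 30 train, 10 test

valFraction = 0.1 ;                                   % hypothetical choice
trainIdx = find(imdb.images.set == 1) ;
for cls = unique(imdb.images.label(trainIdx))
    % Training images of this class.
    idx = trainIdx(imdb.images.label(trainIdx) == cls) ;
    % Move a random subset (at least one image) to the validation set.
    nVal = max(1, round(valFraction * numel(idx))) ;
    pick = idx(randperm(numel(idx), nVal)) ;
    imdb.images.set(pick) = 2 ;
end

fprintf('train %d, val %d, test %d\n', sum(imdb.images.set == 1), ...
    sum(imdb.images.set == 2), sum(imdb.images.set == 3)) ;
```

Sampling per class keeps every category represented in the validation split; the test images are left untouched.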

Implementation details

The asymmetric B-CNN model is implemented using two networks whose feature outputs are bilinearly combined, followed by normalization and softmax loss layers. The network is constructed using the DagNN structure. You can find the details in initializeNetworksTwoStreams() and bcnn_train_dag().

When the same network is used to extract both features, the symmetric B-CNN model is implemented as a single network architecture consisting of bilinearpool, sqrt, and l2norm layers on top of the convolutional layers. This implementation is about twice as fast and twice as memory-efficient as the asymmetric implementation. You can find the details in initializeNetworkSharedWeights() and bcnn_train_simplenn().

The code for B-CNN is implemented in the following MATLAB functions:

  1. vl_bilinearnn(): This extends vl_simplenn() of the MatConvNet library to include the bilinear layers.
  2. vl_nnbilinearpool(): Bilinear feature pooling using the outer product of a feature with itself.
  3. vl_nnbilinearclpool(): Bilinear feature pooling using the outer product of two different features. The current version requires the two feature outputs to have the same spatial resolution.
  4. vl_nnsqrt(): Signed square-root normalization.
  5. vl_nnl2norm(): L2 normalization.
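
The sequence of bilinearpool, sqrt, and l2norm described above can be written out in a few lines of plain MATLAB. This is a minimal sketch on a random stand-in feature map and does not call the vl_nn* functions; the sizes and the small eps term in the denominator are assumptions made for illustration.

```matlab
% Sketch only: plain MATLAB, no MatConvNet. x stands in for a conv feature map.
rng(0) ;
x = randn(7, 7, 16, 'single') ;          % H x W x C (hypothetical sizes)

[h, w, c] = size(x) ;
X = reshape(x, h * w, c) ;               % one row per spatial location

% Bilinear pooling: outer product of each location's feature with itself,
% summed over locations (X' * X), then flattened to a C^2 vector.
phi = reshape(X' * X, [], 1) ;

% Signed square-root normalization.
phi = sign(phi) .* sqrt(abs(phi)) ;

% L2 normalization; eps guards against division by zero (an assumption here).
phi = phi / (norm(phi) + eps('single')) ;

fprintf('descriptor length: %d\n', numel(phi)) ;
```

With C = 16 channels the descriptor has C^2 = 256 entries. Because L2 normalization follows the signed square root, summing versus averaging over locations would give the same final descriptor.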

Running B-CNN on other datasets

The code can be used for other classification datasets as well. You have to implement the corresponding imdb = <dataset-name>_get_database() function that returns the imdb structure in the right format. Take a look at the cub_get_database.m file as an example.
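
As a rough starting point, a loader for a new dataset might look like the sketch below. Everything in it is an assumption made for illustration: the function name mydataset_get_database, the directory layout dataDir/images/<class-name>/<image>.jpg, the placeholder split, and the field names, which are modeled on MatConvNet-style imdb structures. cub_get_database.m remains the authoritative reference for the fields the code actually expects.

```matlab
function imdb = mydataset_get_database(dataDir)
% Hypothetical loader: assumes dataDir/images/<class-name>/<image>.jpg.
% Field names are assumptions; verify them against cub_get_database.m.
imdb.imageDir = fullfile(dataDir, 'images') ;
imdb.sets = {'train', 'val', 'test'} ;

% One class per sub-directory of the image directory.
d = dir(imdb.imageDir) ;
d = d([d.isdir] & ~ismember({d.name}, {'.', '..'})) ;
imdb.classes.name = {d.name} ;

% Collect image paths (relative to imageDir) and integer class labels.
imdb.images.name = {} ;
imdb.images.label = [] ;
for c = 1:numel(d)
    files = dir(fullfile(imdb.imageDir, d(c).name, '*.jpg')) ;
    names = cellfun(@(f) fullfile(d(c).name, f), {files.name}, ...
        'UniformOutput', false) ;
    imdb.images.name = [imdb.images.name, names] ;
    imdb.images.label = [imdb.images.label, c * ones(1, numel(names))] ;
end
n = numel(imdb.images.name) ;
imdb.images.id = 1:n ;

% Placeholder split: every 5th image is test, the rest train.
% Assumed set codes: 1 = train, 2 = val, 3 = test.
imdb.images.set = ones(1, n) ;
imdb.images.set(5:5:n) = 3 ;

imdb.meta.classes = imdb.classes.name ;
imdb.meta.inUse = true(1, numel(imdb.classes.name)) ;
end
```

In practice the split should come from the dataset's official train/test lists rather than the placeholder rule used here.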


We thank the MatConvNet and VLFEAT teams for creating and maintaining these excellent packages.