# Deep Multimodal Speaker Naming

This repository contains the MATLAB source code for our MM'15 paper ***"Deep Multimodal Speaker Naming"***.

**Project page**: http://herohuyongtao.github.io/publications/speaker-naming/

## How to use

Note: please set MATLAB's working folder to the base folder that contains this `README.text`. All the code mentioned below is under the folder `applications/face-audio/`; you will need to **Add to Path** before running it (see the appendix at the end of this README for a snippet).

### Prepare face data:

- Prepare the train/test file lists, e.g. `train-file-list.txt` and `test-file-list.txt`. Each row of a file list is in the format `full-path-of-img label`.
- Run `gen_face_data.m`. This generates several `train_%d` and `test` mats, each of which contains `sample` (`H x W x 3 x N`) and `tag` (`numClass x N`). A loading sketch appears in the appendix.

### Prepare audio data:

- Merge all audio clips per character across all videos, for both train and test: `merge_audio_file.m`.
- Run `gen_audio_data.m` for both train and test. This generates `audio_samples.mat` for each, containing `sample` (`75 x N`) and `tag` (`numClass x N`). See the appendix for a feature-extraction sketch.

### Prepare face-audio test data:

- Run `gen_face_audio_data.m`.

### Train/test face-alone model:

- Train: `train_face_model.m`.
- Test: `test_face_model.m`.

### Train/test face-audio model:

- Train: `train_face_audio_model.m`.
- Test: `test_face_audio_model.m`.

### Train/test face-audio-audio/SVM model:

- Train
    * Merge all face training sub-mats into one: `merge_face_submat_into_one.m`.
    * Prepare the face-audio-audio train/test data: `gen_svm_face_audio_audio_train_data.m` / `gen_svm_face_audio_audio_test_data.m`.
    * Train: `train_svm_face_audio_audio.m`.
- Test
    * Prepare the test data: `gen_simulate_data.m` or `gen_simulate_data_voting_segment.m`.
    * Test: `test_face_audio_audio_model.m` or `test_face_audio_audio_model_2_models.m`.

## Hardware/software requirements

1. MATLAB R2014b or later, CUDA 6.0 or later (currently tested on Windows 7).
2. An NVIDIA GPU with 2 GB of GPU memory or more.
3. Third-party library: [MIRtoolbox v1.5](https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox) (for audio processing).

## Terms of use

The source code is provided for research purposes only. Any commercial use is prohibited. When using the code in your research work, please cite the following paper:

> **"Deep Multimodal Speaker Naming."**
> Yongtao Hu, Jimmy SJ. Ren, Jingwen Dai, Chang Yuan, Li Xu, and Wenping Wang.
> *ACMMM 2015*.

```
@inproceedings{hu2015deep,
    title={{Deep Multimodal Speaker Naming}},
    author={Hu, Yongtao and Ren, Jimmy SJ. and Dai, Jingwen and Yuan, Chang and Xu, Li and Wang, Wenping},
    booktitle={Proceedings of the 23rd Annual ACM International Conference on Multimedia},
    pages={1107--1110},
    year={2015},
    organization={ACM}
}
```

## Contact

If you find any bug or have any question about the code, please report it on the [**Issues**](https://bitbucket.org/herohuyongtao/mm15-speaker-naming/issues) page or email Yongtao Hu ([herohuyongtao@gmail.com](mailto:herohuyongtao@gmail.com)).
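## Appendix: setup and data-format sketches

The snippets below are minimal, illustrative sketches of the setup and data formats described above; the actual scripts under `applications/face-audio/` remain the reference.

For the **Add to Path** step in *How to use*, running the following from the repository base folder puts the application code on the MATLAB path:

```matlab
% Run from the base folder of the repository (so relative paths resolve),
% then add the application code and its subfolders to the MATLAB path.
addpath(genpath('applications/face-audio'));
```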
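The following is a hypothetical sketch of the data layout produced by `gen_face_data.m`, not the actual script: it reads a file list whose rows are `full-path-of-img label` and builds `sample` (`H x W x 3 x N`) and `tag` (`numClass x N`) as stated above. The crop size `64 x 64`, the `[0,1]` scaling, and the output filename are assumptions for illustration only.

```matlab
% Sketch only: build face samples and one-hot tags from a file list.
fid = fopen('train-file-list.txt', 'r');
entries = textscan(fid, '%s %d');    % column 1: image path, column 2: label
fclose(fid);

paths  = entries{1};
labels = entries{2};
N        = numel(paths);
numClass = double(max(labels));      % assumes labels are 1..numClass

H = 64; W = 64;                      % assumed face-crop size (not from the README)
sample = zeros(H, W, 3, N, 'single');
tag    = zeros(numClass, N, 'single');

for i = 1:N
    img = imresize(imread(paths{i}), [H W]);   % resize each face crop
    if size(img, 3) == 1                       % replicate gray images to 3 channels
        img = repmat(img, [1 1 3]);
    end
    sample(:, :, :, i) = single(img) / 255;    % assumed [0,1] scaling
    tag(labels(i), i)  = 1;                    % one-hot label column
end

save('train_1.mat', 'sample', 'tag', '-v7.3'); % assumed output name
```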
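And here is a hedged sketch of frame-level audio feature extraction with MIRtoolbox. `miraudio`, `mirframe`, `mirmfcc`, and `mirgetdata` are real MIRtoolbox functions, but how the `75 x N` feature matrix is actually composed is defined by `gen_audio_data.m`; the 25-coefficient rank, the three-frame stacking, and the input filename below are assumptions for illustration only.

```matlab
% Sketch only: per-frame MFCCs via MIRtoolbox, stacked into 75-dim columns.
a = miraudio('merged_character.wav');   % assumed name of a merged audio clip
f = mirframe(a);                        % default framing (50 ms, 50% overlap)
m = mirmfcc(f, 'Rank', 1:25);           % 25 MFCCs per frame (assumed rank)
c = squeeze(mirgetdata(m));             % coefficients x numFrames matrix

% One possible way to reach 75 dims: stack each frame with its two
% neighbours (purely an assumption, not taken from gen_audio_data.m).
numFrames = size(c, 2);
sample = zeros(75, numFrames - 2);
for i = 2:numFrames - 1
    sample(:, i - 1) = reshape(c(:, i-1:i+1), [], 1);
end
```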