


Project: Semi-automatic Generation of Subtitles



  • First, run pip install -r requirements.txt to fetch the necessary dependencies.
  • Run ./gen-doc from the project root to generate documentation from docstrings. See the results at doc/build/html/index.html.
  • Run ./test from the project root to run the code examples that you include in docstrings as unit tests. Read this article for motivation. See this article for further information.
  • Look at the preprocess and postprocess modules for examples of possible syntax.
  • For more reStructuredText (rst) syntax, see doc/help/cheatsheet.txt.
  • It looks like this.
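As an illustration of docstring examples that double as unit tests, here is a minimal sketch using Python's standard doctest module. The function strip_tags is a hypothetical helper invented for this example and is not part of prosub; how ./test collects docstrings is not shown here, so the final doctest.testmod() call is only an assumption about one way to run them.

```python
import doctest
import re


def strip_tags(text):
    """Remove simple angle-bracket tags from a subtitle line.

    Hypothetical helper for illustration only; not part of prosub.

    >>> strip_tags("<i>Hello</i> world")
    'Hello world'
    >>> strip_tags("no tags here")
    'no tags here'
    """
    # Drop anything that looks like <...>; nested or malformed tags are not handled.
    return re.sub(r"<[^>]+>", "", text)


if __name__ == "__main__":
    # Run every example found in this module's docstrings and report the outcome.
    print(doctest.testmod())
```

Each line starting with >>> is executed, and the line after it is compared with the actual result, so the documentation and the test cannot drift apart unnoticed.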


The resources live at /informatik/isr/nats/projects/subtitling/resources. Note: if you log in via public key authentication, you have to run kinit before you can access the folder.


Make sure you have the following installed globally on your machine:

  • python3
  • cython
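A quick way to check these prerequisites is to look the tools up on your PATH. The following is a small sketch, assuming that Cython exposes an executable named cython (as a pip installation typically does); the function name missing_tools is made up for this example.

```python
import shutil


def missing_tools(names):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Executable names are assumptions; adjust them if your system uses other names.
    missing = missing_tools(["python3", "cython"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```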

Step 1: Create a virtual environment

It is highly recommended to install all the dependencies for this project in a virtual python environment. You may either use the helper virtualenvwrapper.sh (recommended), or manually work with virtualenv.

Option 1: virtualenvwrapper

If you want to use virtualenvwrapper.sh on Ubuntu, you might want to follow this guide up until "Create a new virtualenv". Otherwise, you probably know how to install virtualenv or virtualenvwrapper.sh on your system. If not, search the web.

Create a Python 3 virtual environment. Name it whatever you like, e.g. prosub:

mkvirtualenv -p $(which python3) prosub

Now, to work on your virtual environment, run

workon prosub

Option 2: Virtualenv

Install virtualenv globally, then run

virtualenv -p $(which python3) .env

To activate the virtual environment, run the following command (do this every time you want to work on the project):

source .env/bin/activate

Step 2: Install dependencies

  • Run pip install -r requirements.txt. This will install all Python dependencies.
  • Install nltk_data: sudo python -m nltk.downloader -d /usr/local/share/nltk_data all
  • Install Cython for Python 3 by running pip install cython within your virtual environment, unless it is already installed globally on your system (currently this is only the case on Arch Linux, via the package community/cython).
  • Compile the TurboParser: scripts/install-turboparser.sh. It will be installed into data/TurboParser*.
  • Download the data for the TurboParser: scp -r uni:/informatik/isr/nats/projects/subtitling/resources/syntax/ resources/
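To confirm that the last two steps left their results where the text above says they should be, a sketch like the following can help. It assumes that the scp command produces a resources/syntax directory and that the TurboParser build matches data/TurboParser*; the function name check_setup is hypothetical and not part of prosub.

```python
import glob
import os


def check_setup(root="."):
    """Map each expected artifact to True if it was found under `root`."""
    return {
        # The exact directory name depends on the TurboParser version, hence the glob.
        "TurboParser build": bool(glob.glob(os.path.join(root, "data", "TurboParser*"))),
        # Assumed result of copying syntax/ into resources/.
        "syntax resources": os.path.isdir(os.path.join(root, "resources", "syntax")),
    }


if __name__ == "__main__":
    for name, found in check_setup().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it from the project root so that the relative paths resolve correctly.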

Step 3: Run

Make sure your virtual environment is active. You should see something like (.env) or (prosub) in front of your normal prompt. If not, consult Step 1 above.
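If the prompt is inconclusive, the interpreter itself can tell you whether it runs inside a virtual environment. This is a minimal sketch based on the standard sys module; the function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```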

Read CONFIG.md to understand how to configure the system. Run ./run.py --help to see command line flags. Good luck!


This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.