Co-Regularised Support Vector Regression

Co-regularised support vector regression (CoSVR) is a form of non-linear regression for data that is available in multiple views, i.e., each instance has multiple representations. For this kind of data, co-regularisation is a method for incorporating information from unlabelled data. The method trains two support vector regression models, one per view on the labelled data, within a single optimisation problem, and additionally demands that both models make similar predictions on the unlabelled data. This latter requirement is called co-regularisation.
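To make the idea concrete, the following is a minimal sketch of how such a two-view objective could be evaluated. It is not the code in this repository: it assumes linear models, a squared disagreement penalty on the unlabelled data, and hypothetical names and parameters (`co_regularised_objective`, `eps`, `nu`, `lam`) chosen only for illustration.

```python
def predict(w, x):
    """Linear prediction w . x for one instance (linear model is an assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def eps_insensitive(residual, eps):
    """SVR loss: residuals smaller than eps cost nothing."""
    return max(0.0, abs(residual) - eps)


def co_regularised_objective(weights, labelled, targets, unlabelled,
                             eps=0.1, nu=1.0, lam=1.0):
    """Evaluate an illustrative two-view co-regularised objective.

    weights    -- [w_view1, w_view2], one weight vector per view
    labelled   -- [X_view1, X_view2], the labelled instances in each view
    targets    -- labels, shared by both views
    unlabelled -- [Z_view1, Z_view2], the same unlabelled instances in each view
    Returns (objective value, disagreement on the unlabelled data).
    """
    total = 0.0
    for w, X in zip(weights, labelled):
        # Norm regulariser of the per-view model.
        total += nu * sum(wi * wi for wi in w)
        # Epsilon-insensitive loss on the labelled data of this view.
        total += sum(eps_insensitive(y - predict(w, x), eps)
                     for x, y in zip(X, targets))
    # Co-regularisation: penalise differing predictions on unlabelled data
    # (squared difference is an assumption made for this sketch).
    disagreement = sum((predict(weights[0], z1) - predict(weights[1], z2)) ** 2
                       for z1, z2 in zip(unlabelled[0], unlabelled[1]))
    return total + lam * disagreement, disagreement


# Two views of one labelled and one unlabelled instance.
weights = [[1.0], [2.0]]
value, gap = co_regularised_objective(
    weights,
    labelled=[[[1.0]], [[0.5]]], targets=[1.0],
    unlabelled=[[[2.0]], [[2.0]]])
print(value, gap)
```

In this toy call both models fit the labelled example exactly, so the only contributions are the norm terms and the disagreement on the unlabelled instance. Increasing `lam` pushes the two views towards agreeing on unlabelled data.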

One example of this kind of data is ligand affinity values for protein-ligand bonds. Each ligand can be represented by multiple fingerprints, each constituting a view. Since measuring the affinity value is a time-consuming and costly process, only a few labelled training examples exist, whereas the number of potential ligands is huge.

This work has been published in Machine Learning and Knowledge Discovery in Databases, 2017. If you use this code in your scientific work, please cite:

Co-Regularised Support Vector Regression. Katrin Ullrich, Michael Kamp, Thomas Gärtner, Martin Vogt, Stefan Wrobel. Machine Learning and Knowledge Discovery in Databases, Springer, 2017.

Additional material to the paper can be found here.

Ligand Affinity Prediction with CoSVR

We consider the problem of ligand affinity prediction as a regression task.

Protein-ligand bonds trigger the majority of biochemical reactions. Therefore, the characterisation of their strength is a crucial step in the process of drug discovery and design. However, the practical determination of ligand affinities (labelled examples) is very expensive, whereas unlabelled compounds are available in abundance. Additionally, different vectorial representations (molecular fingerprints) for compounds exist that cover characteristic sets of molecular features.

In this scenario, we propose to apply a co-regularisation approach, which includes information from unlabelled examples by ensuring that individual models trained on different molecular fingerprints make similar predictions. We extend support vector regression similarly to the existing co-regularised least squares regression (CoRLSR) and obtain co-regularised support vector regression (CoSVR). Different variants of CoSVR--including a single-view version--and their characteristics are proposed.

Content of this Repository

The repository contains a Python project (including Eclipse project files) for running CoSVR experiments on ligand affinity prediction tasks.

  • data: the data package contains a data handler class.
  • experiments: the experiments folder contains the exp.py files, which set up and run an experiment.
  • framework: the framework package contains classes for running an experiment, performing parameter tuning, and evaluating the results.
  • learner: the learner package contains all CoSVR variants as well as several baselines, including co-regularised least squares regression (CoRLSR), standard SVR, and standard RLSR.
  • test: the test package contains some unit tests for the project.

How do I get set up?

  • Download the project.
  • Make sure you have all necessary Python packages installed (numpy, cvxopt, sklearn; the latter is distributed as scikit-learn).
  • Run an experiment (python exp.py).
  • A new folder holding all results will be created in the folder that contains the exp.py file.
  • Enjoy!
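Before running an experiment, it can help to confirm that the required packages are importable. The snippet below is a small sketch using only the Python standard library; the helper name `missing_packages` is hypothetical and not part of this repository.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages listed above.
    missing = missing_packages(["numpy", "cvxopt", "sklearn"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```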

Contribution guidelines

  • If you find any bugs or have other suggestions, please contact us!
  • If you want to contribute to the project, please let us know!
  • If you have questions on how to get something running, ask right away!

Who do I talk to?

  • Please contact the repository owner (Michael Kamp).