Snippets

Lilian Besson (Naereen) Python 2/3 script to test kernels, for the Kernel Methods homework.

Created by Lilian Besson (Naereen) last modified
Script to test if functions are kernels, by Lilian Besson (C) 2016
MIT Licensed (http://lbesson.mit-license.org)


Using N = 1000, T = 5.
For test domains:
  - IR  ~= [-300.0, 300.0],
  - IR+ ~= [ 0, 300.0],
  - IN  ~= [ 0, 1000],
  - IN  ~= [ 0, 30] (small to reduce overflow),
Tolerance for small real values = 1e-12.
Tolerance for success rates     = 0.01.
And sampling the a_i from uniform in [-50.0, 50.0].

Starting to test 14 kernels...

- Testing the kernel abs_k  K(x, y) = < x, y > = x * y :
    It has domain = REAL.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel abs_k...

- Testing the kernel one_by_k  K(x, y) = 1 / (1 - x * y) :
    It has domain = MONEONE.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel one_by_k...

- Testing the kernel power_two_dot_k  K(x, y) = 2 ** (x + y) :
    It has domain = SMALLINT.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel power_two_dot_k...

- Testing the kernel power_two_sum_k  K(x, y) = 2 ** (x * y) :
    It has domain = SMALLINT.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel power_two_sum_k...

- Testing the kernel log_k  K(x, y) = log(1 + x * y) :
    It has domain = REALPLUS.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 96.50%.
    ===> I guess this kernel log_k  K(x, y) = log(1 + x * y)  is NOT a positive definite kernel !
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel log_k...

- Testing the kernel exp_k  K(x, y) = exp(-(x-y)^2) :
    It has domain = REAL.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel exp_k...

- Testing the kernel cos_plus_k  K(x, y) = cos(x + y) :
    It has domain = REAL.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 46.60%.
    ===> I guess this kernel cos_plus_k  K(x, y) = cos(x + y)  is NOT a positive definite kernel !
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel cos_plus_k...

- Testing the kernel cos_minus_k  K(x, y) = cos(x - y) :
    It has domain = REAL.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel cos_minus_k...

- Testing the kernel min_k  K(x, y) = min(x, y) :
    It has domain = REALPLUS.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel min_k...

- Testing the kernel max_k  K(x, y) = max(x, y) :
    It has domain = REALPLUS.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 78.60%.
    ===> I guess this kernel max_k  K(x, y) = max(x, y)  is NOT a positive definite kernel !
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel max_k...

- Testing the kernel min_by_max_k  K(x, y) = min(x, y) / max(x, y) :
    It has domain = REALPLUS.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel min_by_max_k...

- Testing the kernel gcd_k  K(x, y) = gcd(x, y) :
    It has domain = INTEGER.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 100.00%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel gcd_k...

- Testing the kernel lcm_k  K(x, y) = lcm(x, y) :
    It has domain = INTEGER.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 58.80%.
    ===> I guess this kernel lcm_k  K(x, y) = lcm(x, y)  is NOT a positive definite kernel !
    - Tested for definiteness : success rate 99.95%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel lcm_k...

- Testing the kernel gcd_by_lcm_k  K(x, y) = gcd(x, y) / lcm(x, y) :
    It has domain = INTEGER.
  Results of the N = 1000 tests:
    - Tested for positiveness : success rate 100.00%.
    - Tested for definiteness : success rate 99.95%.
    - Tested for   symmetry   : success rate 100.00%.
  Finished for the kernel gcd_by_lcm_k...


Done testing some kernels...

- Apparently, the kernel log_k  K(x, y) = log(1 + x * y)  is NOT a positive definite kernel : because positivenesswas only observed with probability 96.50%.

- Apparently, the kernel cos_plus_k  K(x, y) = cos(x + y)  is NOT a positive definite kernel : because positivenesswas only observed with probability 46.60%.

- Apparently, the kernel max_k  K(x, y) = max(x, y)  is NOT a positive definite kernel : because positivenesswas only observed with probability 78.60%.

- Apparently, the kernel lcm_k  K(x, y) = lcm(x, y)  is NOT a positive definite kernel : because positivenesswas only observed with probability 58.80%.

- All the other kernel appeared to be positive definite. Now prove it.

Finished for this script 'test_kernel.py'.
#! /usr/bin/env python
# -*- coding: utf-8; mode: python -*-
"""
A Python 2/3 script to test kernels, for the Kernel Methods homework.

Source: http://lear.inrialpes.fr/people/mairal/teaching/2015-2016/MVA/
PDF: http://lear.inrialpes.fr/people/mairal/teaching/2015-2016/MVA/fichiers/homework_mva_2016.pdf

- *Date:* Saturday 06 February 2016.
- *Author:* Lilian Besson, for the MVA Master, (C) 2015-16.
- *Licence:* MIT Licence (http://lbesson.mit-license.org).
"""

from __future__ import print_function, division  # Python 2 compatibility if needed

import numpy as np

try:
    from sympy import lcm, gcd
except:
    # if True:
    print("Error: sympy is not available, switching back to fractions implementation of lcm/gcd")
    from fractions import gcd

    def lcm(x, y):
        """ lcm(x, y) : lowest common multiple, = x * y / gcd(x, y). """
        if gcd(x, y) == 0:
            return 0
        else:
            return (x * y) / gcd(x, y)


#: Number of random tests to do
N = 1000
#: Size of the sum_ij
T = 5
T = 50
#: Lower bound for random number in IR
LOWER_BOUND = -300.0
#: Upper bound for random number in IR
UPPER_BOUND = 300.0
#: Upper bound for random number in IN
UPPER_BOUND_INT = 1000
#: Small upper bound for random number in IN
SMALL_UPPER_BOUND_INT = 30
#: Tolerance for real values tests to zero
REAL_TOLERANCE = 1e-12
#: Tolerance for probabilities check to 100%
PROBA_TOLERANCE = 0.01

#: Standard deviation of the weights
WEIGHT_STD = 100
#: Distribution for weights
# WEIGHT_DISTRIBUTION = "NORMAL"
# WEIGHT_DISTRIBUTION = "UNIFORM_0_1"  # Useless!
WEIGHT_DISTRIBUTION = "UNIFORM_-1_1"

#: Map a domain to a random generator on this domain
random_generators = {
    "REAL":
        lambda: np.random.uniform(LOWER_BOUND, UPPER_BOUND),
    "REALPLUS":
        lambda: np.random.uniform(0.0, UPPER_BOUND),
    "INTEGER":
        lambda: np.random.randint(0, UPPER_BOUND_INT),
    "SMALLINT":
        lambda: np.random.randint(0, SMALL_UPPER_BOUND_INT),
    "MONEONE":
        lambda: np.random.uniform(-1.0, 1.0)
}


def test_kernel(K, N=N, T=T, verb=False):
    """
    Test if a function is a kernel on lots of random values (500 by default):

     - test if K(x, y) = K(y, x) (symmetry),
     - test if K(x, x) = 0 ==> x = 0 (definiteness),
     - test if sum_ij a_i a_j K(x_i, x_j) >= 0 for 'all' t, and a_i in IR, x_i in DOMAIN (positiveness) (for all i,j in range(t), and t drawn in range(T) randomly).

    Return numerical values, average of success rate on N = 500 tests.
    """
    # Get the domain of K and an appropriate random generator.
    domain = K.domain
    newvalue = random_generators[domain]
    # Random generator for weights
    def newweights(t):
        if WEIGHT_DISTRIBUTION == "UNIFORM_0_1":
            return np.random.rand(t) * WEIGHT_STD
        if WEIGHT_DISTRIBUTION == "UNIFORM_-1_1":
            return (2*np.random.rand(t) - 1) * (WEIGHT_STD / 2)
        if WEIGHT_DISTRIBUTION == "NORMAL":
            return WEIGHT_STD * np.random.randn(t)
    # Test for symmetry
    prop_sym = 0
    # Test for definiteness
    prop_def = 0
    # Test for positiveness
    prop_pos = 0
    for i in range(N):
        x = newvalue()
        y = newvalue()
        if verb:
            print("       - Random values x = {}, y = {}.".format(x, y))
        if K(x, y) == K(y, x):
            prop_sym += 1
        # Test : (A ==> B)  <=>  (B or not(A))
        if (not (abs(K(x, x)) < REAL_TOLERANCE)) or (abs(x) > REAL_TOLERANCE):
            prop_def += 1
        if (not (abs(K(y, y)) < REAL_TOLERANCE)) or (abs(y) > REAL_TOLERANCE):
            prop_def += 1
        # Generate lots of samples for checking positiveness
        t = np.random.randint(1, T)
        X = [newvalue() for i in range(t)]
        a = newweights(t)
        if verb:
            print("       - Random values t = {}, a = {}, X = {}.".format(t, a, X))
        if sum(a[i] * a[j] * K(X[i], X[j]) for i in range(t) for j in range(t)) >= 0:
            prop_pos += 1
    if verb:
        print("     - After {} random tests, sym = {:>3}, def = {:>3}, pos = {:>3}.".format(N, prop_sym, prop_def, prop_pos))
    # Return the results
    return {
        "symmetry":     (prop_sym / N),
        "definiteness": (prop_def / (2*N)),
        "positiveness": (prop_pos / N),
    }


# %% Tweak to define a new kernel


#: List of all kernels to test, dynamically created by @kernel decorator
all_kernels = []


def kernel(domain="REAL"):
    """
    Decorator for a new kernel (just an elegant way to add it to all_kernels).

    The parameter 'domain' can be:

     - "REAL"      for (-oo,+oo) = IR   real number,
     - "REALPLUS"  for [0,+oo)   = IR+  positive real number,
     - "INTEGER"   for [|0,+oo)  = IN   positive integer,
     - "SMALLINT"  for [|0,+oo)  = IN   (small) positive integer,
     - "MONEONE"   for (-1,+1)   = UR   the unit open disk of real values.
    """
    # global all_kernels
    def kernel_decorator(func):
        """
        Decorator for a new kernel, with a certain domain.
        """
        global all_kernels
        # Add a new kernel !
        all_kernels += [func]
        func.domain = domain
        return func
    return kernel_decorator


# %% List of Kernels


# Classical example of kernel: linear kernel!
@kernel("REAL")
def abs_k(x, y):
    """ K(x, y) = < x, y > = x * y """
    return x * y


@kernel("MONEONE")
def one_by_k(x, y):
    """ K(x, y) = 1 / (1 - x * y) """
    return 1 / (1 - x * y)


# Here we could have an overflow error, so using small integers
# From the slides
@kernel("SMALLINT")
def power_two_dot_k(x, y):
    """ K(x, y) = 2 ** (x * y) """
    return 2 ** (x * y)


# Here we could have an overflow error, so using small integers
@kernel("SMALLINT")
def power_two_sum_k(x, y):
    """ K(x, y) = 2 ** (x + y) """
    return 2 ** (x + y)


@kernel("REALPLUS")
def log_k(x, y):
    """ K(x, y) = log(1 + x * y) """
    return np.log(1 + x * y)


@kernel("REAL")
def exp_k(x, y):
    """ K(x, y) = exp(-(x-y)^2) """
    return np.exp(-(x-y)**2)


@kernel("REAL")
def cos_plus_k(x, y):
    """ K(x, y) = cos(x + y) """
    return np.cos(x + y)


@kernel("REAL")
def cos_minus_k(x, y):
    """ K(x, y) = cos(x - y) """
    return np.cos(x - y)


@kernel("REALPLUS")
def min_k(x, y):
    """ K(x, y) = min(x, y) """
    return min(x, y)


@kernel("REALPLUS")
def max_k(x, y):
    """ K(x, y) = max(x, y) """
    return max(x, y)


@kernel("REALPLUS")
def min_by_max_k(x, y):
    """ K(x, y) = min(x, y) / max(x, y) """
    if abs(max(x, y)) < REAL_TOLERANCE:
        return 0.0
    else:
        return min(x, y) / max(x, y)


@kernel("INTEGER")
def gcd_k(x, y):
    """ K(x, y) = gcd(x, y) """
    return gcd(x, y)


@kernel("INTEGER")
def lcm_k(x, y):
    """ K(x, y) = lcm(x, y) """
    return lcm(x, y)


@kernel("INTEGER")
def gcd_by_lcm_k(x, y):
    """ K(x, y) = gcd(x, y) / lcm(x, y) """
    if abs(lcm(x, y)) < REAL_TOLERANCE:
        return 0.0
    else:
        return gcd(x, y) / lcm(x, y)


# %% Testing all the kernels

def main(kernels_to_test=all_kernels, T=T, verb=False):
    """
    Test a lot of different kernels.
    """
    nb = len(all_kernels)
    print("\nStarting to test {} kernels...".format(nb))
    # Store results
    results = [1] * nb
    not_kernels = []
    # Start
    for i, K in enumerate(all_kernels):
        print("\n- Testing the kernel {} {}:".format(K.__name__, K.__doc__))
        print("    It has domain = {}.".format(K.domain))
        result = test_kernel(K, N=N, T=T, verb=verb)
        print("  Results of the N = {} tests:".format(N))
        # bad_test = None
        # min_rate = None
        for test, rate in result.items():
            print("    - Tested for {:^12} : success rate {:.2%}.".format(test, rate))
            if rate < 1.0 - PROBA_TOLERANCE:
                print("    ===> I guess this kernel {} {} is NOT a positive definite kernel !".format(K.__name__, K.__doc__))
                not_kernels.append({'K': K, 'test': test, 'rate': rate})
                # bad_test, min_rate = test, rate
        print("  Finished for the kernel {}...".format(K.__name__))
        results[i] = result
    print("\n\nDone testing some kernels...")
    return results, not_kernels


if __name__ == '__main__':
    print("\nScript to test if functions are kernels, by Lilian Besson (C) 2016")
    print("MIT Licensed (http://lbesson.mit-license.org)")
    print("\n\nUsing N = {}, T = {}.".format(N, T))
    print("For test domains:")
    print("  - IR  ~= [{}, {}],".format(LOWER_BOUND, UPPER_BOUND))
    print("  - IR+ ~= [ 0, {}],".format(UPPER_BOUND))
    print("  - IN  ~= [ 0, {}],".format(UPPER_BOUND_INT))
    print("  - IN  ~= [ 0, {}] (small to reduce overflow),".format(SMALL_UPPER_BOUND_INT))
    print("Tolerance for small real values = {}.".format(REAL_TOLERANCE))
    print("Tolerance for success rates     = {}.".format(PROBA_TOLERANCE))
    if WEIGHT_DISTRIBUTION == "UNIFORM_0_1":
        print("And sampling the a_i from uniform in [0, {}].".format(WEIGHT_STD))
    if WEIGHT_DISTRIBUTION == "UNIFORM_-1_1":
        print("And sampling the a_i from uniform in [-{}, {}].".format(WEIGHT_STD/2, WEIGHT_STD/2))
    if WEIGHT_DISTRIBUTION == "NORMAL":
        print("And sampling the a_i from a Gaussian of mean 0 and std {}.".format(WEIGHT_STD))
    # all_kernels = [absk]
    # results, not_kernels = main(kernels_to_test=all_kernels, verb=True)
    results, not_kernels = main()
    for item in not_kernels:
        K = item['K']
        test = item['test']
        rate = item['rate']
        print("\n- Apparently, the kernel {} {} is NOT a positive definite kernel : because {} was only observed with probability {:.2%}.".format(K.__name__, K.__doc__, test, rate))
    print("\n- All the other kernel appeared to be positive definite. Now prove it.")
    print("\nFinished for this script 'test_kernel.py'.")

# End of test_kernel.py

Comments (0)