# Lilian Besson (Naereen)Python 2/3 script to test kernels, for the Kernel Methods homework.

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146``` ```Script to test if functions are kernels, by Lilian Besson (C) 2016 MIT Licensed (http://lbesson.mit-license.org) Using N = 1000, T = 5. For test domains: - IR ~= [-300.0, 300.0], - IR+ ~= [ 0, 300.0], - IN ~= [ 0, 1000], - IN ~= [ 0, 30] (small to reduce overflow), Tolerance for small real values = 1e-12. Tolerance for success rates = 0.01. And sampling the a_i from uniform in [-50.0, 50.0]. Starting to test 14 kernels... - Testing the kernel abs_k K(x, y) = < x, y > = x * y : It has domain = REAL. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel abs_k... - Testing the kernel one_by_k K(x, y) = 1 / (1 - x * y) : It has domain = MONEONE. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel one_by_k... - Testing the kernel power_two_dot_k K(x, y) = 2 ** (x + y) : It has domain = SMALLINT. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel power_two_dot_k... - Testing the kernel power_two_sum_k K(x, y) = 2 ** (x * y) : It has domain = SMALLINT. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel power_two_sum_k... - Testing the kernel log_k K(x, y) = log(1 + x * y) : It has domain = REALPLUS. Results of the N = 1000 tests: - Tested for positiveness : success rate 96.50%. ===> I guess this kernel log_k K(x, y) = log(1 + x * y) is NOT a positive definite kernel ! - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel log_k... - Testing the kernel exp_k K(x, y) = exp(-(x-y)^2) : It has domain = REAL. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel exp_k... - Testing the kernel cos_plus_k K(x, y) = cos(x + y) : It has domain = REAL. Results of the N = 1000 tests: - Tested for positiveness : success rate 46.60%. ===> I guess this kernel cos_plus_k K(x, y) = cos(x + y) is NOT a positive definite kernel ! - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel cos_plus_k... - Testing the kernel cos_minus_k K(x, y) = cos(x - y) : It has domain = REAL. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel cos_minus_k... - Testing the kernel min_k K(x, y) = min(x, y) : It has domain = REALPLUS. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel min_k... - Testing the kernel max_k K(x, y) = max(x, y) : It has domain = REALPLUS. Results of the N = 1000 tests: - Tested for positiveness : success rate 78.60%. ===> I guess this kernel max_k K(x, y) = max(x, y) is NOT a positive definite kernel ! - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel max_k... - Testing the kernel min_by_max_k K(x, y) = min(x, y) / max(x, y) : It has domain = REALPLUS. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel min_by_max_k... - Testing the kernel gcd_k K(x, y) = gcd(x, y) : It has domain = INTEGER. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 100.00%. - Tested for symmetry : success rate 100.00%. Finished for the kernel gcd_k... - Testing the kernel lcm_k K(x, y) = lcm(x, y) : It has domain = INTEGER. Results of the N = 1000 tests: - Tested for positiveness : success rate 58.80%. ===> I guess this kernel lcm_k K(x, y) = lcm(x, y) is NOT a positive definite kernel ! - Tested for definiteness : success rate 99.95%. - Tested for symmetry : success rate 100.00%. Finished for the kernel lcm_k... - Testing the kernel gcd_by_lcm_k K(x, y) = gcd(x, y) / lcm(x, y) : It has domain = INTEGER. Results of the N = 1000 tests: - Tested for positiveness : success rate 100.00%. - Tested for definiteness : success rate 99.95%. - Tested for symmetry : success rate 100.00%. Finished for the kernel gcd_by_lcm_k... Done testing some kernels... - Apparently, the kernel log_k K(x, y) = log(1 + x * y) is NOT a positive definite kernel : because positivenesswas only observed with probability 96.50%. - Apparently, the kernel cos_plus_k K(x, y) = cos(x + y) is NOT a positive definite kernel : because positivenesswas only observed with probability 46.60%. - Apparently, the kernel max_k K(x, y) = max(x, y) is NOT a positive definite kernel : because positivenesswas only observed with probability 78.60%. - Apparently, the kernel lcm_k K(x, y) = lcm(x, y) is NOT a positive definite kernel : because positivenesswas only observed with probability 58.80%. - All the other kernel appeared to be positive definite. Now prove it. Finished for this script 'test_kernel.py'. ```
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318``` ```#! /usr/bin/env python # -*- coding: utf-8; mode: python -*- """ A Python 2/3 script to test kernels, for the Kernel Methods homework. Source: http://lear.inrialpes.fr/people/mairal/teaching/2015-2016/MVA/ PDF: http://lear.inrialpes.fr/people/mairal/teaching/2015-2016/MVA/fichiers/homework_mva_2016.pdf - *Date:* Saturday 06 February 2016. - *Author:* Lilian Besson, for the MVA Master, (C) 2015-16. - *Licence:* MIT Licence (http://lbesson.mit-license.org). """ from __future__ import print_function, division # Python 2 compatibility if needed import numpy as np try: from sympy import lcm, gcd except: # if True: print("Error: sympy is not available, switching back to fractions implementation of lcm/gcd") from fractions import gcd def lcm(x, y): """ lcm(x, y) : lowest common multiple, = x * y / gcd(x, y). """ if gcd(x, y) == 0: return 0 else: return (x * y) / gcd(x, y) #: Number of random tests to do N = 1000 #: Size of the sum_ij T = 5 T = 50 #: Lower bound for random number in IR LOWER_BOUND = -300.0 #: Upper bound for random number in IR UPPER_BOUND = 300.0 #: Upper bound for random number in IN UPPER_BOUND_INT = 1000 #: Small upper bound for random number in IN SMALL_UPPER_BOUND_INT = 30 #: Tolerance for real values tests to zero REAL_TOLERANCE = 1e-12 #: Tolerance for probabilities check to 100% PROBA_TOLERANCE = 0.01 #: Standard deviation of the weights WEIGHT_STD = 100 #: Distribution for weights # WEIGHT_DISTRIBUTION = "NORMAL" # WEIGHT_DISTRIBUTION = "UNIFORM_0_1" # Useless! WEIGHT_DISTRIBUTION = "UNIFORM_-1_1" #: Map a domain to a random generator on this domain random_generators = { "REAL": lambda: np.random.uniform(LOWER_BOUND, UPPER_BOUND), "REALPLUS": lambda: np.random.uniform(0.0, UPPER_BOUND), "INTEGER": lambda: np.random.randint(0, UPPER_BOUND_INT), "SMALLINT": lambda: np.random.randint(0, SMALL_UPPER_BOUND_INT), "MONEONE": lambda: np.random.uniform(-1.0, 1.0) } def test_kernel(K, N=N, T=T, verb=False): """ Test if a function is a kernel on lots of random values (500 by default): - test if K(x, y) = K(y, x) (symmetry), - test if K(x, x) = 0 ==> x = 0 (definiteness), - test if sum_ij a_i a_j K(x_i, x_j) >= 0 for 'all' t, and a_i in IR, x_i in DOMAIN (positiveness) (for all i,j in range(t), and t drawn in range(T) randomly). Return numerical values, average of success rate on N = 500 tests. """ # Get the domain of K and an appropriate random generator. domain = K.domain newvalue = random_generators[domain] # Random generator for weights def newweights(t): if WEIGHT_DISTRIBUTION == "UNIFORM_0_1": return np.random.rand(t) * WEIGHT_STD if WEIGHT_DISTRIBUTION == "UNIFORM_-1_1": return (2*np.random.rand(t) - 1) * (WEIGHT_STD / 2) if WEIGHT_DISTRIBUTION == "NORMAL": return WEIGHT_STD * np.random.randn(t) # Test for symmetry prop_sym = 0 # Test for definiteness prop_def = 0 # Test for positiveness prop_pos = 0 for i in range(N): x = newvalue() y = newvalue() if verb: print(" - Random values x = {}, y = {}.".format(x, y)) if K(x, y) == K(y, x): prop_sym += 1 # Test : (A ==> B) <=> (B or not(A)) if (not (abs(K(x, x)) < REAL_TOLERANCE)) or (abs(x) > REAL_TOLERANCE): prop_def += 1 if (not (abs(K(y, y)) < REAL_TOLERANCE)) or (abs(y) > REAL_TOLERANCE): prop_def += 1 # Generate lots of samples for checking positiveness t = np.random.randint(1, T) X = [newvalue() for i in range(t)] a = newweights(t) if verb: print(" - Random values t = {}, a = {}, X = {}.".format(t, a, X)) if sum(a[i] * a[j] * K(X[i], X[j]) for i in range(t) for j in range(t)) >= 0: prop_pos += 1 if verb: print(" - After {} random tests, sym = {:>3}, def = {:>3}, pos = {:>3}.".format(N, prop_sym, prop_def, prop_pos)) # Return the results return { "symmetry": (prop_sym / N), "definiteness": (prop_def / (2*N)), "positiveness": (prop_pos / N), } # %% Tweak to define a new kernel #: List of all kernels to test, dynamically created by @kernel decorator all_kernels = [] def kernel(domain="REAL"): """ Decorator for a new kernel (just an elegant way to add it to all_kernels). The parameter 'domain' can be: - "REAL" for (-oo,+oo) = IR real number, - "REALPLUS" for [0,+oo) = IR+ positive real number, - "INTEGER" for [|0,+oo) = IN positive integer, - "SMALLINT" for [|0,+oo) = IN (small) positive integer, - "MONEONE" for (-1,+1) = UR the unit open disk of real values. """ # global all_kernels def kernel_decorator(func): """ Decorator for a new kernel, with a certain domain. """ global all_kernels # Add a new kernel ! all_kernels += [func] func.domain = domain return func return kernel_decorator # %% List of Kernels # Classical example of kernel: linear kernel! @kernel("REAL") def abs_k(x, y): """ K(x, y) = < x, y > = x * y """ return x * y @kernel("MONEONE") def one_by_k(x, y): """ K(x, y) = 1 / (1 - x * y) """ return 1 / (1 - x * y) # Here we could have an overflow error, so using small integers # From the slides @kernel("SMALLINT") def power_two_dot_k(x, y): """ K(x, y) = 2 ** (x * y) """ return 2 ** (x * y) # Here we could have an overflow error, so using small integers @kernel("SMALLINT") def power_two_sum_k(x, y): """ K(x, y) = 2 ** (x + y) """ return 2 ** (x + y) @kernel("REALPLUS") def log_k(x, y): """ K(x, y) = log(1 + x * y) """ return np.log(1 + x * y) @kernel("REAL") def exp_k(x, y): """ K(x, y) = exp(-(x-y)^2) """ return np.exp(-(x-y)**2) @kernel("REAL") def cos_plus_k(x, y): """ K(x, y) = cos(x + y) """ return np.cos(x + y) @kernel("REAL") def cos_minus_k(x, y): """ K(x, y) = cos(x - y) """ return np.cos(x - y) @kernel("REALPLUS") def min_k(x, y): """ K(x, y) = min(x, y) """ return min(x, y) @kernel("REALPLUS") def max_k(x, y): """ K(x, y) = max(x, y) """ return max(x, y) @kernel("REALPLUS") def min_by_max_k(x, y): """ K(x, y) = min(x, y) / max(x, y) """ if abs(max(x, y)) < REAL_TOLERANCE: return 0.0 else: return min(x, y) / max(x, y) @kernel("INTEGER") def gcd_k(x, y): """ K(x, y) = gcd(x, y) """ return gcd(x, y) @kernel("INTEGER") def lcm_k(x, y): """ K(x, y) = lcm(x, y) """ return lcm(x, y) @kernel("INTEGER") def gcd_by_lcm_k(x, y): """ K(x, y) = gcd(x, y) / lcm(x, y) """ if abs(lcm(x, y)) < REAL_TOLERANCE: return 0.0 else: return gcd(x, y) / lcm(x, y) # %% Testing all the kernels def main(kernels_to_test=all_kernels, T=T, verb=False): """ Test a lot of different kernels. """ nb = len(all_kernels) print("\nStarting to test {} kernels...".format(nb)) # Store results results =  * nb not_kernels = [] # Start for i, K in enumerate(all_kernels): print("\n- Testing the kernel {} {}:".format(K.__name__, K.__doc__)) print(" It has domain = {}.".format(K.domain)) result = test_kernel(K, N=N, T=T, verb=verb) print(" Results of the N = {} tests:".format(N)) # bad_test = None # min_rate = None for test, rate in result.items(): print(" - Tested for {:^12} : success rate {:.2%}.".format(test, rate)) if rate < 1.0 - PROBA_TOLERANCE: print(" ===> I guess this kernel {} {} is NOT a positive definite kernel !".format(K.__name__, K.__doc__)) not_kernels.append({'K': K, 'test': test, 'rate': rate}) # bad_test, min_rate = test, rate print(" Finished for the kernel {}...".format(K.__name__)) results[i] = result print("\n\nDone testing some kernels...") return results, not_kernels if __name__ == '__main__': print("\nScript to test if functions are kernels, by Lilian Besson (C) 2016") print("MIT Licensed (http://lbesson.mit-license.org)") print("\n\nUsing N = {}, T = {}.".format(N, T)) print("For test domains:") print(" - IR ~= [{}, {}],".format(LOWER_BOUND, UPPER_BOUND)) print(" - IR+ ~= [ 0, {}],".format(UPPER_BOUND)) print(" - IN ~= [ 0, {}],".format(UPPER_BOUND_INT)) print(" - IN ~= [ 0, {}] (small to reduce overflow),".format(SMALL_UPPER_BOUND_INT)) print("Tolerance for small real values = {}.".format(REAL_TOLERANCE)) print("Tolerance for success rates = {}.".format(PROBA_TOLERANCE)) if WEIGHT_DISTRIBUTION == "UNIFORM_0_1": print("And sampling the a_i from uniform in [0, {}].".format(WEIGHT_STD)) if WEIGHT_DISTRIBUTION == "UNIFORM_-1_1": print("And sampling the a_i from uniform in [-{}, {}].".format(WEIGHT_STD/2, WEIGHT_STD/2)) if WEIGHT_DISTRIBUTION == "NORMAL": print("And sampling the a_i from a Gaussian of mean 0 and std {}.".format(WEIGHT_STD)) # all_kernels = [absk] # results, not_kernels = main(kernels_to_test=all_kernels, verb=True) results, not_kernels = main() for item in not_kernels: K = item['K'] test = item['test'] rate = item['rate'] print("\n- Apparently, the kernel {} {} is NOT a positive definite kernel : because {} was only observed with probability {:.2%}.".format(K.__name__, K.__doc__, test, rate)) print("\n- All the other kernel appeared to be positive definite. Now prove it.") print("\nFinished for this script 'test_kernel.py'.") # End of test_kernel.py ```