Commits

Zoltan Szabo committed ec09b46

MMD estimation based on U- and V-statistics: added; see 'DMMD_Ustat_initialization.m', 'DMMD_Ustat_estimation.m', 'DMMD_Vstat_initialization.m', 'DMMD_Vstat_estimation.m'. Notes on HSIC, MMD and measures of concordance: added (doc).

Comments (0)

Files changed (8)

-'MMDonline' renamed to 'MMD_online'; see 'DMMD_online_initialization.m', 'DMMD_online_estimation.m'; 'IMMD_DMMD_initialization.m': modified accordingly.
+v0.24 (Dec 12, 2012):
+-MMD estimation based on U- and V-statistics: added; see 'DMMD_Ustat_initialization.m', 'DMMD_Ustat_estimation.m', 'DMMD_Vstat_initialization.m', 'DMMD_Vstat_estimation.m'.
+-Notes on HSIC, MMD and measures of concordance: added (doc).
+-Online MMD estimation: now RBF and linear kernels are both available; see 'DMMD_online_initialization.m', 'DMMD_online_estimation.m'.
+-'MMDonline' renamed to 'MMD_online'; see 'DMMD_online_initialization.m', 'DMMD_online_estimation.m'; 'IMMD_DMMD_initialization.m': modified accordingly.
 
 v0.23 (Dec 07, 2012):
 -Three multivariate extensions of Spearman's rho: added; see 'ASpearman1_initialization.m', 'ASpearman1_estimation.m', 'ASpearman2_initialization.m', 'ASpearman2_estimation.m', 'ASpearman3_initialization.m', 'ASpearman3_estimation.m'.
 
 - `entropy (H)`: Shannon entropy, Rényi entropy, Tsallis entropy (Havrda and Charvát entropy), complex entropy,
 - `mutual information (I)`: generalized variance, kernel canonical correlation analysis, kernel generalized variance, Hilbert-Schmidt independence criterion, Shannon mutual information, L2 mutual information, Rényi mutual information, Tsallis mutual information, copula-based kernel dependency, multivariate version of Hoeffding's Phi, Schweizer-Wolff's sigma and kappa, complex mutual information, Cauchy-Schwartz quadratic mutual information, Euclidean distance based quadratic mutual information,
-- `divergence (D)`: Kullback-Leibler divergence (relative entropy), L2 divergence, Rényi divergence, Tsallis divergence, Hellinger distance, Bhattacharyya distance, maximum mean discrepancy (kernel distance), J-distance (symmetrised Kullback-Leibler divergence), Cauchy-Schwartz divergence, Euclidean distance based divergence,
-- `association measures (A)`: multivariate extensions of Spearman's rho (Spearman's rank correlation coefficient),
+- `divergence (D)`: Kullback-Leibler divergence (relative entropy), L2 divergence, Rényi divergence, Tsallis divergence, Hellinger distance, Bhattacharyya distance, maximum mean discrepancy (kernel distance, an integral probability metric), J-distance (symmetrised Kullback-Leibler divergence), Cauchy-Schwartz divergence, Euclidean distance based divergence,
+- `association measures (A)`, including `measures of concordance`: multivariate extensions of Spearman's rho (Spearman's rank correlation coefficient, grade correlation coefficient),
 - `cross quantities (C)`: cross-entropy.
 
 ITE offers solution methods for 
 
 **Download** the latest release: 
 
-- code: [zip](https://bitbucket.org/szzoli/ite/downloads/ITE-0.23_code.zip), [tar.bz2](https://bitbucket.org/szzoli/ite/downloads/ITE-0.23_code.tar.bz2), 
-- [documentation (pdf)](https://bitbucket.org/szzoli/ite/downloads/ITE-0.23_documentation.pdf).
+- code: [zip](https://bitbucket.org/szzoli/ite/downloads/ITE-0.24_code.zip), [tar.bz2](https://bitbucket.org/szzoli/ite/downloads/ITE-0.24_code.tar.bz2), 
+- [documentation (pdf)](https://bitbucket.org/szzoli/ite/downloads/ITE-0.24_documentation.pdf).
 
 

code/H_I_D_A_C/base_estimators/DMMD_Ustat_estimation.m

+function [D] = DMMD_Ustat_estimation(Y1,Y2,co)
+%Estimates divergence (D) of Y1 and Y2 using the MMD (maximum mean discrepancy) method, applying U-statistics.
+%
+%We use the naming convention 'D<name>_estimation' to ease embedding new divergence estimation methods.
+%
+%INPUT:
+%  Y1: Y1(:,t) is the t^th sample from the first distribution.
+%  Y2: Y2(:,t) is the t^th sample from the second distribution. Note: the number of samples in Y1 [=size(Y1,2)] and Y2 [=size(Y2,2)] can be different.
+%  co: divergence estimator object.
+%
+%REFERENCE: 
+%   Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf and Alexander Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research 13 (2012) 723-773.
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%co.mult:OK.
+
+[dY1,num_of_samplesY1] = size(Y1);
+[dY2,num_of_samplesY2] = size(Y2);
+
+%verification:
+    if dY1~=dY2
+        error('The dimension of the samples in Y1 and Y2 must be equal.');
+    end
+    
+%kY1Y1,kY2Y2,kY1Y2:    
+    switch co.kernel
+        case 'RBF' 
+            %pairwise distances:
+                kY1Y1 = sqdistance(Y1);
+                kY2Y2 = sqdistance(Y2);
+                kY1Y2 = sqdistance(Y1,Y2);
+            %distance(i,j) ->  kernel(i,j):
+                kY1Y1 = exp(-kY1Y1/(2*co.sigma^2));
+                kY2Y2 = exp(-kY2Y2/(2*co.sigma^2));
+                kY1Y2 = exp(-kY1Y2/(2*co.sigma^2));
+        case 'linear'
+            kY1Y1 = Y1.' * Y1;
+            kY2Y2 = Y2.' * Y2;
+            kY1Y2 = Y1.' * Y2;
+        otherwise
+            error('Kernel=?');
+    end
+
+%U-statistic: exclude the i=j (diagonal) kernel terms from the within-sample sums:
+    kY1Y1(1:num_of_samplesY1+1:end) = 0;
+    kY2Y2(1:num_of_samplesY2+1:end) = 0;
+
+term1 = sum(sum(kY1Y1)) / (num_of_samplesY1*(num_of_samplesY1-1));
+term2 = sum(sum(kY2Y2)) / (num_of_samplesY2*(num_of_samplesY2-1));
+term3 = -2 * sum(sum(kY1Y2)) / (num_of_samplesY1*num_of_samplesY2);
+
+D = sqrt(abs(term1+term2+term3)); %abs(): the unbiased (U-statistic) estimate of MMD^2 can be negative
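
For orientation, a minimal usage sketch of the new estimator; the data and the bandwidth override below are illustrative, not part of the commit:

%Illustrative usage of the U-statistic based MMD estimator:
Y1 = randn(3,2000);                %2000 samples from N(0,I), dimension 3
Y2 = randn(3,3000) + 2;            %3000 samples from N(2,I); sample sizes may differ
co = DMMD_Ustat_initialization(1); %mult=1; RBF kernel with co.sigma = 0.01 by default
co.sigma = 1;                      %override: the default bandwidth is very small for unit-variance data
D = DMMD_Ustat_estimation(Y1,Y2,co)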

code/H_I_D_A_C/base_estimators/DMMD_Ustat_initialization.m

+function [co] = DMMD_Ustat_initialization(mult)
+%Initialization of the MMD (maximum mean discrepancy) divergence estimator, applying U-statistics.
+%
+%Note:
+%   1)The estimator is treated as a cost object (co).
+%   2)We use the naming convention 'D<name>_initialization' to ease embedding new divergence estimation methods.
+%
+%INPUT:
+%   mult: '=1' if multiplicative constants are relevant (needed) in the estimation, '=0' otherwise.
+%OUTPUT:
+%   co: cost object (structure).
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%mandatory fields:
+    co.name = 'MMD_Ustat';
+    co.mult = mult;
+    
+%other fields:   
+    %Kernel choice; possibilities: 'linear', 'RBF' (Gaussian):
+        %I (RBF):
+            co.kernel = 'RBF';
+            co.sigma = 0.01; %std in the RBF kernel
+        %II (linear):
+            %co.kernel = 'linear';

code/H_I_D_A_C/base_estimators/DMMD_Vstat_estimation.m

+function [D] = DMMD_Vstat_estimation(Y1,Y2,co)
+%Estimates divergence (D) of Y1 and Y2 using the MMD (maximum mean discrepancy) method, applying V-statistics. 
+%
+%We use the naming convention 'D<name>_estimation' to ease embedding new divergence estimation methods.
+%
+%INPUT:
+%  Y1: Y1(:,t) is the t^th sample from the first distribution.
+%  Y2: Y2(:,t) is the t^th sample from the second distribution. Note: the number of samples in Y1 [=size(Y1,2)] and Y2 [=size(Y2,2)] can be different.
+%  co: divergence estimator object.
+%
+%REFERENCE: 
+%   Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf and Alexander Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research 13 (2012) 723-773.
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%co.mult:OK.
+
+[dY1,num_of_samplesY1] = size(Y1);
+[dY2,num_of_samplesY2] = size(Y2);
+
+%verification:
+    if dY1~=dY2
+        error('The dimension of the samples in Y1 and Y2 must be equal.');
+    end
+    
+%kY1Y1,kY2Y2,kY1Y2:    
+    switch co.kernel
+        case 'RBF'   
+            %pairwise distances:
+                kY1Y1 = sqdistance(Y1);
+                kY2Y2 = sqdistance(Y2);
+                kY1Y2 = sqdistance(Y1,Y2);
+            %distance(i,j) ->  kernel(i,j):
+                kY1Y1 = exp(-kY1Y1/(2*co.sigma^2));
+                kY2Y2 = exp(-kY2Y2/(2*co.sigma^2));
+                kY1Y2 = exp(-kY1Y2/(2*co.sigma^2));
+        case 'linear'
+            kY1Y1 = Y1.' * Y1;
+            kY2Y2 = Y2.' * Y2;
+            kY1Y2 = Y1.' * Y2;
+        otherwise
+            error('Kernel=?');
+    end
+        
+%V-statistic: keep all kernel terms (including i=j) and normalize by the squared sample sizes:
+term1 = sum(sum(kY1Y1)) / num_of_samplesY1^2;
+term2 = sum(sum(kY2Y2)) / num_of_samplesY2^2;
+term3 = -2 * sum(sum(kY1Y2)) / (num_of_samplesY1*num_of_samplesY2);
+
+D = sqrt(abs(term1+term2+term3)); %abs(): numerical safeguard against 'sqrt(negative)' values
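
To see the difference between the two statistics, a small comparison sketch (illustrative data; both estimators report sqrt(abs(.)) of the squared-MMD estimate): on two samples from the same distribution the U-statistic based squared-MMD estimate fluctuates around zero, while the V-statistic one keeps the i=j kernel terms and is biased upwards.

%Illustrative: U- vs. V-statistic based MMD on samples from the same distribution:
Y1 = randn(2,1000);
Y2 = randn(2,1000);
coU = DMMD_Ustat_initialization(1); coU.sigma = 1;
coV = DMMD_Vstat_initialization(1); coV.sigma = 1;
[DMMD_Ustat_estimation(Y1,Y2,coU), DMMD_Vstat_estimation(Y1,Y2,coV)] %U-based value: ~0; V-based: larger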

code/H_I_D_A_C/base_estimators/DMMD_Vstat_initialization.m

+function [co] = DMMD_Vstat_initialization(mult)
+%Initialization of the MMD (maximum mean discrepancy) divergence estimator, applying V-statistics.
+%
+%Note:
+%   1)The estimator is treated as a cost object (co).
+%   2)We use the naming convention 'D<name>_initialization' to ease embedding new divergence estimation methods.
+%
+%INPUT:
+%   mult: '=1' if multiplicative constants are relevant (needed) in the estimation, '=0' otherwise.
+%OUTPUT:
+%   co: cost object (structure).
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%mandatory fields:
+    co.name = 'MMD_Vstat';
+    co.mult = mult;
+    
+%other fields:
+    %Kernel choice; possibilities: 'linear', 'RBF' (Gaussian):
+        %I (RBF):
+            co.kernel = 'RBF';
+            co.sigma = 0.01; %std in the RBF kernel
+        %II (linear):
+            %co.kernel = 'linear';
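
The fixed default co.sigma = 0.01 rarely matches a given dataset; a common choice outside the toolbox is the median heuristic, sketched below (illustrative, not an ITE default; it reuses the sqdistance helper called by the estimators above):

%Illustrative: median-heuristic bandwidth on the pooled sample:
co = DMMD_Vstat_initialization(1);
Y = [Y1,Y2];                       %pooled sample, one column per observation
d2 = sqdistance(Y);                %pairwise squared distances
co.sigma = sqrt(median(d2(d2>0))); %median pairwise distance as RBF bandwidth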

code/H_I_D_A_C/base_estimators/DMMD_online_estimation.m

     Y1j = Y1(:,even_indices);
     Y2i = Y2(:,odd_indices);
     Y2j = Y2(:,even_indices);
-
-D = (K(Y1i,Y1j,co) + K(Y2i,Y2j,co) - K(Y1i,Y2j,co) - K(Y1j,Y2i,co)) / (num_of_samples/2);
+    
+switch co.kernel
+    case 'RBF'
+        D = (K_RBF(Y1i,Y1j,co) + K_RBF(Y2i,Y2j,co) - K_RBF(Y1i,Y2j,co) - K_RBF(Y1j,Y2i,co)) / (num_of_samples/2);
+    case 'linear'
+        D = (K_linear(Y1i,Y1j) + K_linear(Y2i,Y2j) - K_linear(Y1i,Y2j) - K_linear(Y1j,Y2i)) / (num_of_samples/2);
+    otherwise
+        error('Kernel=?');
+end
 
 %-----------------------------
-function [s] = K(U,V,co)
+function [s] = K_RBF(U,V,co)
 %Computes \sum_i kernel(U(:,i),V(:,i)), RBF (Gaussian) kernel is used with std=co.sigma
 
 s = sum( exp(-sum((U-V).^2,1)/(2*co.sigma^2)) );
+
+%--------
+function [s] = K_linear(U,V)
+%Computes \sum_i kernel(U(:,i),V(:,i)) in case of a linear kernel
+
+s = sum(dot(U,V));
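
For context, this is the linear-time (online) MMD estimator of Gretton et al.: it pairs consecutive samples via the odd/even indices above, so each kernel evaluation is used once and no Gram matrix is stored. A minimal usage sketch with the newly added linear kernel; the equal, even sample sizes assumed here match the odd/even pairing in the snippet, and the data are illustrative:

%Illustrative: online (linear-time) MMD with the new linear kernel option:
Y1 = randn(3,2000);                  %sample sizes assumed equal and even
Y2 = randn(3,2000) + 1;
co = DMMD_online_initialization(1);
co.kernel = 'linear';
D = DMMD_online_estimation(Y1,Y2,co) %estimates MMD^2 (no square root is taken here)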

code/H_I_D_A_C/base_estimators/DMMD_online_initialization.m

     co.name = 'MMD_online';
     co.mult = mult;
     
-%other fields:    
-    co.sigma = 0.01;%std in the RBF (Gaussian) kernel
+%other fields: 
+    %Kernel choice; possibilities: 'linear', 'RBF' (Gaussian):
+        %I (RBF):
+            co.kernel = 'RBF';
+            co.sigma = 0.01; %std in the RBF kernel
+        %II (linear):
+            %co.kernel = 'linear';
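
A quick sanity check of the linear kernel option: with k(x,y) = x'y, the population MMD^2 equals the squared distance of the two means, so for large samples the online estimate should approach it (illustrative sketch):

%Illustrative: with the linear kernel, MMD^2 ~ ||mean(Y1) - mean(Y2)||^2:
Y1 = randn(3,5000);
Y2 = randn(3,5000) + 1;
co = DMMD_online_initialization(1);
co.kernel = 'linear';
D = DMMD_online_estimation(Y1,Y2,co);    %linear-time estimate of MMD^2
norm(mean(Y1,2) - mean(Y2,2))^2          %should be close to D (here: ~3)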