Commits

Zoltan Szabo committed 3b72fd2

Hellinger-, Bhattacharyya distance; volume of the unit ball: added. DL2_kNN_k_estimation.m: a '/'->'*' typo corrected.


Files changed (13)

+v0.15 (Oct 29, 2012):
+-The Hellinger and Bhattacharyya distances are now available in ITE. They can be estimated via k-nearest neighbor methods; see 'DHellinger_kNN_k_initialization.m', 'DHellinger_kNN_k_estimation.m', 'DBhattacharyya_kNN_k_initialization.m', and 'DBhattacharyya_kNN_k_estimation.m'.
+-volume_of_the_unit_ball.m: added. 
+-DL2_kNN_k_estimation.m: a '/'->'*' typo corrected.
+
 v0.14 (Oct 29, 2012):
 -Monte-Carlo simulation to compute the additive constants in Renyi entropy estimation: added; see 'estimate_HRenyi_constant.m'.
 -compute_length_HRenyi_MST.m: pdist -> sqdistance (acceleration).
 - multi-platform (tested extensively on Windows and Linux),
 - free and open source (released under the GNU GPLv3(>=) license).
 
-ITE can estimate Shannon-, Rényi-, Tsallis entropy; generalized variance, kernel canonical correlation analysis, kernel generalized variance, Hilbert-Schmidt independence criterion, Shannon-, L2-, Rényi-, Tsallis mutual information, copula-based kernel dependency, multivariate version of Hoeffding's Phi, Schweizer-Wolff's sigma and kappa; complex variants of entropy and mutual information; L2-, Rényi-, Tsallis divergence, maximum mean discrepancy, and J-distance.
+ITE can estimate Shannon-, Rényi-, Tsallis entropy; generalized variance, kernel canonical correlation analysis, kernel generalized variance, Hilbert-Schmidt independence criterion, Shannon-, L2-, Rényi-, Tsallis mutual information, copula-based kernel dependency, multivariate version of Hoeffding's Phi, Schweizer-Wolff's sigma and kappa; complex variants of entropy and mutual information; L2-, Rényi-, Tsallis divergence; Hellinger-, Bhattacharyya distance; maximum mean discrepancy, and J-distance.
 
 ITE offers solution methods for 
 

code/H_I_D/base_estimators/DBhattacharyya_kNN_k_estimation.m

+function [D] = DBhattacharyya_kNN_k_estimation(X,Y,co)
+%Estimates the Bhattacharyya distance of X and Y (X(:,t), Y(:,t) is the t^th sample)
+%using the kNN method (S={k}). The number of samples in X [=size(X,2)] and Y [=size(Y,2)] can be different. Cost parameters are provided in the cost object co.
+%
+%We make use of the naming convention 'D<name>_estimation', to ease embedding new divergence estimation methods.
+%
+%REFERENCE: 
+%	Barnabas Poczos and Liang Xiong and Dougal Sutherland and Jeff Schneider. Support Distribution Machines. Technical Report, 2012. "http://arxiv.org/abs/1202.0302"
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%co.mult:OK.
+
+if size(X,1)~=size(Y,1)
+    error('The dimension of X and Y must be equal.');
+end
+
+%D_ab (Bhattacharyya coefficient):
+	if co.p %[p(x)dx]
+		D_ab = estimate_Dab(X,Y,co);
+	else %[q(x)dx]
+		D_ab = estimate_Dab(Y,X,co);
+	end
+
+%D = -log(D_ab);%theoretically
+D = -log(abs(D_ab));%abs() to avoid possible 'log(negative)' values due to the finite number of samples
+
+
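The estimator targets D = -log(D_ab), where D_ab is the Bhattacharyya coefficient \int p^{1/2}(x)q^{1/2}(x)dx. For two univariate Gaussians the distance has a standard closed form, which is handy for sanity-checking kNN estimates. A Python sketch (the Gaussian formula is textbook material, not part of ITE):

```python
import math

def bhattacharyya_gauss(mu1, s1, mu2, s2):
    """Closed-form Bhattacharyya distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    v = s1**2 + s2**2
    return 0.25 * (mu1 - mu2)**2 / v + 0.5 * math.log(v / (2.0 * s1 * s2))

# D = -log(D_ab), so the coefficient is recovered as exp(-D):
D = bhattacharyya_gauss(0.0, 1.0, 1.0, 1.0)  # = 1/8 for unit variances, means 1 apart
BC = math.exp(-D)
```

Since D_ab lies in (0, 1] for distinct densities, D is nonnegative; the abs() in the code above only guards against finite-sample noise pushing the estimated coefficient negative.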

code/H_I_D/base_estimators/DBhattacharyya_kNN_k_initialization.m

+function [co] = DBhattacharyya_kNN_k_initialization(mult)
+%Initialization of the kNN (k-nearest neighbor, S={k}) based Bhattacharyya distance estimator.
+%
+%Note:
+%   1)The estimator is treated as a cost object (co).
+%   2)We make use of the naming convention 'D<name>_initialization', to ease embedding new divergence estimation methods.
+%
+%INPUT:
+%   mult: indicates whether the multiplicative constant is relevant (needed) in the estimation; '=1' means yes, '=0' means no.
+%OUTPUT:
+%   co: cost object (structure).
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%mandatory fields:
+    co.name = 'Bhattacharyya_kNN_k';
+    co.mult = mult;
+    
+%other fields:
+    %Possibilities for 'co.kNNmethod' (see 'kNN_squared_distances.m'): 
+        %I: 'knnFP1': fast pairwise distance computation and C++ partial sort; parameter: co.k.        
+        %II: 'knnFP2': fast pairwise distance computation; parameter: co.k.
+        %III: 'knnsearch' (Matlab Statistics Toolbox): parameters: co.k, co.NSmethod ('kdtree' or 'exhaustive').
+        %IV: 'ANN' (approximate nearest neighbor); parameters: co.k, co.epsi. 
+		%I:
+            co.kNNmethod = 'knnFP1';
+            co.k = 3;%k-nearest neighbors
+  	    %II:
+            %co.kNNmethod = 'knnFP2';
+            %co.k = 3;%k-nearest neighbors
+        %III:
+            %co.kNNmethod = 'knnsearch';
+            %co.k = 3;%k-nearest neighbors
+            %co.NSmethod = 'kdtree';
+        %IV:
+            %co.kNNmethod = 'ANN';
+            %co.k = 3;%k-nearest neighbors
+            %co.epsi = 0; %=0: exact kNN; >0: approximate kNN, the returned (not squared) distances may exceed the exact kNN distances by at most a factor of (1+epsi).
+	%Possibilities for rewriting the Bhattacharyya distance:
+		%I [\int p^{1/2}(x)q^{1/2}(x)dx = \int p^{-1/2}(x)q^{1/2}(x) p(x)dx]:
+			co.p = 1; %use p [p(x)dx]
+		%II [\int p^{1/2}(x)q^{1/2}(x)dx = \int q^{-1/2}(x)p^{1/2}(x) q(x)dx]:
+			%co.p = 0; %use q instead [q(x)dx]
+	%Fixed, do not change it:
+		co.a = -1/2; 
+		co.b = 1/2;
+				
+%initialize the ann wrapper in Octave, if needed:
+    initialize_Octave_ann_wrapper_if_needed(co.kNNmethod);
+    
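Rewritings I and II express the same integral as an expectation under p or under q: sampling from either density and averaging the corresponding square-root ratio converges to the same Bhattacharyya coefficient. A Monte Carlo sketch in Python (illustrative only; the `npdf` helper and the Gaussian choice are assumptions, not ITE code):

```python
import math
import random

def npdf(x, mu, s):
    """Density of N(mu, s^2) at x."""
    return math.exp(-0.5 * ((x - mu) / s)**2) / (s * math.sqrt(2.0 * math.pi))

random.seed(0)
n = 100000
p = lambda x: npdf(x, 0.0, 1.0)  # first density,  p(x)dx
q = lambda x: npdf(x, 1.0, 1.0)  # second density, q(x)dx

# I  (co.p = 1): sample from p, average p^{-1/2}(x) q^{1/2}(x)
bc_p = sum(math.sqrt(q(x) / p(x)) for x in (random.gauss(0.0, 1.0) for _ in range(n))) / n
# II (co.p = 0): sample from q, average q^{-1/2}(x) p^{1/2}(x)
bc_q = sum(math.sqrt(p(x) / q(x)) for x in (random.gauss(1.0, 1.0) for _ in range(n))) / n
# both approximate the true coefficient exp(-1/8) for these two Gaussians
```

The choice between co.p = 1 and co.p = 0 therefore only decides which sample plays the role of the "outer" expectation; the fixed exponents co.a = -1/2, co.b = 1/2 encode the square roots.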

code/H_I_D/base_estimators/DHellinger_kNN_k_estimation.m

+function [D] = DHellinger_kNN_k_estimation(X,Y,co)
+%Estimates the Hellinger distance of X and Y (X(:,t), Y(:,t) is the t^th sample)
+%using the kNN method (S={k}). The number of samples in X [=size(X,2)] and Y [=size(Y,2)] can be different. Cost parameters are provided in the cost object co.
+%
+%We make use of the naming convention 'D<name>_estimation', to ease embedding new divergence estimation methods.
+%
+%REFERENCE: 
+%	Barnabas Poczos and Liang Xiong and Dougal Sutherland and Jeff Schneider. Support Distribution Machines. Technical Report, 2012. "http://arxiv.org/abs/1202.0302"
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%co.mult:OK.
+
+if size(X,1)~=size(Y,1)
+    error('The dimension of X and Y must be equal.');
+end
+
+%D_ab (Bhattacharyya coefficient):
+	if co.p %[p(x)dx]
+		D_ab = estimate_Dab(X,Y,co);
+	else %[q(x)dx]
+		D_ab = estimate_Dab(Y,X,co);
+	end
+
+%D = sqrt(1-D_ab);%theoretically
+D = sqrt(abs(1-D_ab));%abs() to avoid possible 'sqrt(negative)' values due to the finite number of samples
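The Hellinger distance D = sqrt(1 - D_ab) is a metric bounded in [0, 1], so the abs() above only absorbs small negative values of 1 - D_ab caused by finite-sample noise. For univariate Gaussians the distance has a closed form, useful as a reference point (a Python sketch; the formula is standard, not ITE code):

```python
import math

def hellinger_gauss(mu1, s1, mu2, s2):
    """Closed-form Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    v = s1**2 + s2**2
    # Bhattacharyya coefficient of the two Gaussians:
    bc = math.sqrt(2.0 * s1 * s2 / v) * math.exp(-0.25 * (mu1 - mu2)**2 / v)
    return math.sqrt(1.0 - bc)
```

The distance is 0 for identical densities, symmetric in its arguments, and approaches 1 as the densities separate.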

code/H_I_D/base_estimators/DHellinger_kNN_k_initialization.m

+function [co] = DHellinger_kNN_k_initialization(mult)
+%Initialization of the kNN (k-nearest neighbor, S={k}) based Hellinger distance estimator.
+%
+%Note:
+%   1)The estimator is treated as a cost object (co).
+%   2)We make use of the naming convention 'D<name>_initialization', to ease embedding new divergence estimation methods.
+%
+%INPUT:
+%   mult: indicates whether the multiplicative constant is relevant (needed) in the estimation; '=1' means yes, '=0' means no.
+%OUTPUT:
+%   co: cost object (structure).
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%mandatory fields:
+    co.name = 'Hellinger_kNN_k';
+    co.mult = mult;
+    
+%other fields:
+    %Possibilities for 'co.kNNmethod' (see 'kNN_squared_distances.m'): 
+        %I: 'knnFP1': fast pairwise distance computation and C++ partial sort; parameter: co.k.        
+        %II: 'knnFP2': fast pairwise distance computation; parameter: co.k.
+        %III: 'knnsearch' (Matlab Statistics Toolbox): parameters: co.k, co.NSmethod ('kdtree' or 'exhaustive').
+        %IV: 'ANN' (approximate nearest neighbor); parameters: co.k, co.epsi. 
+		%I:
+            co.kNNmethod = 'knnFP1';
+            co.k = 3;%k-nearest neighbors
+  	    %II:
+            %co.kNNmethod = 'knnFP2';
+            %co.k = 3;%k-nearest neighbors
+        %III:
+            %co.kNNmethod = 'knnsearch';
+            %co.k = 3;%k-nearest neighbors
+            %co.NSmethod = 'kdtree';
+        %IV:
+            %co.kNNmethod = 'ANN';
+            %co.k = 3;%k-nearest neighbors
+            %co.epsi = 0; %=0: exact kNN; >0: approximate kNN, the returned (not squared) distances may exceed the exact kNN distances by at most a factor of (1+epsi).
+	%Possibilities for rewriting the Hellinger distance:
+		%I [\int p^{1/2}(x)q^{1/2}(x)dx = \int p^{-1/2}(x)q^{1/2}(x) p(x)dx]:
+			co.p = 1; %use p [p(x)dx]
+		%II [\int p^{1/2}(x)q^{1/2}(x)dx = \int q^{-1/2}(x)p^{1/2}(x) q(x)dx]:
+			%co.p = 0; %use q instead [q(x)dx]
+	%Fixed, do not change it:
+		co.a = -1/2; 
+		co.b = 1/2;
+				
+%initialize the ann wrapper in Octave, if needed:
+    initialize_Octave_ann_wrapper_if_needed(co.kNNmethod);
+    

code/H_I_D/base_estimators/DL2_kNN_k_estimation.m

     d = dX;
 end
 
+c = volume_of_the_unit_ball(d);%volume of the d-dimensional unit ball
+
 squared_distancesXX = kNN_squared_distances(X,X,co,1);
 squared_distancesYX = kNN_squared_distances(Y,X,co,0);
 dist_k_XX = sqrt(squared_distancesXX(end,:));
 dist_k_YX = sqrt(squared_distancesYX(end,:));
 
-c = pi^(d/2) * gamma(d/2+1);
-
-term1 = (co.k-1) / ((num_of_samplesX-1)*c) ./ dist_k_XX.^d;
-term2 = 2 * (co.k-1) / (num_of_samplesY*c) ./ dist_k_YX.^d;
-term3 = (num_of_samplesX - 1) * c  * (co.k-2) * (co.k-1) / (co.k * (num_of_samplesY*c)^2) * (dist_k_XX.^d ./ dist_k_YX.^(2*d));
-L2 = mean(term1-term2+term3);
+term1 = mean(dist_k_XX.^(-d)) * (co.k-1) / ((num_of_samplesX-1)*c);
+term2 = mean(dist_k_YX.^(-d)) * 2 * (co.k-1) / (num_of_samplesY*c);
+term3 = mean((dist_k_XX.^d) ./ (dist_k_YX.^(2*d))) *  (num_of_samplesX - 1) * (co.k-2) * (co.k-1) / (num_of_samplesY^2*c*co.k);
+L2 = term1-term2+term3;
 %D = sqrt(L2);%theoretically
 D = sqrt(abs(L2));%due to the finite number of samples L2 can be negative. In this case: sqrt(L2) is complex; to avoid such values we take abs().

code/H_I_D/base_estimators/HRenyi_weightedkNN_estimation.m

         load(FN,'wo');
     end
 
-cdunitball = (pi^(d/2))/gamma(d/2+1); %Volume of unit ball in d dimensions
+cdunitball = volume_of_the_unit_ball(d); %Volume of unit ball in d dimensions
 kvec = k1:k2; 
 N = floor(T/2);
 M = T - N;

code/H_I_D/base_estimators/HShannon_kNN_k_estimation.m

 %We make use of the naming convention 'H<name>_estimation', to ease embedding new entropy estimation methods.
 %
 %REFERENCE:
-%   M. N. Goria, Nikolai N. Leonenko, V. V. Mergel, and P. L. Novi Inverardi. A new class of random vector entropy estimators and its applications in testing statistical hypotheses. Journal of Nonparametric Statistics, 17:277-297, 2005. (S={k})
+%   M. N. Goria, Nikolai N. Leonenko, V. V. Mergel, and P. L. Novi Inverardi. A new class of random vector entropy estimators and its applications in testing statistical hypotheses. Journal of Nonparametric Statistics, 17:277-297, 2005. (S={k})
 %   Harshinder Singh, Neeraj Misra, Vladimir Hnizdo, Adam Fedorowicz and Eugene Demchuk. Nearest neighbor estimates of entropy. American Journal of Mathematical and Management Sciences, 23, 301-321, 2003. (S={k})
-%   L. F. Kozachenko and Nikolai N. Leonenko. A statistical estimate for the entropy of a random vector. Problems of Information Transmission, 23:9-16, 1987. (S={1})
+%   L. F. Kozachenko and Nikolai N. Leonenko. A statistical estimate for the entropy of a random vector. Problems of Information Transmission, 23:9-16, 1987. (S={1})
 %
 %Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
 %
 %co.mult:OK.
 
 %H estimation:
-    V = pi^(d/2) / gamma(d/2+1); %= 2 * pi^(d/2) / ( d*gamma(d/2) );
+    V = volume_of_the_unit_ball(d); 
     H = log(num_of_samples-1) - psi(co.k) + log(V) + d / num_of_samples * sum(log(sqrt(squared_distances(co.k,:)))); %sqrt <= squared_distances,
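This estimation line is the Kozachenko-Leonenko style formula H = log(N-1) - psi(k) + log(V_d) + (d/N) * sum_t log(eps_k(t)), with eps_k(t) the distance of sample t to its k-th nearest neighbour. A self-contained Python translation (1-D data, brute-force kNN; an illustrative sketch, not ITE's implementation) can be checked against the known Gaussian entropy 0.5*log(2*pi*e):

```python
import heapq
import math
import random

def knn_shannon_entropy(X, k=3):
    """kNN (S={k}) Shannon entropy estimate for a list of 1-D samples:
    H = log(N-1) - psi(k) + log(V_d) + (d/N) * sum_t log(eps_k(t))."""
    n, d = len(X), 1
    V = math.pi**(d / 2) / math.gamma(d / 2 + 1)  # volume of the d-dimensional unit ball
    # digamma at an integer: psi(k) = -EulerGamma + sum_{i=1}^{k-1} 1/i
    psi_k = -0.5772156649015329 + sum(1.0 / i for i in range(1, k))
    log_sum = 0.0
    for i, x in enumerate(X):
        # distance to the k-th nearest neighbour, self excluded:
        eps_k = heapq.nsmallest(k, (abs(x - y) for j, y in enumerate(X) if j != i))[-1]
        log_sum += math.log(eps_k)
    return math.log(n - 1) - psi_k + math.log(V) + d * log_sum / n

random.seed(1)
H_hat = knn_shannon_entropy([random.gauss(0.0, 1.0) for _ in range(2000)])
# true Shannon entropy of N(0,1): 0.5*log(2*pi*e), about 1.4189
```

The MATLAB code takes sqrt of the stored squared distances for the same reason the sketch works with plain distances: the formula needs log of the true (not squared) kNN radii.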
  

code/H_I_D/utilities/estimate_Dab.m

+function [D_ab] = estimate_Dab(X,Y,co)
+%Estimates D_ab = \int p^a(x)q^b(x)dx; the Hellinger distance and the Bhattacharyya distance are simple functions of this quantity.
+%
+%INPUT:
+%   X: X(:,t) is the t^th sample from the first distribution.
+%   Y: Y(:,t) is the t^th sample from the second distribution.
+%  co: cost object (structure).
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%initialization:
+    [d,num_of_samplesY] = size(Y);
+    [d,num_of_samplesX] = size(X);
+    a = co.a;
+    b = co.b;
+    k = co.k;
+
+squared_distancesXX = kNN_squared_distances(X,X,co,1);
+squared_distancesYX = kNN_squared_distances(Y,X,co,0);
+dist_k_XX = sqrt(squared_distancesXX(end,:));
+dist_k_YX = sqrt(squared_distancesYX(end,:));
+
+c = volume_of_the_unit_ball(d);
+B = c^(-(a+b)) * gamma(k)^2 / (gamma(k-a)*gamma(k-b));
+D_ab = (num_of_samplesX-1)^(-a) * num_of_samplesY^(-b) * B * mean(dist_k_XX.^(-d*a).*dist_k_YX.^(-d*b));
+
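estimate_Dab reduces to two sets of kNN distances: within X, and from each X point to Y. A brute-force Python transcription of the same formula (helper names and the tuple-based sample convention are assumptions made for the sketch, not ITE's column-wise API):

```python
import heapq
import math
import random

def knn_dist(x, pts, k):
    """Distance from x to its k-th nearest neighbour among pts."""
    return heapq.nsmallest(k, (math.dist(x, p) for p in pts))[-1]

def estimate_Dab(X, Y, a=-0.5, b=0.5, k=3):
    """kNN estimate of D_ab = integral of p^a(x) q^b(x) dx for samples X ~ p, Y ~ q
    (lists of d-dimensional tuples); a=-1/2, b=1/2 gives the Bhattacharyya coefficient."""
    d, nx, ny = len(X[0]), len(X), len(Y)
    c = math.pi**(d / 2) / math.gamma(d / 2 + 1)  # volume of the d-dimensional unit ball
    B = c**(-(a + b)) * math.gamma(k)**2 / (math.gamma(k - a) * math.gamma(k - b))
    s = 0.0
    for i, x in enumerate(X):
        rho = knn_dist(x, X[:i] + X[i + 1:], k)  # k-NN distance within X, self excluded
        nu = knn_dist(x, Y, k)                   # k-NN distance from x to Y
        s += rho**(-d * a) * nu**(-d * b)
    return (nx - 1)**(-a) * ny**(-b) * B * s / nx

random.seed(2)
P = [(random.gauss(0.0, 1.0),) for _ in range(1000)]
Q = [(random.gauss(0.0, 1.0),) for _ in range(1000)]
R = [(random.gauss(3.0, 1.0),) for _ in range(1000)]
bc_same = estimate_Dab(P, Q)  # identical distributions: the true coefficient is 1
bc_far = estimate_Dab(P, R)   # well-separated Gaussians: true coefficient exp(-9/8)
```

The correction factor B with the two gamma terms is what removes the asymptotic bias of the raw kNN plug-in; for a + b = 0 the unit-ball volume c cancels out of B but still enters implicitly through the densities.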

code/H_I_D/utilities/estimate_Ialpha.m

 [d,num_of_samples] = size(Y);
 squared_distances = kNN_squared_distances(Y,Y,co,1);
 
-V = pi^(d/2)/gamma(d/2+1);%volume of the d-dimensional unit ball
+V = volume_of_the_unit_ball(d);
 C = ( gamma(co.k)/gamma(co.k+1-co.alpha) )^(1/(1-co.alpha));
 s = sum( squared_distances(co.k,:).^(d*(1-co.alpha)/2) ); %'/2' <= squared distances
 I_alpha = (num_of_samples-1) / num_of_samples * V^(1-co.alpha) * C^(1-co.alpha) * s / (num_of_samples-1)^(co.alpha);

code/H_I_D/utilities/volume_of_the_unit_ball.m

+function [V] = volume_of_the_unit_ball(d)
+%Returns the volume (V) of the d-dimensional unit ball.
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+V = pi^(d/2) / gamma(d/2+1); %= 2 * pi^(d/2) / ( d*gamma(d/2) );
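This helper centralizes the formula V_d = pi^(d/2) / Gamma(d/2 + 1), which was previously inlined at each call site (and, in DL2_kNN_k_estimation.m, mistyped with '*' in place of '/'). A quick Python check against the known low-dimensional volumes:

```python
import math

def volume_of_the_unit_ball(d):
    """V_d = pi^(d/2) / gamma(d/2 + 1); the '/'->'*' typo fixed in this
    commit had multiplied by gamma(d/2 + 1) instead of dividing by it."""
    return math.pi**(d / 2) / math.gamma(d / 2 + 1)

# known values: V_1 = 2 (interval), V_2 = pi (disc), V_3 = 4*pi/3 (ball)
```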

doc/ITE_documentation.pdf

Binary file modified.
