Zoltan Szabo avatar Zoltan Szabo committed 1fb133f

Monte-Carlo simulation to compute the additive constants in Renyi entropy estimation: added; see 'estimate_HRenyi_constant.m'. compute_length_HRenyi_MST.m: pdist -> sqdistance (acceleration).

Comments (0)

Files changed (15)

--kNN_squared_distances.m, co.kNNmethod='knnFP2': for compatibility reasons variable 'indices' has been converted to int32
--accents: deleted from comments.
+v0.14 (Oct 29, 2012):
+-Monte-Carlo simulation to compute the additive constants in Renyi entropy estimation: added; see 'estimate_HRenyi_constant.m'.
+-compute_length_HRenyi_MST.m: pdist -> sqdistance (acceleration).
+-The embedded 'knnFP1' method can produce an '1e-15' rounding error in squared_distances => W is not _perfectly_ sym. (in the verification of compute_MST.m:kruskal_mst.m), correction made; see 'compute_length_HRenyi_GSF.m'.
+-kNN_squared_distances.m, co.kNNmethod='knnFP2': for compatibility reasons variable 'indices' has been converted to int32.
 
 v0.13 (Oct 27, 2012):
 -Tsallis entropy is now available in ITE; it can be estimated via k-nearest neighbors, see 'HTsallis_kNN_k_initialization.m', 'HTsallis_kNN_k_estimation.m'.

code/H_I_D/base_estimators/HRenyi_GSF_estimation.m

 %
 %You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
 
-[d,num_of_samples] = size(Y);
+%co.mult:OK.
 
-%compute kNN graph (S={1,...,k}):
-    [squared_distances,I] = kNN_squared_distances(Y,Y,co,1);%I:int32
-    
-%kNN relations -> weighted kNN graph (W):
-    J = repmat(int32([1:num_of_samples]),co.k,1);%double->int32
-    D = squared_distances(:).^(d*(1-co.alpha));
-    W = spalloc(num_of_samples,num_of_samples,2*num_of_samples*co.k);
-    W(I+(J-1)*num_of_samples) = D;
-    W(J+(I-1)*num_of_samples) = D;
+[d,num_of_samples] = size(Y);%dimension, number of samples
 
-%W->L (using MatlabBGL); minimal spanning forest, and its weight (L): 
-    L = compute_MST(W,co.GSFmethod);
+%length (L):
+    L = compute_length_HRenyi_GSF(Y,co);
 
 %estimation:
-    H = log(L/num_of_samples.^co.alpha) / (1-co.alpha);
+    if co.additive_constant_is_relevant %the additive constant is relevant in the Renyi entropy estimation
+        FN = filename_of_HRenyi_constant(d,co);
+        if exist(FN)
+            load(FN,'constant');
+            H = log( L / (constant*num_of_samples^co.alpha) ) / (1-co.alpha);
+        else
+            disp('Error: the file containing the additive constant does not exist. You can precompute the additive constant via estimate_HRenyi_constant.m.');
+        end
+    else %estimation up to an additive constant
+        H = log( L / num_of_samples^co.alpha ) / (1-co.alpha);
+    end

code/H_I_D/base_estimators/HRenyi_GSF_initialization.m

         co.GSFmethod = 'MatlabBGL_Kruskal';
 
     co.alpha = 0.99; %The Renyi entropy equals to the Shannon differential entropy, in limit, i.e., Renyi=H_{R,alpha} -> Shannon=H, provided that alpha ->1.
-
+    co.additive_constant_is_relevant = 0; %1:additive constant is relevant (you can precompute it via 'estimate_HRenyi_constant.m'), 0:not relevant
+    
 %initialize the ann wrapper in Octave, if needed:
     initialize_Octave_ann_wrapper_if_needed(co.kNNmethod);
     

code/H_I_D/base_estimators/HRenyi_MST_estimation.m

 %
 %You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
 
-[d,num_of_samples] = size(Y);
-gam = d * (1-co.alpha);
-W = squareform(pdist(Y.')).^gam; %[ (||Y(:,i)-Y(:,j)||_2).^gam ]
-
-%W->L:
-    L = compute_MST(W,co.MSTmethod);
-    
 %co.mult:OK.
 
-%estimation up to an additive constant:
-    H = log(L/num_of_samples^(co.alpha)) / (1-co.alpha);
+[d,num_of_samples] = size(Y);%dimension, number of samples
 
+%length (L):
+    L = compute_length_HRenyi_MST(Y,co);
+
+%estimation:
+    if co.additive_constant_is_relevant %the additive constant is relevant in the Renyi entropy estimation
+        FN = filename_of_HRenyi_constant(d,co);
+        if exist(FN)
+            load(FN,'constant');
+            H = log( L / (constant*num_of_samples^co.alpha) ) / (1-co.alpha);
+        else
+            disp('Error: the file containing the additive constant does not exist. You can precompute the additive constant via estimate_HRenyi_constant.m.');
+        end
+    else %estimation up to an additive constant
+        H = log( L / num_of_samples^co.alpha ) / (1-co.alpha);
+    end

code/H_I_D/base_estimators/HRenyi_MST_initialization.m

         %co.MSTmethod = 'MatlabBGL_Kruskal';
         %co.MSTmethod = 'pmtk3_Prim';
         %co.MSTmethod = 'pmtk3_Kruskal';    
+    co.additive_constant_is_relevant = 0; %1:additive constant is relevant (you can precompute it via 'estimate_HRenyi_constant.m'), 0:not relevant

code/H_I_D/base_estimators/HRenyi_kNN_1tok_estimation.m

 function [H] = HRenyi_kNN_1tok_estimation(Y,co)
-%Estimates the Renyi entropy (H) of Y (Y(:,t) is the t^th sample)
-%using the kNN method (S={1,...,k}). Cost parameters are provided in the cost object co.
+%Estimates the Renyi entropy (H) of Y (Y(:,t) is the t^th sample) using the kNN method (S={1,...,k}). Cost parameters are provided in the cost object co.
 %
 %We make use of the naming convention 'H<name>_estimation', to ease embedding new entropy estimation methods.
 %
 %
 %You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
 
-[d,num_of_samples] = size(Y);
-squared_distances = kNN_squared_distances(Y,Y,co,1);
-gam = d * (1-co.alpha);
-
 %co.mult:OK.
 
-%estimation up to an additive constant:
-    L = sum(sum(squared_distances.^(gam/2),1));
-    H = log( L / num_of_samples^co.alpha ) / (1-co.alpha);
+[d,num_of_samples] = size(Y);%dimension, number of samples
 
+%length (L):
+    L = compute_length_HRenyi_kNN_1tok(Y,co);
 
+%estimation:
+    if co.additive_constant_is_relevant %the additive constant is relevant in the Rényi entropy estimation
+        FN = filename_of_HRenyi_constant(d,co);
+        if exist(FN)
+            load(FN,'constant');
+            H = log( L / (constant*num_of_samples^co.alpha) ) / (1-co.alpha);
+        else
+            disp('Error: the file containing the additive constant does not exist. You can precompute the additive constant via estimate_HRenyi_constant.m.');
+        end
+    else %estimation up to an additive constant
+        H = log( L / num_of_samples^co.alpha ) / (1-co.alpha);
+    end
+  
+    
 
 
 
+
+
+

code/H_I_D/base_estimators/HRenyi_kNN_1tok_initialization.m

             %co.epsi = 0; %=0: exact kNN; >0: approximate kNN, the true (not squared) distances can not exceed the real distance more than a factor of (1+epsi).
 
     co.alpha = 0.99; %The Renyi entropy equals to the Shannon differential entropy, in limit, i.e., Renyi=H_{R,alpha} -> Shannon=H, provided that alpha ->1.
-
+    co.additive_constant_is_relevant = 0; %1:additive constant is relevant (you can precompute it via 'estimate_HRenyi_constant.m'), 0:not relevant
+    
 %initialize the ann wrapper in Octave, if needed:
     initialize_Octave_ann_wrapper_if_needed(co.kNNmethod);
     

code/H_I_D/base_estimators/HRenyi_kNN_S_estimation.m

 %
 %You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
 
-[d,num_of_samples] = size(Y);
-squared_distances = kNN_squared_distances(Y,Y,co,1);
-p = d * (1-co.alpha);
-
 %co.mult:OK.
 
-%estimation up to an additive constant:
-    L = sum(sum(squared_distances(co.k,:).^(p/2),1)); %'p/2' <= squared; S=co.k
-    H = log( L / num_of_samples^co.alpha) / (1-co.alpha); 
+[d,num_of_samples] = size(Y);%dimension, number of samples
+
+%length (L):
+    L = compute_length_HRenyi_kNN_S(Y,co);
+
+%estimation:
+    if co.additive_constant_is_relevant %the additive constant is relevant in the Rényi entropy estimation
+        FN = filename_of_HRenyi_constant(d,co);
+        if exist(FN)
+            load(FN,'constant');
+            H = log( L / (constant*num_of_samples^co.alpha) ) / (1-co.alpha);
+        else
+            disp('Error: the file containing the additive constant does not exist. You can precompute the additive constant via estimate_HRenyi_constant.m.');
+        end
+    else %estimation up to an additive constant
+        H = log( L / num_of_samples^co.alpha ) / (1-co.alpha);
+    end
+  

code/H_I_D/base_estimators/HRenyi_kNN_S_initialization.m

             %co.epsi = 0; %=0: exact kNN; >0: approximate kNN, the true (not squared) distances can not exceed the real distance more than a factor of (1+epsi).
 				
     co.alpha = 0.99; %The Renyi entropy equals to the Shannon differential entropy, in limit, i.e., Renyi=H_{R,alpha} -> Shannon=H, provided that alpha ->1.
+    co.additive_constant_is_relevant = 0; %1:additive constant is relevant (you can precompute it via 'estimate_HRenyi_constant.m'), 0:not relevant    
     
 %initialize the ann wrapper in Octave, if needed:
     initialize_Octave_ann_wrapper_if_needed(co.kNNmethod);

code/H_I_D/utilities/compute_length_HRenyi_GSF.m

+function [L] = compute_length_HRenyi_GSF(Y,co)
+%Computes the length (L) associated to the 'Renyi_GSF' Renyi entropy estimator.
+%
+%INPUT:
+%    Y: Y(:,t) is the t^th sample.
+%   co: cost object.
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+[d,num_of_samples] = size(Y);
+
+%compute kNN graph (S={1,...,k}):
+    [squared_distances,I] = kNN_squared_distances(Y,Y,co,1);%I:int32
+    
+%kNN relations -> weighted kNN graph (W):
+    J = repmat(int32([1:num_of_samples]),co.k,1);%double->int32
+    D = squared_distances(:).^(d*(1-co.alpha));
+    W = spalloc(num_of_samples,num_of_samples,2*num_of_samples*co.k);
+    W(I+(J-1)*num_of_samples) = D;
+    W(J+(I-1)*num_of_samples) = D;
+    %The result obtained by co.kNNmethod = 'knnFP1' (squared distances) may contain an '1e-15' rounding error which
+    %can cause W to be not _perfectly_ sym. This '1e-15' difference must be/is corrected below:
+        if strcmp(co.kNNmethod,'knnFP1')
+            W = W + W.';
+        end
+
+%W->L (using MatlabBGL); minimal spanning forest, and its weight (L): 
+    L = compute_MST(W,co.GSFmethod);

code/H_I_D/utilities/compute_length_HRenyi_MST.m

+function [L] = compute_length_HRenyi_MST(Y,co)
+%Computes the length (L) associated to the 'Renyi_MST' Renyi entropy estimator.
+%
+%INPUT:
+%    Y: Y(:,t) is the t^th sample.
+%   co: cost object.
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+d = size(Y,1);
+gam = d * (1-co.alpha);
+W = sqdistance(Y).^(gam/2); %[ (||Y(:,i)-Y(:,j)||_2).^gam ]
+
+%W->L:
+    L = compute_MST(W,co.MSTmethod);

code/H_I_D/utilities/compute_length_HRenyi_kNN_1tok.m

+function [L] = compute_length_HRenyi_kNN_1tok(Y,co)
+%Computes the length (L) associated to the 'Renyi_kNN_1tok' Renyi entropy estimator.
+%
+%INPUT:
+%    Y: Y(:,t) is the t^th sample.
+%   co: cost object.
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+d = size(Y,1);
+squared_distances = kNN_squared_distances(Y,Y,co,1);
+gam = d * (1-co.alpha);
+L = sum(sum(squared_distances.^(gam/2),1));

code/H_I_D/utilities/compute_length_HRenyi_kNN_S.m

+function [L] = compute_length_HRenyi_kNN_S(Y,co)
+%Computes the length (L) associated to the 'Renyi_kNN_S' Renyi entropy estimator.
+%
+%INPUT:
+%    Y: Y(:,t) is the t^th sample.
+%   co: cost object.
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+d = size(Y,1);
+squared_distances = kNN_squared_distances(Y,Y,co,1);
+gam = d * (1-co.alpha);
+L = sum(sum(squared_distances(co.k,:).^(gam/2),1)); %'gam/2' <= squared; S=co.k

code/H_I_D/utilities/estimate_HRenyi_constant.m

+%function [] = estimate_HRenyi_constant()
+%Estimates the additive constants of the Renyi entropy estimators 'Renyi_kNN_1tok', 'Renyi_kNN_S', 'Renyi_MST' and 'Renyi_GSF', for a fixed dimension (d below). In certain cases these constants can also be important. Here, they are precomputed via Monte-Carlo simulations; the result is saved and becomes available in 'HRenyi_kNN_1tok_estimation.m', 'HRenyi_kNN_S_estimation.m', 'HRenyi_MST_estimation.m' and 'HRenyi_GSF_estimation.m'.
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%clear start:
+    clear all;
+
+%Parameters:
+    %name of the Renyi entropy estimator:
+        cost_name = 'Renyi_kNN_1tok';
+        %cost_name = 'Renyi_kNN_S';
+        %cost_name = 'Renyi_MST';
+        %cost_name = 'Renyi_GSF';
+    %dimension:
+        d = 1;
+    num_of_samples = 10000;%number of samples for a fixed run
+    num_of_MC_runs = 50;   %number of Monte-Carlo runs
+        
+%initialize cost object:
+    mult = 1;
+    co = H_initialization(cost_name,mult);%the value of 'mult' is irrelevant (=not used) below.
+
+switch cost_name
+    case {'Renyi_kNN_1tok','Renyi_kNN_S','Renyi_MST','Renyi_GSF'}
+        %initialization:
+            alp = co.alpha; 
+            gam = d * (1-alp);
+            sum_of_normalizedL = 0;      
+
+        for k = 1 : num_of_MC_runs
+            Y = rand(d,num_of_samples);%uniform distribution, U([0,1]^d)
+            %L(Y):
+                eval(['L=compute_length_H',co.name,'(Y,co);']);
+                    %L = compute_length_HRenyi_kNN_1tok(Y,co);
+		    %L = compute_length_HRenyi_kNN_S(Y,co);
+                    %L = compute_length_HRenyi_MST(Y,co);
+                    %L = compute_length_HRenyi_GSF(Y,co);
+            sum_of_normalizedL = sum_of_normalizedL + L / (num_of_samples^(1-gam/d) ); 
+            if mod(k,10)==1
+                disp(strcat('k=',num2str(k),'(/',num2str(num_of_MC_runs),')'));
+            end
+        end
+
+        constant = sum_of_normalizedL / num_of_MC_runs;
+        FN = filename_of_HRenyi_constant(d,co);
+        save(FN,'constant','num_of_samples','num_of_MC_runs','co');
+    otherwise
+        disp('Error: cost name=?');
+end

code/H_I_D/utilities/filename_of_HRenyi_constant.m

+function [FN] = filename_of_HRenyi_constant(d,co)
+%Returns the filename (FN) of the precomputed additive constants for Renyi entropy estimators 'Renyi_kNN_1tok', 'Renyi_kNN_S', 'Renyi_MST' and 'Renyi_GSF'. See 'estimate_HRenyi_constant.m'.
+%
+%INPUT:
+%   co: cost object.
+%   d: dimension.
+%
+%Copyright (C) 2012 Zoltan Szabo ("http://nipg.inf.elte.hu/szzoli", "szzoli (at) cs (dot) elte (dot) hu")
+%
+%This file is part of the ITE (Information Theoretical Estimators) Matlab/Octave toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+switch co.name
+    case 'Renyi_kNN_1tok'
+        FN = strcat('constant_H',co.name,'_d',num2str(d),'_k',num2str(co.k),'_alpha',num2str(co.alpha),'.mat');%co.kNNmethod (and its parameters) are not saved to the filename
+    case 'Renyi_kNN_S'
+        FN = strcat('constant_H',co.name,'_d',num2str(d),'_k',Sstr(co.k),'_alpha',num2str(co.alpha),'.mat');%co.kNNmethod (and its parameters) are not saved to the filename
+    case 'Renyi_MST'
+        FN = strcat('constant_H',co.name,'_d',num2str(d),'_alpha',num2str(co.alpha),'.mat'); %co.MSTmethod is not saved to the filename
+    case 'Renyi_GSF'
+        FN = strcat('constant_H',co.name,'_d',num2str(d),'_k',num2str(co.k),'_alpha',num2str(co.alpha),'.mat');%co.kNNmethod (and its parameters), co.GSFmethod are not saved to the filename
+    otherwise
+        disp('Error: cost name=?');
+end
+
+function [str] = Sstr(ks)
+%Returns the string version (str) of the k values given in vector 'ks'. Example: ks = [1,2,4] -> str='1_2_4'.
+
+L = length(ks);
+str = '';
+for j = 1 : L
+    str = strcat(str,num2str(ks(j)));
+    if j<L
+        str = strcat(str,'_');
+    end
+end
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.