Commits

Zoltán Szabó  committed eef4c82

Quick test on the Tsallis divergence: introduced. Pearson chi-square divergence estimation in the exponential family (MLE + analytical formula): added.

  • Parent commits eea2190
  • Tags release-0.60


Files changed (14)

File CHANGELOG.txt

-(May 20, 2014):
--Typo: discovered in 'DMMD_Ustat_estimation.m' -> 'd' stood instead of 'num_of_samplesY1'/'num_of_samplesY2' while making the diagonal of 'kY1Y1'/'kY2Y2' to be zero. Thanks to Rainer Kargel for noticing it; typo: corrected.
+v0.60 (June 2, 2014):
+-Quick test on the Tsallis divergence: introduced; see 'quick_test_DTsallis.m'.
+-Pearson chi-square divergence estimation in the exponential family (MLE + analytical formula): added; see 'DChiSquare_expF_initialization.m', 'DChiSquare_expF_estimation.m'.
+-Quick tests, computation of the log-normalizer and its gradient: updated with the new estimator; see 'quick_test_DChiSquare.m', 'quick_test_Dequality.m', 'expF_F.m', 'expF_gradF.m'.
+-MERR: verification of the existence of the AOD (aerosol optical depth prediction) generated dataset: added to 'generate_AOD_dataset.m'.
+-Typo: discovered in 'DMMD_Ustat_estimation.m'; 'd' was used instead of 'num_of_samplesY1'/'num_of_samplesY2' when zeroing the diagonal of 'kY1Y1'/'kY2Y2'. Thanks to Rainer Kargel for noticing it; typo: corrected.
+-Note: strange activity on 17-18 May (20k downloads).
 
 v0.59 (May 16, 2014):
 -Adaptive partitioning based Shannon mutual information estimators: added; see (i) 'IShannon_AD2_initialization.m', IShannon_AD2_estimation.m, 'IShannon_AD_initialization.m', 'IShannon_AD_estimation.m', and (ii) 'shared/embedded/AP_MI' -- thanks to Petr Tichavsky and Zbynek Koldovsky for providing the implementations.
 
 **Download** the latest release: 
 
-- code: [zip](https://bitbucket.org/szzoli/ite/downloads/ITE-0.59_code.zip), [tar.bz2](https://bitbucket.org/szzoli/ite/downloads/ITE-0.59_code.tar.bz2), 
-- documentation: [pdf](https://bitbucket.org/szzoli/ite/downloads/ITE-0.59_documentation.pdf).
+- code: [zip](https://bitbucket.org/szzoli/ite/downloads/ITE-0.60_code.zip), [tar.bz2](https://bitbucket.org/szzoli/ite/downloads/ITE-0.60_code.tar.bz2), 
+- documentation: [pdf](https://bitbucket.org/szzoli/ite/downloads/ITE-0.60_documentation.pdf).

File code/estimators/base_estimators/DChiSquare_expF_estimation.m

+function [D] = DChiSquare_expF_estimation(Y1,Y2,co)
+%function [D] = DChiSquare_expF_estimation(Y1,Y2,co)
+%Estimates the Pearson chi-square divergence (D) of Y1 and Y2 using maximum likelihood estimation (MLE) + the analytical formula associated with the chosen exponential family.
+%Assumption: 2 * np1 - np2 belongs to the natural space. This holds, for example, if the natural space is affine.
+%
+%We use the naming convention 'D<name>_estimation' to ease embedding new divergence estimation methods.
+%
+%INPUT:
+%  Y1: Y1(:,t) is the t^th sample from the first distribution.
+%  Y2: Y2(:,t) is the t^th sample from the second distribution. Note: the number of samples in Y1 [=size(Y1,2)] and Y2 [=size(Y2,2)] can be different.
+%  co: divergence estimator object.
+%
+%REFERENCE: 
+%    Frank Nielsen and Richard Nock. On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Processing Letters, 21(1):10-13, 2014.
+
+%Copyright (C) 2012-2014 Zoltan Szabo ("http://www.gatsby.ucl.ac.uk/~szabo/", "zoltan (dot) szabo (at) gatsby (dot) ucl (dot) ac (dot) uk")
+%
+%This file is part of the ITE (Information Theoretical Estimators) toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%co.mult:OK. The information theoretical quantity of interest can be (and is!) estimated exactly [co.mult=1]; the computational complexity of the estimation is essentially the same as that of the 'up to multiplicative constant' case [co.mult=0]. In other words, the estimation is carried out 'exactly' (instead of up to 'proportionality').
+
+%verification:
+    if size(Y1,1) ~= size(Y2,1)
+        error('The dimension of the samples in Y1 and Y2 must be equal.');
+    end
+    
+%MLE:
+  np1 = expF_MLE(Y1,co.distr);
+  np2 = expF_MLE(Y2,co.distr);
+  
+D = expF_Dtemp4(co.distr,np1,np2,2,-1) - 1; %assumption: 2 * np1 - np2 belongs to the natural space. This holds, e.g., if the natural space is affine.
+    
+
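For the 'normalI' family used by this estimator, the recipe above can be checked by hand: with natural parameter np = m and log-normalizer F(np) = ||np||^2/2, the quantity exp(F(2*np1 - np2) - (2*F(np1) - F(np2))) - 1 collapses to exp(||m1 - m2||^2) - 1, the known closed form. A minimal Python sketch of this check (illustrative only, not part of the toolbox):

```python
import numpy as np

def F(theta):
    """Log-normalizer of the isotropic normal family N(m, I); theta = m."""
    return 0.5 * np.dot(theta, theta)

rng = np.random.default_rng(0)
m1, m2 = rng.normal(size=3), rng.normal(size=3)  # natural parameters = means

# Pearson chi-square via the exponential-family route:
# D = exp(F(2*np1 - np2) - (2*F(np1) - F(np2))) - 1
D_expF = np.exp(F(2 * m1 - m2) - (2 * F(m1) - F(m2))) - 1

# Closed form for N(m1, I) vs N(m2, I): exp(||m1 - m2||^2) - 1
D_closed = np.exp(np.dot(m1 - m2, m1 - m2)) - 1

assert np.isclose(D_expF, D_closed)
```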

File code/estimators/base_estimators/DChiSquare_expF_initialization.m

+function [co] = DChiSquare_expF_initialization(mult,post_init)
+%function [co] = DChiSquare_expF_initialization(mult)
+%function [co] = DChiSquare_expF_initialization(mult,post_init)
+%Initialization of the exponential family based Pearson chi-square divergence estimator (maximum likelihood + the analytical formula associated with the chosen exponential family).
+%
+%Note:
+%   1)The estimator is treated as a cost object (co).
+%   2)We use the naming convention 'D<name>_initialization' to ease embedding new divergence estimation methods.
+%
+%INPUT:
+%   mult: is a multiplicative constant relevant (needed) in the estimation; '=1' means yes (='exact' estimation), '=0' no (=estimation up to 'proportionality').
+%   post_init: {field_name1,field_value1,field_name2,field_value2,...}; cell array containing the names and the values of the cost object fields that are to be used
+%   (instead of their default values). For further details, see 'post_initialization.m'.
+%OUTPUT:
+%   co: cost object (structure).
+
+%Copyright (C) 2012-2014 Zoltan Szabo ("http://www.gatsby.ucl.ac.uk/~szabo/", "zoltan (dot) szabo (at) gatsby (dot) ucl (dot) ac (dot) uk")
+%
+%This file is part of the ITE (Information Theoretical Estimators) toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%mandatory fields (following the template structure of the estimators to make uniform usage of the estimators possible):
+    co.name = 'ChiSquare_expF';
+    co.mult = mult;
+    
+%other fields:
+    co.distr = 'normalI'; %exponential family used for estimation; fixed; 'isotropic normal' [N(m,I)]
+
+%post initialization (put it _before_ initialization of the members in case of a meta estimator):    
+    if nargin==2 %there are given (name,value) cost object fields
+        co = post_initialization(co,post_init);
+    end     

File code/estimators/quick_tests/tests_analytical_vs_estimation/quick_test_DChiSquare.m

     num_of_samples_v = [1000:1000:50*1000]; %sample numbers used for estimation
     %estimator (for the chi^2 divergence), base:
         cost_name = 'ChiSquare_kNN_k'; %d>=1
+        %cost_name = 'ChiSquare_expF'; %d>=1 %distr = 'normalI'
         
 %initialization:    
     L = length(num_of_samples_v);    

File code/estimators/quick_tests/tests_analytical_vs_estimation/quick_test_DTsallis.m

+%function [] = quick_test_DTsallis()
+%Quick test for Tsallis divergence estimators: analytical expression vs estimated value as a function of the sample number. In the test, normal variables are considered.
+
+%Copyright (C) 2012-2014 Zoltan Szabo ("http://www.gatsby.ucl.ac.uk/~szabo/", "zoltan (dot) szabo (at) gatsby (dot) ucl (dot) ac (dot) uk")
+%
+%This file is part of the ITE (Information Theoretical Estimators) toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%clear start:
+    clear all; close all;
+    
+%parameters:
+    distr = 'normal'; %fixed
+    alpha_D = 0.99; %parameter of Tsallis divergence, \ne 1
+    d = 1; %dimension of the distribution
+    num_of_samples_v = [1000:1000:20*1000]; %sample numbers used for estimation
+    %estimator (of Tsallis divergence), base:
+        cost_name = 'Tsallis_kNN_k'; %d>=1
+    
+%initialization:
+    num_of_samples_max = num_of_samples_v(end);
+    L = length(num_of_samples_v);
+    co = D_initialization(cost_name,1,{'alpha',alpha_D}); %{'alpha',alpha_D}: set the 'alpha' field
+    D_hat_v = zeros(L,1); %vector of estimated Tsallis divergences
+
+%distr, d -> samples (Y1<<Y2), analytical formula for the Renyi divergence => Tsallis divergence (D_alpha):
+    switch distr
+        case 'normal'
+            %mean:
+                m2 = rand(d,1);
+                m1 = m2;
+            %(random) linear transformation applied to the data:
+                A2 = rand(d);
+                A1 = rand * A2; %(m2,A2) => (m1,A1) choice guarantees Y1<<Y2 (in practice, too)
+            %covariance matrix:
+                C1 = A1 * A1.';
+                C2 = A2 * A2.';
+            %generate samples:
+                Y1 = A1 * randn(d,num_of_samples_max) + repmat(m1,1,num_of_samples_max); %A1xN(0,I)+m1
+                Y2 = A2 * randn(d,num_of_samples_max) + repmat(m2,1,num_of_samples_max); %A2xN(0,I)+m2
+            %analytical value of Renyi divergence => Tsallis divergence:
+                par1.mean = m1; par1.cov = C1;
+                par2.mean = m2; par2.cov = C2;
+                D = analytical_value_DRenyi(distr,distr,alpha_D,par1,par2);
+                D = ( exp((alpha_D-1)*D) - 1 ) / (alpha_D - 1);
+        otherwise
+            error('Distribution=?');            
+    end
+    
+%estimation:
+    Tk = 0;%index of the sample number examined
+    for num_of_samples = num_of_samples_v
+        Tk = Tk + 1;
+        D_hat_v(Tk) = D_estimation(Y1(:,1:num_of_samples),Y2(:,1:num_of_samples),co);
+        disp(strcat('Tk=',num2str(Tk),'/',num2str(L)));
+    end
+
+%plot:
+    plot(num_of_samples_v,D_hat_v,'r',num_of_samples_v,D*ones(L,1),'g');
+    legend({'estimation','analytical value'});
+    xlabel('Number of samples');
+    ylabel('Tsallis divergence');    
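The conversion used above, D_Tsallis = (exp((alpha-1)*D_Renyi) - 1)/(alpha-1), follows from both divergences being functions of the same integral \int p^alpha q^{1-alpha}. A small Python check on discrete distributions (illustrative only; sums play the role of integrals):

```python
import numpy as np

# Two discrete distributions as stand-ins for densities.
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
alpha = 0.99  # parameter of the Tsallis/Renyi divergence, != 1

s = np.sum(p**alpha * q**(1 - alpha))  # discrete analogue of \int p^alpha q^(1-alpha)
renyi = np.log(s) / (alpha - 1)
tsallis = (s - 1) / (alpha - 1)

# Conversion used in the quick test above:
assert np.isclose(tsallis, (np.exp((alpha - 1) * renyi) - 1) / (alpha - 1))
```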

File code/estimators/quick_tests/tests_distribution_regression/quick_test_AOD_prediction_nonlinearK.m

 %Quick test for aerosol optical depth (AOD) prediction. Method: MERR (Mean Embedding based Ridge Regression), using non-linear kernel (K) on mean embeddings.
 %Note: 
 %   1)The precomputed Gram matrices are saved (see 'dir_G').
-%   2)The code also supports 'Matlab: parallel computing', see below (matlabpool open/close, parfor-s ['precomputation_of_Gram_matrices_k', 'compute_slices.m']).
+%   2)The code also supports 'Matlab: parallel computing', see below (matlabpool open/close, parfor-s ['precomputation_of_Gram_matrices_kK', 'compute_slices.m']).
 %
 %REFERENCE: Zoltan Szabo, Arthur Gretton, Barnabas Poczos, and Bharath Sriperumbudur. Consistent, two-stage sampled distribution regression via mean embedding. Technical report, Gatsby Unit, University College London, 2014. (http://arxiv.org/abs/1402.1754).
 
     end%nCV
 end%v
 %-------------------------------
-%matlabpool close; %use this line with 'Matlab: parallel computing'
+%matlabpool close; %use this line with 'Matlab: parallel computing'

File code/estimators/quick_tests/tests_other_consistency/quick_test_Dequality.m

     clear all; close all;
     
 %parameters: 
-    distr = 'uniform'; %possibilities: 'normal', 'uniform'
+    distr = 'uniform'; %possibilities: 'normal', 'uniform', 'normalI'
     d = 1; %dimension of the distribution
     num_of_samples_v = [1000:1000:30*1000]; %sample numbers used for estimation
     %method used to measure divergence: 
              %cost_name = 'MMD_Vstat_iChol'; %d>=1
              %cost_name = 'KL_PSD_SzegoT';   % d=1
              %cost_name = 'KL_expF';         %d>=1
+             %cost_name = 'ChiSquare_expF';  %d>=1; distr = 'normalI'
 
         %meta:
              %cost_name = 'Jdistance';       %d>=1 
                 m = rand(d,1); A = rand(d);
                 Y1 = A*randn(d,num_of_samples_max) + repmat(m,1,num_of_samples_max);%AxN(0,I)+m
                 Y2 = A*randn(d,num_of_samples_max) + repmat(m,1,num_of_samples_max);%AxN(0,I)+m
+            case 'normalI'
+                m = 2*rand(d,1);
+                Y1 = randn(d,num_of_samples_max) + repmat(m,1,num_of_samples_max); %N(m,I)
+                Y2 = randn(d,num_of_samples_max) + repmat(m,1,num_of_samples_max); %N(m,I)
             otherwise
                 error('Distribution=?');        
         end

File code/estimators/utilities/MERR/generate_AOD_dataset.m

     dir_present = create_and_cd_dir(dir_experiment);
 
 %generate dataset (X,Y):
+    FN = strcat(dataset,'_dataset.mat');
+    %verification:
+        if exist(FN,'file')
+            error('The dataset already exists.');
+        end
     %load data:    
         data = load(strcat(dataset,'.mat')); %the result is a structure
         data = data.(dataset); %extract the data from the structure

File code/estimators/utilities/analytical_values/analytical_value_DChiSquare.m

     a = par1.a;
     b = par2.a;
     D = prod(b)/prod(a) - 1;
-elseif strcmp(distr1,'normalI') && strcmp(distr2,'normalI') %Frank Nielsen. Closed-form information-theoretic divergences for statistical mixtures. In International Conference on Pattern Recognition (ICPR), pages 1723-1726, 2012.
+elseif strcmp(distr1,'normalI') && strcmp(distr2,'normalI') %Frank Nielsen and Richard Nock. On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Processing Letters, 21(1):10-13, 2014. 
     m1 = par1.mean;
     m2 = par2.mean;
     diffm = m2 - m1;
     D = exp(diffm.' * diffm) - 1;  
-
 else
     error('Distribution=?'); 
 end

File code/estimators/utilities/exp_family/expF_Dtemp4.m

+function [Dtemp4] = expF_Dtemp4(distr,np1,np2,a,b)
+%function [Dtemp4] = expF_Dtemp4(distr,np1,np2,a,b)
+%Computes Dtemp4 = \int [f_1(u)]^a [f_2(u)]^b du in the chosen exponential family (distr), where np1 and np2 are the natural parameters associated to f_1 and f_2, respectively.
+%Assumption: a + b = 1 and 'a * np1 + b * np2' belongs to the natural space. This holds, for example, if the natural space is affine. 
+%
+%REFERENCE: 
+%   Frank Nielsen and Richard Nock. On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Processing Letters, 21(1):10-13, 2014.
+
+%Copyright (C) 2012-2014 Zoltan Szabo ("http://www.gatsby.ucl.ac.uk/~szabo/", "zoltan (dot) szabo (at) gatsby (dot) ucl (dot) ac (dot) uk")
+%
+%This file is part of the ITE (Information Theoretical Estimators) toolbox.
+%
+%ITE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
+%the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
+%
+%This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
+%
+%You should have received a copy of the GNU General Public License along with ITE. If not, see <http://www.gnu.org/licenses/>.
+
+%D ingredients:
+    F0 = expF_F(distr,expF_np1_np2_add(expF_np_mult(np1,a), expF_np_mult(np2,b))); %F(a*np1 + b*np2)
+    F1 = a * expF_F(distr,np1); %a*F(np1)
+    F2 = b * expF_F(distr,np2); %b*F(np2)
+
+Dtemp4 = exp(F0 - (F1+F2));
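The identity implemented here, \int [f_1]^a [f_2]^b = exp(F(a*np1 + b*np2) - (a*F(np1) + b*F(np2))) for a + b = 1, can be sanity-checked by numerical quadrature. A Python sketch for the 1-D isotropic normal family (illustrative only; the grid width and tolerance are ad hoc):

```python
import numpy as np

def F(theta):
    """Log-normalizer of N(m, 1); natural parameter theta = m."""
    return 0.5 * theta**2

def pdf(x, m):
    """Density of N(m, 1)."""
    return np.exp(-0.5 * (x - m)**2) / np.sqrt(2 * np.pi)

m1, m2 = 0.3, -0.2
a, b = 2.0, -1.0  # a + b = 1, the chi-square case

trapezoid = getattr(np, "trapezoid", getattr(np, "trapz", None))
x = np.linspace(-20, 20, 200001)

lhs = trapezoid(pdf(x, m1)**a * pdf(x, m2)**b, x)          # \int f1^a f2^b
rhs = np.exp(F(a * m1 + b * m2) - (a * F(m1) + b * F(m2)))  # log-normalizer route

assert np.isclose(lhs, rhs, rtol=1e-6)
```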

File code/estimators/utilities/exp_family/expF_F.m

 function [F] = expF_F(distr,np) 
 %function [F] = expF_F(distr,np) 
-%Computes the log-normalizer (F) at a given natural parameter value (np) for the input exponential family. See also 'expF_MLE.m'.
+%Computes the log-normalizer (F) at a given natural parameter value (np) for the input exponential family.
 %
 %INPUT:
-%   distr: 'normal'.
+%   distr: 'normal', 'normalI'.
 %   np   : natural parameters.
 %          distr = 'normal': np.t1 = C^{-1}*m, np.t2 = 1/2*C^{-1}, where m is the mean, C is the covariance matrix.
+%                  'normalI': np.m = m, where m is the mean.
 
 %Copyright (C) 2012-2014 Zoltan Szabo ("http://www.gatsby.ucl.ac.uk/~szabo/", "zoltan (dot) szabo (at) gatsby (dot) ucl (dot) ac (dot) uk")
 %
     case 'normal' %Ref: Frank Nielsen, Vincent Garcia. Statistical exponential families: A digest with flash cards. "http://arxiv.org/abs/0911.4863"
         d = length(np.t1);
         F = trace(inv(np.t2) * np.t1 * np.t1.') /4 - log(det(np.t2)) / 2 + d * log(pi) / 2;
+    case 'normalI'
+        F = sum((np.m).^2) / 2;
     otherwise
        error('Distribution=?');   
 end

File code/estimators/utilities/exp_family/expF_MLE.m

         invC = inv(C);
         np.t1 = invC * m;
         np.t2 = invC / 2;
+    case 'normalI' 
+        np.m = mean(Y,2);
     otherwise
        error('Distribution=?');                 
 end

File code/estimators/utilities/exp_family/expF_gradF.m

 %Computes the gradient of the log-normalizer (gradF) at a given natural parameter value (np) for the input exponential family.
 %
 %INPUT:
-%   distr: 'normal'.
+%   distr: 'normal', or 'normalI'.
 %   np   : natural parameters.
-%          distr = 'normal': np.t1 = C^{-1}*m, np.t2 = 1/2*C^{-1}, where m is the mean, C is the covariance matrix.
+%          distr = 'normal': np.t1 = C^{-1}*m, np.t2 = 1/2*C^{-1}, where m is the mean, C is the covariance matrix,
+%                  'normalI': np.m = m, where m is the mean. 
 
 %Copyright (C) 2012-2014 Zoltan Szabo ("http://www.gatsby.ucl.ac.uk/~szabo/", "zoltan (dot) szabo (at) gatsby (dot) ucl (dot) ac (dot) uk")
 %
         s = I * np.t1;
         gradF.t1 = s / 2;
         gradF.t2 = -I/2 - s * s.' / 4;
+    case 'normalI'
+        gradF.m = np.m;
     otherwise
        error('Distribution=?');   
-end
+end