This is the R and C implementation of the method described in:

Expectation Propagation for Microarray Data Classification
Daniel Hernandez Lobato, Jose Miguel Hernandez Lobato, and Alberto Suarez
Pattern Recognition Letters, 31, pp. 1618-1626, 2010.

The folder code contains the implementation of the method, together with a
simple example that runs it on one train / test partition of the
adenocarcinoma data set (the data is stored in R format and has
already been standardized to have zero mean and unit standard deviation).
To run the example, first compile the .c file with

R CMD SHLIB ep_public_probit.c

After this, you can run the example with

R --no-save < simulate.R

This writes the prediction error on the test set to the file
errors_probit.txt. The sample code only considers a single train / test
partition of the data.

The file data.tar.gz contains the data employed in the paper. There
is one folder per dataset. Each folder contains a file X.txt with the
attributes in text format and a file Y.txt with the labels, which take
the values -1 and 1.
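
As a quick check that a dataset folder has been unpacked correctly, the two
files can be loaded along the following lines. This is only a sketch: the
function name load_dataset is hypothetical, and the code assumes that X.txt
is whitespace-delimited with one sample per row and no header line, and that
Y.txt holds one label per sample. Adjust the reading calls if the files are
laid out differently.

```r
# Hypothetical helper (not part of the distributed code): load one dataset
# folder into an attribute matrix and a label vector.
# Assumption: X.txt is whitespace-delimited, one row per sample, no header.
# Assumption: Y.txt contains one numeric label per sample.
load_dataset <- function(folder) {
  X <- as.matrix(read.table(file.path(folder, "X.txt"), header = FALSE))
  Y <- scan(file.path(folder, "Y.txt"), quiet = TRUE)
  stopifnot(nrow(X) == length(Y))   # one label per sample
  stopifnot(all(Y %in% c(-1, 1)))   # labels take the values -1 and 1
  list(X = X, Y = Y)
}
```

The two stopifnot checks fail early if the number of labels does not match
the number of samples or if a label other than -1 or 1 is found.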

To reproduce the results in the paper, generate 50 random splits of each
dataset, with 2/3 of the data for training and 1/3 for testing. The data
must be normalized so that each attribute has zero mean and unit standard
deviation on the training set. The splits should be stored in R binary
format, like the train_1.dat and test_1.dat files contained in the code folder.
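
The splitting and normalization described above can be sketched as follows.
This is an illustration under stated assumptions, not the authors' script:
the function name make_split is hypothetical, the toy data stands in for a
real dataset, and the sketch assumes that the test partition is rescaled
with the mean and standard deviation computed on the training partition.

```r
# Hypothetical sketch: one random 2/3 - 1/3 split, standardized with
# statistics computed on the training partition only.
make_split <- function(X, Y, train_fraction = 2 / 3) {
  n <- nrow(X)
  idx_train <- sample(n, round(train_fraction * n))
  X_train <- X[idx_train, , drop = FALSE]
  X_test  <- X[-idx_train, , drop = FALSE]

  # Per-attribute mean and standard deviation on the training set.
  mu    <- colMeans(X_train)
  sigma <- apply(X_train, 2, sd)
  sigma[sigma == 0] <- 1   # assumption: leave constant attributes unscaled

  # Assumption: the same training statistics are applied to the test set.
  X_train <- sweep(sweep(X_train, 2, mu, "-"), 2, sigma, "/")
  X_test  <- sweep(sweep(X_test,  2, mu, "-"), 2, sigma, "/")

  list(train = list(X = X_train, Y = Y[idx_train]),
       test  = list(X = X_test,  Y = Y[-idx_train]))
}

# Toy data standing in for a real dataset (30 samples, 5 attributes).
set.seed(1)
X <- matrix(rnorm(30 * 5), nrow = 30)
Y <- sample(c(-1, 1), 30, replace = TRUE)

# 50 independent random splits, as in the experimental protocol above.
splits <- lapply(1:50, function(i) make_split(X, Y))
```

Each element of splits could then be written to disk with save(), which
produces R binary files. The object names that simulate.R expects inside
those files are not shown here, so check them against the provided
train_1.dat and test_1.dat before substituting your own splits.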

To simplify this, we include an R script called generate_splits.R that
generates the splits. Copy the script into one of the dataset folders and run

R --no-save < generate_splits.R

Note that the train and test partitions may differ from the ones considered
in the paper. Because the splitting is random, the results obtained may be
somewhat better or worse than those reported.