Frequent Pattern Mining using Row Enumeration

This resource provides C++ implementations of the MaxConf and RerII algorithms described in the following papers:

Tara McIntosh, Sanjay Chawla,
"High Confidence Rule Mining for Microarray Analysis,"
IEEE/ACM Transactions on Computational Biology and Bioinformatics,
vol. 4, no. 4, pp. 611-623, Oct-Dec 2007.

G. Cong, K.-L. Tan, A. Tung, and F. Pan,
"Mining Frequent Closed Patterns in Microarray Data,"
Proc. Fourth IEEE Int'l Conf. Data Mining (ICDM), vol. 4, pp. 363-366, 2004.

This software is covered by a non-commercial use licence. See LICENCE.txt for the full text of the licence.

Building FPM tools

To build the tools, run make in the source directory. This creates a binary called fpm, which runs both the MaxConf and RerII algorithms.

% make

Dependencies

Boost program_options library
Boost filesystem library
Boost system library

Running the FPM tools

Command line arguments

FPM requires command-line arguments to run. If none are provided, the program prints a help message describing all available arguments.

% ./fpm --help

Argument             Default value   Description/Options
--help                               produce help message
-a [ --alg ] arg     maxconf         FPM algorithm: maxconf or rerII
-c [ --conf ] arg    0.80            min confidence threshold
-s [ --supp ] arg    0.10            min support threshold
-i [ --input ] arg                   transaction input file
-o [ --output ] arg                  output file

% ./fpm -a maxconf -c 0.8 -i example.trans -o results.out

% ./fpm -a rerII -s 0.1 -c 0.8 -i example.trans -o results.out

Input/Output formats

The input file contains one transaction per line. Each item is represented by a non-negative integer (item IDs start at 0), and items are separated by spaces. Repeated items within a transaction are ignored.

For example:

0 1 20 35 401 500 666 ...
1 8 900 60 50 2000 ...
1 20 500 606 ...
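As a rough illustration of this input format, the following sketch reads transactions into sets of integers, so repeated items on a line collapse automatically. The names Transaction and read_transactions are hypothetical and are not taken from the fpm source; skipping blank lines is also an assumption, not documented behaviour of the tool.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// One transaction = the set of distinct item IDs found on one input line.
using Transaction = std::set<int>;

// Hypothetical helper (not part of fpm): reads whitespace-separated integer
// item IDs, one transaction per line. Blank lines are skipped (an assumption).
std::vector<Transaction> read_transactions(std::istream& in) {
    std::vector<Transaction> transactions;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Transaction t;
        int item;
        while (fields >> item) {
            t.insert(item);  // std::set silently drops repeated items
        }
        if (!t.empty()) {
            transactions.push_back(t);
        }
    }
    return transactions;
}
```

Passing a std::ifstream opened on example.trans to read_transactions would yield one set per transaction line.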

The output format is as follows:

<support> <confidence> TAB <single antecedent> : <Frequent closed itemset including antecedent>

For example:

0.037 1 2618 : 2218 2618
0.73 0.948052 4407 : 3309 4407 4408
0.73 0.950000 4408 : 3309 4407 4408

These output lines encode the rules:

2618 -> 2218 (100% confidence and 3.7% support)
4407 -> 3309 4408 (94.8% confidence and 73% support)
4408 -> 3309 4407 (95.0% confidence and 73% support)
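The mapping from an output line to a rule can be sketched as follows. This is a minimal reader for the line format described above, not code from the fpm source; the Rule struct and parse_rule function are hypothetical names. It assumes the fields are separated by whitespace (spaces or a tab) and that the colon appears as its own token, as in the example output.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical container for one line of fpm output.
struct Rule {
    double support = 0.0;
    double confidence = 0.0;
    int antecedent = 0;
    std::set<int> consequent;  // closed itemset with the antecedent removed
};

// Parses "<support> <confidence> <antecedent> : <itemset...>".
// Returns false if the line does not match that shape.
bool parse_rule(const std::string& line, Rule& rule) {
    std::istringstream in(line);
    Rule parsed;
    std::string colon;
    if (!(in >> parsed.support >> parsed.confidence >> parsed.antecedent >> colon) ||
        colon != ":") {
        return false;
    }
    int item;
    while (in >> item) {
        if (item != parsed.antecedent) {
            parsed.consequent.insert(item);  // everything else is the consequent
        }
    }
    rule = parsed;
    return true;
}
```

For the second example line, this yields antecedent 4407 and consequent {3309, 4408}, matching the rule 4407 -> 3309 4408.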

Example transaction dataset

A file called example.trans, which contains the example transactions from Table 1a in McIntosh and Chawla (2007), is provided.

To obtain the results in Table 1b:

% ./fpm -a maxconf -c 0.66 -i example.trans -o results.out

% ./fpm -a rerII -s 0.375 -c 0.66 -i example.trans -o results.out
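The -s and -c thresholds are fractions. As an illustration only, the sketch below computes support and confidence for a single-antecedent rule under the standard definitions: the support of an itemset is the fraction of transactions containing it, and the confidence of a rule is the support of the whole itemset divided by the support of the antecedent. That fpm uses exactly these definitions internally is an assumption; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

using Itemset = std::set<int>;

// Fraction of transactions that contain every item in `items`.
double support(const std::vector<Itemset>& transactions, const Itemset& items) {
    if (transactions.empty()) {
        return 0.0;
    }
    std::size_t hits = 0;
    for (const Itemset& t : transactions) {
        // std::includes needs sorted ranges; std::set iterates in sorted order.
        if (std::includes(t.begin(), t.end(), items.begin(), items.end())) {
            ++hits;
        }
    }
    return static_cast<double>(hits) / static_cast<double>(transactions.size());
}

// Confidence of the rule {antecedent} -> (itemset without the antecedent).
// `itemset` is expected to include the antecedent, as in the fpm output.
double confidence(const std::vector<Itemset>& transactions, int antecedent,
                  const Itemset& itemset) {
    const double base = support(transactions, Itemset{antecedent});
    return base > 0.0 ? support(transactions, itemset) / base : 0.0;
}
```

With a threshold of 0.66, a rule whose confidence is exactly 2/3 passes, since 2/3 is slightly greater than 0.66.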

NOTE: Corrections for Table 1b in McIntosh and Chawla (2007)

1. In the table legend, Minconf should be >= 2/3.
2. Rule B -> CDG (conf. 3/3 and supp. 3) should read B -> ACDG (conf. 2/3 and supp. 2).
