DGGA V. 0.1.4

We changed the license back to GNU GPLv3.

DGGA v. 0.1.3 

Removing the target algorithm cutoff when performing quality tuning.

DGGA v. 0.1.2 

Fixing a bug in the reading of ParamILS/SMAC scenario files in which the values
of the file were not correctly interpreted. Thank you to Marius Lindauer for
pointing out this problem.

DGGA v. 0.1.1

===> Issuing DGGA an MIT license to all GGA code to allow the project to be completely licensed under the MIT license.
Note: This license was rescinded on April 12, 2017, as we may not have been allowed to switch to the MIT license. All versions of GGA between 0.1.1 and 0.1.4 are hereby GNU GPLv3.

DGGA v. 0.1

Author: Josep Pon Farreny
Based on GGA v. 1.3.2 by Kevin Tierney (see below)

GGA v. 1.3.2
Author: Kevin Tierney


This is a re-implementation of GGA, which was first introduced in:

Carlos Ansotegui Gil, Meinolf Sellmann, Kevin Tierney
A Gender-Based Genetic Algorithm for the Automatic Configuration of Algorithms
Proceedings of the 15th intern. Conference on the Principles and Practice of Constraint Programming (CP-09), Springer LNCS 5732, pp. 142-157, 2009.

The exact code from that paper will never be released due to Brown University's
policies. This version of GGA was created by Kevin Tierney after his master's
thesis was completed and he was no longer a student of Brown. While this
version is similar in many ways, and the core algorithm is true to the one in
the paper, differences exist between this code and the original version. These
differences may result in different results than in the paper. 

Furthermore, this code is provided on an as-is basis under the GNU GPL v3
license, and may only be used for academic purposes.

All code and documentation is Copyright 2010-2012 Kevin Tierney. Thanks to Yuri
Malitsky for his contributions (implementing par10, fixing bugs).

Citing this work

If you use this work, we ask that you cite it as follows:

Carlos Ansotegui Gil, Meinolf Sellmann, Kevin Tierney
A Gender-Based Genetic Algorithm for the Automatic Configuration of Algorithms
Proceedings of the 15th intern. Conference on the Principles and Practice of Constraint Programming (CP-09), Springer LNCS 5732, pp. 142-157, 2009.


GGA is a general parameter tuner which supports both algorithm runtime and
performance tuning, as well as categorical, integer and floating point
algorithm parameters. For information about the GGA algorithm, please read the
paper in CP 2009.

Practical information:

GGA is a parallel algorithm and will run as many threads as it is told to
simultaneously. While GGA supports being run in a single threaded mode (i.e.
with a tournament size of 1), we don't recommend doing this. In practice, at
least 4 threads are necessary for good performance, but 8 or more are even
better. Each CPU should receive no more than two threads, although a single
thread per CPU is ideal. This is because GGA is a racing algorithm, so the more
threads that are available the less total CPU time the algorithm will need to
perform its tuning.


There is currently only a *nix version of GGA. No plans exist to create a
Windows version. Sorry.

GGA depends on the libxml2 and boost regular expression libraries. You may need
to modify the Makefile to point to those libraries on your system (lines 38 and 39)

To build GGA, type:



Parameter settings

I recommend running GGA with the following parameters:

./gga <parameter tree xml> <instance seed file> -p 100 -g 100 --gf 75 --is 5 -t <number of cores available> --ie <total number of instances> --tacl <target algorithm timeout> --tc <tuner CPU cutoff> --pe <penalty> -v 5 >& <output>

An explanation follows:

<parameter tree xml> - The XML file defining the parameter tree. See below for an explanation of the XML, or examples/clasp/clasp.xml

<instance seed file> - A file containing instance seed pairs with each line of the form: seed filename

-p 100 - The initial population size. Note that this can vary over the course of the algorithm, but generally doesn't stray more than 25%.

-g 100 - 100 generations are recommended, although as few as 50 can be sufficient.

--gf 75 --is 5 --ie <total number of instances> - These parameters control how
    many instances GGA tunes on in each generation. -is 5 means GGA starts with 5
    instances and moves to -ie instances by the 75th (-gf) generation. After the
    75th generation, all instances are used. I recommend always leaving several
    generations with all instances before GGA's termination for good performance.

-t <number of cores available> - This sets the size of GGA's tournaments. This number of target algorithms will be run simultaneously, so the maximum value should be the number of cores available.

--tacl <timeout> - The timeout of running the target algorithm on a particular instance.

--tc <tuner CPU cutoff> - Maximum CPU time to run GGA for. Note that we also offer --twc if you want to limit GGA in terms of wall-clock time.

--pe <penalty> - Set to 10 for PAR10 scoring, set to 1 for no PAR scoring.

-v 5 - Log verbosity level 5 prints out lots of information, much of which will be useful for experiments. If desired, level 1 or 3 can be used to keep log files small.

>& <output> - The output file of GGA.

An example

I have created an example of how to run GGA using the clasp solver, located in
the examples/ directory. You can either examine the example and copy it for
your own algorithm, or perform the following steps to run the example yourself:

** Note that this is just an example and only models several parameters of the clasp solver **

0. Build GGA using the instructions above.

1. Download the clasp 2.1.1 solver from the POTASSCO website:
1a. Build the clasp solver (or download an executable, your choice), and place the clasp executable in the examples/clasp directory.

2. Download the SAPS-SWGCP training instances from the ParamILS benchmark page:
2a. Extract the tar.gz into examples/clasp

3. Run GGA with the following command: (from the examples/clasp directory)

../../gga clasp.xml swgcp_seed_inst.txt -p 100 -g 100 --gf 75 --is 5 --ie 2000 --tacl 10 -v 5

XML parameter file overview

The XML file used to describe the parameters has the following structure:

    <cmd>./example_solver $instance $seed $param1 $param2 $param3 $param4 $param5</cmd>

        <variable name="root" value="0" />                                        
        <variable name="param1" value="3" />                                        
        <variable name="param2" value="c" />                                        
        <variable name="param3" value="5.654" />                                        
        <variable name="param4" value="12" />                                        
        <variable name="param5" value="cat2" />                                        

    <node type="and" name="root" start="0" end="0">
        <node type="and" name="param1" prefix="--param1=" start="0" end="4" />
        <node type="or" name="param2" prefix="--param2=" categories="a,b,c">
            <node type="and" name="param3" prefix="--param3=" start="5.5" end="7.5" />
            <node type="and" name="param4" prefix="--param4=" start="10" end="20" />
            <node type="and" name="param5" prefix="--param5=" categories="cat1,cat2" />
            <setting name="param1" value="0" />
            <setting name="param2" value="a" />
            <setting name="param4" value="15" />
            <setting name="param5" value="cat1" />

All of the specification of the parameter tree happens within the <algtune></algtune> tags.

The first part of the XML file (<cmd></cmd>) specifies the command line for GGA
to run. The variables $instance and $seed are special variables that tell GGA
where to put the path to the instance and where to put the seed. If you do not
need a seed for your solver, you can simply omit it. The rest of the parameters
($param1 - $param5) correspond to solver parameters to be tuned.

Next, the <seedgenome></seedgenome> tag specifies settings of the parameters to
insert into the initial population. Each <variable /> tag specified the name of
a parameter and the value for it to take in the genome. All parameters should
be specified in the seedgenome tag, even those that aren't used (such as the
tree root). Multiple seedgenomes are allowed, but there may not be more than
the initial size of the population.

The next portion of the file specifies the parameter tree itself. The tree is
specified by a single node corresponding to the root of the tree. All node tags
that are not a child of this root node will be ignored. Each <node> tag may
have any number of nodes underneath it. <node>s may have one of two types: "or"
or "and", which will be described in a moment. Each node is either categorical,
discrete or continuous. Categorical nodes have the "categories" attribute
specified (see param2 and param5). Discrete parameters have "start" and "end
specified with integer values (see param1 and param4). Floating point
parameters are specified like discrete parameters, except with floating point
numbers in start and end (see param3). An and node indicates that the child
nodes should all be present whenever the parent node is present in the
parameter settings of a genome. Prefix indicates what text to prefix to the
parameter in the command line, and is optional.

An or node means that only the node corresponding to the selected branch should
be included in the settings. In other words, or nodes allow users to select
between categorical parameters, and associate a branch of the parameter tree
with the parameter. This relationship is used within the optimization to find
good parameters. In the example above, param2 has three settings: a, b and c.
Setting a is associated with the branch of param3, setting b with param4 and c
with param5. Thus, if param1 - 5 take the following values: {3,c,5.654,12,cat2} 
as in the seeded genome, the following command would be executed: 

./example_solver (instance) (seed) --param1=0 --param2=c --param5=cat2

Note that param3 and param4 are not given on the command line because their
branches (param2=a / param2=b) were not selected.

The final part of the XML file specifies which parameter combinations are
forbidden to be used together. Each <forbid></forbid> block specifies a set of
parameters that may not be found together (i.e. an AND constraint). In the
above example, param1 may not be 0 at the same time that param2 is "a" (and
vice-versa); and param4 may not be 15 when param5 is cat1. Note that both
discrete and categorical parameters are supported. While it is allowed to
specify banned values of floating point parameters, this is not recommended as
floating point values are not represented exactly and it is unlikely that the
constraint will ever be hit. Note that constraints should not ban too many
settings of a combination of settings, as GGA's mechanism for finding feasible
configurations is extremely naive.

Common issues

- If the parameter tree is not reading in correctly, check that you don't have
  any mistakenly open <node> tags -- every <node> needs a </node>, or must be
  specified as <node />, when there are no child nodes.

- Check the output of GGA to make sure it is working correctly and passing your
  solver valid parameters! At verbosity level 5 (-v 5) GGA prints out every
  command that gets run so that you can copy/paste the command to see if there
  is any problem. There is quite a bit of output, but searching for GGAGenome
  in the output can help you find the run commands.


(Old) Changelog of GGA:

1.3.2 (May 23, 2014)
 - Fixing for log parameters (Thanks Marius Lindauer for the patch)
 - Fixing bug with trajectory names, causing them to only work for categorical parameters.

1.3.1 (May 8, 2014)
 - Adding "trajname" attribute to node elements in the XML file and updating to use this. This avoids output in the trajectory file
   caused by variable copies due to conditionals (e.g., varname_1, varname_2).
   Now, only "varname" will be output, and only the one that is implied by the
   OR-paths in the parameter tree.

1.3.0 (Apr. 4, 2014)
 - Allowing ParamILS/SMAC scenario files to be read in with the -scen parameter.
 - Forbidden parameter settings can be specified in the XML file (see above).
 - Reading in wrapper data during runtime tuning to see if the target algorithm
   crashed. If it did, it is given a timeout (* penalty) to avoid prefering bad
 - Fixed bug in GGAValue.cpp integer regex that was preventing negative values
   from being input correctly, resulting in lots of warnings and possibly even
 - The following minor changes in order to support the CSSC:
 -- Fixing output of $cutoff in the command line
 -- Changes to script to support categorical parameters with quotes
 -- Allowing maximum wall clock time to be specified (-twc <max wall clock seconds> or wallClockLimit in the scenario file)
 -- SIGUSR1 is no longer sent by default to the target algorithm (can be enabled with -su1 flag)
 -- Wallclock time is now reported in the trajectory file (along with user time)

1.2.4 (Jul. 31, 2013)
 - Fixing bug in for handling conditionals. Thanks to Frank Hutter to pointing out this bug on clasp-cssc.

1.2.3 (Jul. 24, 2013)
 - Trajectory output for Yuri
 - Fixing -tc parameter in help message

1.2.2 (Jul. 3, 2013)
 - Thanks to Yuri Malitsky for pointing out the following two bugs.
 - Removing -r from help file because it was apparently never implemented.
 - Instances were not being properly read in when seeds were present (regression from v1.2.0)

1.2.1 (Jun. 29, 2013)
 - Outputting dummy variables in seeded genome from paramils_convert script in
   order to prevent GGAValue segfault.
 - Adding basic trajectory file output (-traj <file path>). Note that this only
   outputs the CPU time and the parameters of the solution.

1.2.0 (Jun. 12, 2013)
 - Adding ParamILS/SMAC PCS to GGA XML conversion script.
 - Allowing GGA to accept instance files that omit the seeds.

1.1.0 (Apr. 5, 2013)
 - Implemented -me and -c, which apparently had not been implemented yet.

1.0.1 (Feb. 4, 2013)
 - Initial public release of GGA