+title: 'Choco and Regular Expressions'

+tags: andrew choco clojure

+A quick update to the series on [Choco][choco] and pipes last week.

+An alternative way to specify the neighbour constraints is to use regular

+expressions. That may seem a little odd, but much of the more advanced

+constraints in Choco are based around DFAs (Discrete Finite Automata).

+DFAs are the "science" behind regular expressions - they are what a regular

+expression is compiled to (at least in theory; in practice many regular

+expressions are not strictly regular, and [many regex implementations are

+Well, a DFA is just a set of rules (transitions) associated with particular

+states. You start at the start state and look at the first character; the

+rules for the start state tell you which to state to move to before you look

+at the next character. And so it repeats, until you have either matched all

+characters, or there is no suitable rule.

+That's a pretty powerful and general approach, so building it into Choco makes

+for a flexible constraint engine. At least in theory. In practice, I found

+it a little confusing at first, but I'm starting to understand and have just

+got a simple test case working.

+The basic idea is the values associated with variables are the "characters" in

+the regular expression (for integer values you really can use a regex -

+something like `"123"` means values 1, 2 and 3 for three variables, while

+`"<10><11>"` is the syntax for 10 and 11 - but you can also construct the DFA

+directly, and there are a bunch of simpler API methods, like the one I'll use

+below). Obviously, this means that along with the regular expression you need

+to specify which variables it applies to (and their order, which is implicit

+in the array used in the API).

+Back to my simple test case. If you read the [earlier article][article-5] you

+may recall that we needed to constrain neighbouring tiles, so that pipes

+connect correctly. I did that by defining binary relations for neighbouring

+An alternative way to provide this constraint is to say that two neighbouring

+cells must have values that satisfy a regular expression, where the regular

+expression is simply the list of acceptable pairs. And Choco provides a

+simple API for this: `Choco.regular(IntegerVariable[], List<int[]>)`.

+Here's the new code, which replaces the earlier `constrain-pairs`:

+(defn constrain-pairs-regular [model n lo hi masks relation? variables swap]

+ (for [m1 masks, m2 masks :when (relation? m1 m2)] [m1 m2])))]

+ (map #(map variables %)

+ (for [i (range (dec n)), j (range n)]

+ (map swap [[i j] [(inc i) j]])))]

+ (Choco/regular (into-array IntegerVariable [v1 v2]) tuples)))))

+It's almost identical to the original (a little simpler), but appears to

+trigger quite different behaviour in the engine. Unfortunately the new

+behaviour is not an improvement - the benchmarks are 2-3 times slower

+(presumably because the DFA machinery has more overhead than the simple binary

+Still, it's an extra weapon to add to the armoury for more complex programs in

+As ever, latest code is in [the repo][pipes_repo].

+[cox-regex]: http://swtch.com/~rsc/regexp/regexp1.html

+[article-5]: http://isti.bitbucket.org/2012/04/07/pipes-clojure-choco-5.html

+[pipes_repo]: https://bitbucket.org/isti/clojure-pipes

+[choco]: http://www.emn.fr/z-info/choco-solver/