# pplp / blog / blog.html

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91``` ```

Playing with Linear Programming on PyPy

Fancy hi-level interfaces often come with a high runtime overhead making them slow. Here is an experiment with building such an interface using constructions that PyPy should be good at optimizing. The idea is to allow the JIT in PyPy to remove the overhead introduced by using a fancy high-level python interface on top of a low-level C interface. The application considered is Linear programming. It is a tool used to solve linear optimization problems. It can for example be used to find the nonnegative values x, y and z that gives the maximum value of

without violating the constraints

There exists general purpose solvers for these kind of problems that are very fast and can literally handle millions of variables. To use them however the problem has to be transformed into some specific matrix form, and the coefficients of all the matrices has to be passed to the solver using some API. This transformation is a tedious and error prone step that forces you to work with matrix indexes instead of readable variable names. Also it makes maintaining an implementation hard since any modification has to be transformed too.

The example above comes from the manual of the glpk library. That manual continues by describing how to convert this problem into the standard form of glpk (which involves introducing three new variables) and then gives the c-code needed to call the library. Relating that c-code to the problem above without the intermediate explanation of the manual is not easy. A common solution here is to build a hi-level interface that allows a more natural way of defining the matrices and/or allow the equations to be entered symbolically. Unfortunately, such interfaces often become slow. For the benchmark below for example, cvxopt requires 20 minutes to setup a problem that takes 9.43 seconds to solve (this seems a bit extreme, am I doing something wrong?).

The high-level interface I constructed on top of the glpk library is pplp and it allows the equations to be entered symbolically. The above problem can be solved using

lp = LinearProgram()     x, y, z = lp.IntVar(), lp.IntVar(), lp.IntVar()     lp.objective = 10*x + 6*y + 4*z     lp.add_constraint( x + y + z <= 100 )     lp.add_constraint( 10*x + 4*y + 5*z <= 600 )     lp.add_constraint( 2*x + 2*y + 6*z <= 300 )     lp.add_constraint( x >= 0 )     lp.add_constraint( y >= 0 )     lp.add_constraint( z >= 0 )      maxval = lp.maximize()     print maxval     print x.value, y.value, z.value

To benchmark the API I used it to solve a minimum-cost flow problem with 154072 nodes and 390334 arcs. The C library needs 9.43 s to solve this and the pplp interface adds another 5.89 s under PyPy and 28.17 s under CPython. A large amount of time is still spend setting up the problem, but it's a significant improvement over the 20 minutes required on CPython by cvxopt. It is probably not designed to be fast on this kind of benchmark. I have not been able to get cvxopt to work under PyPy. The benchmark used is available here ```