# Solving Sudoku with L1-norm minimization

I came across an interesting paper by Babu, Pelckmans, Stoica the other day, via my favourite Compressed Sensing blog Nuit Blanche.

The idea is that it is possible to formulate Sudoku as a linear system, with the board as a 9x9x9 indicator vector and the rules as a matrix encoding the different Sudoku constraints. The solution to this system with the minimal L1-norm will often be an indicator vector as well – and will represent the solution to the puzzle with the missing entries completed.

To play around with the ideas here I re-implemented the paper in Python, using CVXOPT. I’m going to try and explain all this coherently at the December London Python Dojo meetup

Update: I’ve now updated the code with the help of the much-easier-to-use CVXMOD wrapper library, and put it on GitHub here if you want to play with it.

The original authors’ Matlab codes are available as well.

### 13 responses to “Solving Sudoku with L1-norm minimization”

1. Tom Viner

Hi Ben, great talk last night.

I’m looking through your slides to see if it would be possible to specifying the extra rules to solve a Killer Sudoku http://www.killersudokuonline.com/

Then need a structure to enter the killer constraints, something like this for a 3×3:

“””
AAC
BAC
BCC
“””
ksum[‘A’] = 4
ksum[‘B’] = 5
ksum[‘C’] = 9

• transfinite

Nice talking to you too! In fact it definitely is possible to solve Killers in this way – have a look in github at the freshly added KillerProblem class which does just what you suggest. (In fact I also scaled it down so the answer is still just a column of ones, but now the matrix with the problem has fractions rather than just 1s and 0s.) I’ve only tried it with a Times Gentle problem but it got it on the first go.

2. Alejandro

Hi:

I am running your code as follow:
——————————-
import sudoku
p = sudoku.sample()
p.solve()
——————————-

And get the following error:

Traceback (most recent call last):
File “”, line 4, in
File “sudoku.py”, line 362, in solve
v = solvefunc(M,ones)
File “sudoku.py”, line 441, in solve_rw_l1_cvxmod
p.solve(quiet=True, cvxoptsolver=’glpk’)
TypeError: solve() got an unexpected keyword argument ‘cvxoptsolver’

I am using CVXMOD version 0.4.5, and Python 2.6.5 in an Ubuntu 10.4 machine. I tried replacing line 441 by

p.solve(quiet=True)

And now I get this error:

Traceback (most recent call last):
File “”, line 4, in
File “sudoku.py”, line 362, in solve
v = solvefunc(M,ones)
File “sudoku.py”, line 442, in solve_rw_l1_cvxmod
p.solve(quiet=True)
File “/usr/local/lib/python2.6/dist-packages/cvxmod/base.py”, line 2465, in solve
res = cvxopt.solvers.lp(c, G, h, A, b)
File “/usr/lib/python2.6/dist-packages/cvxopt/coneprog.py”, line 3019, in lp
dualstart)
File “/usr/lib/python2.6/dist-packages/cvxopt/coneprog.py”, line 685, in conelp
raise ValueError, “Rank(A) < p or Rank([G; A]) < n"
ValueError: Rank(A) < p or Rank([G; A]) < n

Any idea about how to solve this?

Regards,
Alejandro.

3. Alejandro

It seems that your functions that call the CVXMOD solver have some problems.

I tried with

solution = p.solve(solvefunc = sudoku.solve_plain_l1)

and it works.

• transfinite

Alejandro,

Ah – an oversight, I had made a change to CVXMOD so it can tell cvxopt to run with glpk. Thanks for noticing!

If you amend solve in cvxmod/base.py by adding a cvxoptsolver=None argument, and then pass the value of cvxoptsolver as the solver= argument of the cvxopt.solvers.lp call on line 2465, you should be able to run the CVXMOD code as well.

I contacted Jacob Mattingley, the author of CVXMOD about this. Apparently he already had the same change in his dev repository but isn’t intending to make a maintenance release at the moment.

Let me know how you get on,

Ben

4. Alejandro

Ben,

I checked the way you translate the L1 minimization problem into an LP problem in the function solve_plain_l1, and I think it is not completely right. The problem is that you are constraining x to be positive. This is not a problem for the Sudoku case, because in fact x is greater than zero, but in general, this is not the case.

I think the following function correctly cast the L1 minimization problem as an LP problem. At least the dimension of the problem is smaller, and produce the same results.

def solve_plain_l1(A, y):
m = A.size[0]
n = A.size[1]
c = concatvert(zeros_v(n), ones_v(n))
I = eye(n)
G1 = concathoriz( I, -I)
G2 = concathoriz( -I,-I)
G = concatvert(G1, G2)
h = zeros_v(2*n)
Aa = concathoriz(A, matrix(0, (m,n), ‘d’))

u = cvxopt.solvers.lp(c, G, h, Aa, y, solver=’glpk’)

v = u[‘x’][:n]
return v

5. Ben,

Would you have by any chance the original matlab file written by the authors of the paper ?

Cheers,

Igor.

• transfinite

I don’t have the original Matlab code, it is worth trying the authors of the paper though.

6. ricardo

Could someone explain very simply what L1-norm minimization is? I know L2-norm minimization is the least square error solution.

7. transfinite

Ricardo,
Short answer: The $L^1$ setting is just minimizing the sum of the absolute errors, instead of the (square root of the) sum of the squared errors.

Using the “$L^2$ norm” means measuring the length of a vector with the standard Euclidean distance, the square root of the sum of the squares of the components: $\parallel\mathbf{x}\parallel_{2} = \sqrt{\sum_{i} \vert x_i \vert^2}$.
But we can choose to replace the square and square root with other powers than 2, say
$p$ to define an $L^p$ norm instead: $\parallel\mathbf{x}\parallel_{p} = \left(\sum_{i} \vert x_i \vert ^{p}\right)^{1/p}$ .

Take $p=1$ and you get $L^1$, $\parallel\mathbf{x}\parallel_{1} = \sum_{i} \vert x_i \vert$. This is just the sum of the absolute values.

If you are trying to approximate a vector $\mathbf{y}$ with another vector $\mathbf{x}$, you need a way to measure your goodness of fit, and you can do this by looking at the norm of the residual error. If you use $L^2$, then you minimize $\parallel\mathbf{y} - \mathbf{x} \parallel_{2}$, which is equivalent to the least squares solution as you rightly say.

It’s equally valid to use another norm instead though, and this might lead you to a different solution. In particular, the solutions that result from minimizing $L^1$ are generally sparser than that from $L^2$. That is, if some amount of error is unavoidable, these solutions are more likely to have a few larger error components and many zeros, as opposed to having many smaller error components. If you know a priori that your problem structure is such that this is likely, then you have a reason to use $L^1$ over $L^2$.

The other big advantage of $L^1$ is that the resulting minimizations are linear programming problems, which can be solved efficiently.

8. ricardo

Many thanks transfinite. I’ve been trying to find “L1-norm for dummies” for several days. Yours is clear and useful to someone hu kunt reed rite & kont.
I’m no mathman .. but I’m not sure linear programming is more efficient. If the system is linear, you can just set up the eqns and use Singular Value Decompostion to solve for L2-norm. Very elegant.
Linear programming is messy.
For non-linear systems where you must use an iterative method, |absolute| values & limits are nasty and can lead to limit cycles and loadsa local minima.
Maybe I’m just a lousy programmer!

9. transfinite

Ricardo
It’s true that linear programming isn’t more efficient to solve than least squares. But if you really want a sparse solution with only a few non-zero errors, $L^1$ is special because it’s the best approximation you can get that still leaves you with a convex optimization problem. That means you /are/ guaranteed to avoid local minima. If on the other hand you explicitly ask for the $L^0$ solution which is exactly the solution with the fewest nonzeros, then you’re into NP-hard territory and you have the problems you mention.

That’s really why $L^1$ is special: it’s an optimal trade-off between having a sparse solution, and an efficiently solvable problem.