Feeds:
Posts

## An Incomplete Review of Sudoku Solver Implementations

### Introduction

I conducted a programming language benchmark a couple of months ago. One of the major concerns is that it is a microbenchmark that rarely measures the real-world performance. When I saw Google’s recent publication (PDF) on the real-world comparison of C++, Java, Scala and Go, I was wondering if I could consider to adopt these problems in my benchmark. Nonetheless, I quickly gave up because of its complexity and because I am not quite interested in the problem itself. Is there an interesting problem that is easy to implement and informative for measuring the performance programming languages? Sudoku.

Sudoku is probably the most popular puzzle nowadays. Lots of attempts have been made to solve Sudoku using programs. However, solving Sudoku is known to be an NP-complete problem, which means no efficient algorithm exists so far. Finding the solution largely relies on black magic. On the other hand, lacking a definite answer makes Sudoku more interesting. A variety of methods arise to solve Sudoku. Wiki gives a good brief view of the existing algorithms. I will focus more on the practical implementations.

### A brief overview of Sudoku solving algorithms

Solving Sudoku puzzles can be reduced to a graph coloring problem, or more often an exact cover problem that can be solved with Knuth’s elegant Dancing Links algorithm (DLX). There are DLX-based solvers in C, C++, Java, C#, Python, Ruby and Javascript. Google search will give you many more. These solvers are usually fast, but for the few I have tried, they are not the fastest. To the best of my knowledge, the fastest implementations for each programming language are all based on generic backtracking without using DLX. The basic idea behind these fast solvers is simple: we select the cell or the constraint with the minimum choices, fix to one number and move forward; when a cell is left with no choice, we go back to the previous step and select the next allowed number. The key to fast solving is to quickly find cells that can be determined with simple logic, including at least the naked single and hidden single rules. Fast solvers may employ a little more advanced logic (e.g. locked candidates) if that can be computed fast. The rest is data structures and details which vary a lot and also matter a lot. It is the great variability in algorithms and implementations that makes Sudoku fascinating.

### Benchmarking several Sudoku solvers

To demonstrate how fast a Sudoku can be solved and how the solving speed may be affected by implementations, I briefly evaluated the performance of a few solvers using 20 very hard Sudoku puzzles repeated 50 times (1000=20×50 Sudokus in total). Some of these solvers are casual implementations and not to be perform well. I just choose a few to show the impact of algorithms and implementations. The following table shows the result of solving these 1000 very hard Sudokus.

Solver Algorithm Test Unique? Language Time (sec)
JSolve backtracking Yes C 0.25
kudoku backtracking Yes C 1.1
fast_solv_9r2 DLX Yes C 1.4
kudoku backtracking Yes Java 2.0
kudoku backtracking Yes JavaScript 6.3
kudoku backtracking Yes Lua 7.5
sudoku-bb Backtrack No Python 33.5
sudoku-gh DLX Yes JavaScript 41.2
Peter Norvig’s backtracking No Python 147.4
kudoku backtracking Yes Python 190.5
sudoku-pk backtracking No Javascript 278.4
sudoku-aa DLX Yes Python 514.0
sudoku_db backtracking No Python 1915
Sudoku_bc backtracking Yes Java 2180
Sudoku_dl unknown No Java 2975
Sudoku_6l brute-force No Java 25385

The third column indicates whether the solver finds the first solution only or confirms the input Sudoku having a unique solution. Finding all solutions is typically 50-100% slower than stopping at the first solution.

### Discussions

So far as I know, JSolve and fsss are the fastest Sudoku solvers to date.
JSolve based on BB_sudoku, Brian’s famous Bit Based Sudoku Solver. These three solvers are all based on generic backtracking instead of DLX and all introduce advanced tactics in addition to the basic naked/hidden single rules.

Kudoku is my solver. The basic idea comes from Guenter Stertenbrink’s suexk with some improvements. Kudoku is first implemented in C and then ported to a few other programming languages. It is not the fastest solver, but much simpler than JSolve and fsss.

The speed difference between the four Java implementations is striking. My solver is 1000-10000 times faster than the other three solvers. This proves how important the algorithm is to solving Sudoku puzzles. Choosing the right algorithm can make a huge difference.

Peter Norvig has also implemented a nice solver in Python, which has been ported to Ruby and Javascript among a few others. “sudoku-pk.js” is one of the Javascript ports. Note that Peter’s solver stops at the first solution (if I am right), while Kudoku finds all solutions. If I make Kudoku find one solution only, it takes 100 seconds, faster than other Python implementations.

EDIT: After I published this post, Boris Borcic has kindly sent me his Python solver (sudoku-bb in the table above), which is easily the fastest among the Python solvers I know. This solver is extremely interesting for two reasons. Firstly, by algorithm, Boris’ solver is not much different from my solver, both based on the exact-cover matrix without dancing. The major difference is sudoku-bb uses Python native functions more while my version, which is literally translated from C, heavily relies on deep nested loops (because I do not know python well). Although my version has smaller time complexity (I believe) and is ~50% faster with Pypy, sudoku-bb is 3X as fast with CPython. This example shows how important it is to make the best use of native routines.

Secondly and more importantly, Boris’ solver relies on a very interesting property of Sudoku: if the Sudoku has a unique solution, using “naked singles” and “hidden singles” rules guarantees that at each step, there are at most two choices. Neither he nor I cannot prove this, but it is true for ~100,000 hard Sudoku puzzles. I do not know if this is a known conclusion (we should ask mathematicians). Using this property can simplify coding, but disables unique solution checking.

### Conclusions

• The fastest Sudoku solver can solve even the hardest Sudoku in about 1 millisecond and solve most others in 0.1 millisecond.
• An inefficient algorithm can be 3 to 4 order of magnitude slower than an efficient one, so choose the algorithm carefully.
• I have added Sudoku solving to my programming language benchmark as a performance measurement for a real-world application (incomplete right now). The Javascript solver is also available online. You may get a sense how fast it is to solve Sudoku puzzles, unless you are using IE 6/7/8 which are known to have crappy Javascript engines.

### 26 Responses

1. on June 22, 2011 at 5:12 pm | Reply Boris Borcic

Hello,

I stumble onto your blog, and I thought it’s an occasion to pass on a pure python 2.5 sudoku solver (pure in the sense that the solver doesn’t import any module), that I wrote back in ’06 is about 35 slocs and is quite fast (it does your 1000 solutions test in less than 21 secs). I couldn’t find an e-mail adress, though, so I include it here in between code tags counting on the blog platform not to mangle the code. At the price of doubling code size, I managed to gain about 10% in speed. As for implementing Knuth’s dlx in python, it only makes sense if you use a code acceleration like psyco and pypy. At the same period I wrote a dlx implementation that ran just slightly faster (than the attached code) but only when accelerated using psyco.

Cheers, Boris Borcic

Attractive Chaos Edit: I have moved Boris’ source code here under his consent.

• Chaos mentions that you use a “two-choice maximum” rule to speed up backtracking, but that this means you can’t verify unique solutions.

Does this also mean that if you feed in a puzzle with multiple solutions, your program isn’t guaranteed to find one?

• on July 4, 2011 at 9:01 am Boris Borcic

This means that the solver can’t reach a solution that is not attainable by complete propagation of constraints alternating with the non-deterministic choice of one possibility among two from a cell that at the time of choice only admits two possibilities. In particular, the solver can obviously be made to fail by feeding it a sufficiently underdetermined problem easily solvable with brute force (for instance the completely undetermined problem). Otoh, this solver quickly found a solution to any from the thousand problems from public problem lists on the internet.

It is also written not to look for further solutions once it meets one, but it is a minor modification to make it completely explore the search space subject to the restriction to 2-possibilities choice points. I guess the easiest to do, to learn more on the matter, would be to offer a bounty for a problem that has (possibly many solutions but) a -single- solution -not- attainable by such a solver.

To keep to the terms of your question, no, it can’t offer the required guarantee, but further than relatively trivial problems cases for which it will obviously fail, its failure domain is ill understood and seems in practice not to intersect with real sudoku problems.

• on July 5, 2011 at 12:41 am Boris Borcic

2-choicepoints are actually many enough in difficult problems that it makes sense for the algorithm to become picky on which to use. I did in fact experiment with tuning a quickly computed decision heuristic on whether to keep some 2-choicepoint as “good enough” or go looking for a better one to use. Even though the slowness of Python doesn’t allow much sophistication on that road, this allowed me to shave a good 10% running time over my testing corpus.

2. on June 22, 2011 at 5:28 pm | Reply Boris Borcic

My expectations concerning the use of ` tags weren't met, obviously - although they were deleted :( I'll send the code to you by email reply - mine is bborcic [at] gmail`

``` ```

`Cheers, Boris Borcic`

3. Hello

I’m author of Fast_Solv_9r2, witch (Sudoku Fast solver version 9 revision2)
Just to point Knuth’s DLX still is a backtracking algorithm. What it makes special is the way its decided when to backtrack.

By the time I make that code it was fast, maybe fastest solving general sudoku problems. Several faster programs appeared and finaly come Brian Turner with his BB_Sudoku and record solving time. Uses Logic first, then backtracks.

Cant say if solving sudoku is the best way to benchmark languages, but for sure you should think to make bigger sudoku problems sets, in the way times are higher and better measured. Current hardware is very fast!

Ignacio.

• on June 29, 2011 at 2:43 am | Reply attractivechaos

If I used a large data set, the slowest implementation would take days.

In my understanding, most naive algorithms only use the “naked singles” rule. Better algorithms use the “hidden singles” rule, which is a little bit harder to code. With the exact-cover binary matrix, the two rules are naturally applied. BB_Sudoku uses the “locked candidate” rule in addition to naked/hidden singles rules. It is able to use other rules as well, but not by default, probably because considering complex logic involving multiple cells takes more time than simply backtracking. JSolve optimizes BB_Sudoku a little bit and thus faster. I do not know the algorithm behind fsss.

Similar to suexco, my algorithm takes the exact-cover binary matrix but without dancing. I know DLX is a special way of doing backtracking. Wiki is very clear about this.

4. on August 8, 2011 at 7:38 pm | Reply Andreas Scherer

I’m sure, you’re aware of DEK’s qnd hack of ‘sudoku.w’ (CWEB, http://www.cs-faculty.stanford.edu/~knuth/programs/sudoku.w). In combination with DEK’s ‘dance.w’ (CWEB, http://www-cs-faculty.stanford.edu/~knuth/programs/dance.w), it can be used in a Unixish pipe to solve sudokus: echo ‘…..5.74 [...] 36.5…..’ | sudoku | dance 1. The resulting output is not easy to read at first glance, but by hacking some code in AWK a/o Ruby, it can transformed into human-readable form.

5. [...] de un tiempo de ejecución de segundos a otro de años. Por ejemplo, aquí podéis encontrar una comparativa entre diferentes programas para resolver Sudokus (con tiempos de 0.25 segundos a 7 [...]

6. I also have a sudoku solver here: http://www.vantasyworld.com/fun/sudoku/sudokusolver.html

How well does it compare to the others?

Can the above detect unsolvable sudokus? Mines should!

7. [...] real time! The Sudoku solver the authors used was Brian Turner’s open source solver. (Here is a review of solvers.) Very impressive… [...]

8. A easy way to speed the Boris Borcic program is to pass the globals into search so Python spends less time having to look them up, that saves about 2.5% runtime on my PC. The Peter Norvig program can be speeded by removing the duplicated calls to len(values[s]) in the eliminate function by storing the return, that’s about a 5% saving.

9. on February 1, 2012 at 9:45 pm | Reply Félix Cloutier

I had to program a sudoku solver for a class. I found your entry pretty late–my work is pretty much done, and I’m not starting over, even with the Kudoku approach, which is arguably better than mine.

To test my implementation, I generated about 12,000 puzzles. (It turns out that over these, Kudoku is roughly 1.7 times faster than my program. Close enough.) Much to my surprise, though, the generator came up with that grid:

6….894.9….61…7..4….2..61……….2…89..2…….6…5…….3.8….16..

It takes about 21 ms to solve with Kudoku on my computer, which is by far the hardest grid I could find for it. It’s roughly 3 times longer to solve than the longest puzzle in the ‘hardest puzzles’ list.

So I thought I’d share it. I’m not creative enough to give it a name.

• on February 1, 2012 at 9:57 pm | Reply attractivechaos

Thanks a lot. Actually my calling those 20 as “hardest” is not quite right. The “hardness” is with respect to the algorithm. A sudoku hard for one solver may be easy for others. Thanks again.

10. I have written an article about solving and generating Sudoku using backtracking and Human-like aproach. Backtracking is implemented in Java and C++, human-like approach in Java. Also several popular solving strategies are described there.

http://en.algoritmy.net/article/42521/Sudoku

11. What CPU did you use to run the tests?

12. [...] of Sudoku’s at the same time for some proper bench-marking, something like what was done here, but for now this screenshot will have to do! This same Sudoku took around 0.433 seconds the last [...]

13. [...] be seen below, I found an interesting difficult set of Sudoku’s on another blog (thanks to attractivechaos)  that I’ve been testing with, and modified the input system of the application so that it [...]

14. [...] there have been some very fast Sudoku-solving algorithms produced, a basic backtracking algorithm implemented efficiently will be hard to beat. Even a [...]

15. on January 9, 2013 at 2:02 am | Reply Michael Kragh Pedersen

I just tried out your Java implementation of kudoku, and ran it against 2365 hard puzzles. It solved them all in 1,5 seconds, 0,7 if I didn’t print the solutions. It even took less than a second to print out about 6300 solutions for one of those puzzles from which I removed a couple of cells.

I have to say that I am surprised. I was expecting DLX to be faster than backtracking, since backtracking revolves around trial-and-error, and could possibly go through hundreds of millions of choices before finding the right solution.

But you have obviously done your job very well with this, and I tip my imaginary hat to you, good sir.

16. [...] I decided to also measure performance, using the same approach as attractivechaos in the blog post An incomplete review of Sudoku Solver Implementations. The author uses 20 really hard Sudokus that are solved repeatedly 50 [...]

17. I wrote a fun sudoku solver, and it is quite fast. Why don’t you try it out against your own solution? http://www.codeotaku.com/journal/2011-05/solving-sudoku-efficiently/index

18. I recently have written a python Sudoku solver, which I intend to make it run fast. For the 20 hard puzzles, it is still about 1/3 slower than Norvig’s algorithm; but on sets of simpler puzzles (such as those 40000+ 17-clues), it is about 1/3 faster.

19. Hi, I am the author of fsss. Some of the tricks I used are inspired from Brian’s BB Sudoku. I focused on tasks different than just solving and optimized the solver for multithreading/parallelism, earlier determination of invalid puzzles, etc. For most of the solving modes even the solved cells’ content isn’t recorded. More recent upgrades are available in my more general sudoku tool “gridchecker” – it is free and periodically I am publishing its source code.
To my knowledge the fastest solver for the moment is ZSolve (http://forum.enjoysudoku.com/post225783.html#p225783). It is about 2 times faster than fsss.