The Minimum Number of Givens Joshua Cooper USC Department of Mathematics

The Minimum Number of Givens
in a Fair Sudoku Puzzle (is 17!)
Joshua Cooper
USC Department of Mathematics
Rules: Place the numbers 1 through 9 in the 81 cells, but do not let any number
appear twice in any row, column, or 3 × 3 “box”.
You start with a subset of the cells labeled, and try to finish it.
1 6 5 4 3 9 7 2 8
7 8 3 2 6 1 4 5 9
9 2 4 8 5 7 6 1 3
4 7 9 5 2 8 1 3 6
3 1 2 9 4 6 5 8 7
6 5 8 7 1 3 2 9 4
5 9 6 1 8 4 3 7 2
2 3 7 6 9 5 8 4 1
8 4 1 3 7 2 9 6 5
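The rules can be checked mechanically. The following is a minimal sketch, assuming the board is stored as nine strings of nine digits; the names `BOARD`, `units`, and `is_valid_board` are illustrative and do not come from the talk.

```python
# The completed board from the slide, one string per row.
BOARD = [
    "165439728",
    "783261459",
    "924857613",
    "479528136",
    "312946587",
    "658713294",
    "596184372",
    "237695841",
    "841372965",
]

def units():
    """Yield the 27 groups of cells (rows, columns, boxes) as lists of (row, col)."""
    for i in range(9):
        yield [(i, j) for j in range(9)]   # row i
        yield [(j, i) for j in range(9)]   # column i
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            yield [(br + r, bc + c) for r in range(3) for c in range(3)]

def is_valid_board(rows):
    """True if every row, column, and 3 x 3 box contains 1 through 9 exactly once."""
    digits = set("123456789")
    return all({rows[r][c] for r, c in unit} == digits for unit in units())

print(is_valid_board(BOARD))
```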
A Sudoku puzzle designer has two main tasks:
1. Come up with a board to use as the solution state.
2. Designate some subset of the board’s squares as the initially exposed
numbers (“givens”).
For example:
[Figure: the completed board from the rules slide (BOARD), with a CELL, a ROW, a COLUMN, a BOX, a BAND (a horizontal strip of three boxes), and a STACK (a vertical strip of three boxes) labeled, next to a PUZZLE that keeps only some of its cells, each of which is a GIVEN.]
We’re going to focus on task #1: How to choose a “fair” Sudoku board?
For a Sudoku puzzle, i.e., a set of givens, to be “fair”, it must have two properties:
1. It has a solution. (Solvability)
2. There is only one solution. (Uniqueness)
Question: What is the fewest number of givens in a fair puzzle?
Possible solution (“Brute Force”):
1. Enumerate all possible sets of givens.
2. Check each one to see if it is solvable.
3. Check the solvable ones to see if they are unique.
4. Output the number of givens in the smallest uniquely solvable puzzle.
Why Brute Force Is Impractical:
1. Enumerate all possible sets of givens.
With 81 cells, there are $2^{81} \approx 2.4 \cdot 10^{24}$ sets of cells one could fill in.
Actually, the situation is even worse, because we have 9 options for the contents of
each cell. That means a total number
$$1 + 9 \cdot 81 + 9^2 \binom{81}{2} + 9^3 \binom{81}{3} + \dots + 9^{80} \binom{81}{80} + 9^{81} \binom{81}{81}$$
of possible sets of givens.
“81 choose 3” = the number of ways to choose 3 objects from a collection of 81:
$$\binom{81}{3} = \frac{81!}{78!\,3!} = \frac{81 \cdot 80 \cdot 79}{3 \cdot 2 \cdot 1} = 85320.$$
“N choose K” = the number of ways to choose K objects from a collection of N:
$$\binom{N}{K} = \frac{N!}{(N-K)!\,K!} = \text{BIG}.$$
By the Binomial Theorem,
$$\sum_{j=0}^{81} 9^j \binom{81}{j} = (1 + 9)^{81} = 10^{81},$$
which is approximately the number of atoms in the observable universe.
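These counts are easy to confirm with exact integer arithmetic. A small sketch using only the Python standard library (the variable names are illustrative):

```python
from math import comb

# Number of subsets of the 81 cells.
cell_subsets = 2**81

# Number of sets of givens: choose j cells, then one of 9 digits for each.
sets_of_givens = sum(9**j * comb(81, j) for j in range(82))

print(f"{cell_subsets:.2e}")          # about 2.4e24
print(comb(81, 3))                    # 81 choose 3
print(sets_of_givens == 10**81)       # the Binomial Theorem identity
```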
Let’s be a little smarter about this…
1. Enumerate all sets of 81 givens, and if a uniquely satisfiable puzzle is
found, enumerate all sets of 80 givens, and if a uniquely satisfiable
puzzle is found, enumerate all sets of 79 givens…
In fact, we can start much lower than 81, since there are many uniquely satisfiable
puzzles known with fewer than 81 givens.
Indeed, there are uniquely satisfiable puzzles known which have only 17 givens.
[Figure: an example of a uniquely satisfiable puzzle with 17 givens.]
Gordon Royle has compiled a list of 49151 (!) inequivalent ones at:
http://mapleta.maths.uwa.edu.au/~gordon/sudokumin.php
What does it mean for two Sudoku boards/puzzles to be equivalent?
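Two boards are considered equivalent if it is possible to transform one into the other
by a sequence of operations of the form:
1. Permuting the rows within each band and the columns within each stack (× $3!^6$)
2. Permuting bands I, II, and III, and stacks A, B, and C (× $3!^2$)
3. Permuting the numbers/colors (× 9!)
[Figure: a board with bands I, II, III and stacks A, B, C labeled.]
Operations 1 and 2, together with transposing the board (× 2), generate a group of
$2 \cdot 3!^8 = 3{,}359{,}232$ different possible operations; relabeling the numbers
(operation 3) multiplies this count by 9!.
We’ll call this the “Sudoku group.”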
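To make the equivalence operations concrete, here is a hedged sketch that applies a random element of the group to the board from the opening slide and confirms the result is still a valid board. The helper names (`is_valid`, `shuffle_bands_and_rows`, `random_equivalent`) are hypothetical, and transposition is included as one of the grid operations.

```python
import random
from math import factorial

BOARD = ["165439728", "783261459", "924857613",
         "479528136", "312946587", "658713294",
         "596184372", "237695841", "841372965"]

def is_valid(rows):
    """Check that rows, columns, and boxes each contain 1..9 exactly once."""
    digits = set("123456789")
    cols = ["".join(c) for c in zip(*rows)]
    boxes = ["".join(rows[r][c:c + 3] for r in range(b, b + 3))
             for b in (0, 3, 6) for c in (0, 3, 6)]
    return all(set(unit) == digits for unit in rows + cols + boxes)

def shuffle_bands_and_rows(rows, rng):
    """Permute the three bands, and the three rows inside each band."""
    order = []
    for band in rng.sample(range(3), 3):
        order += [3 * band + r for r in rng.sample(range(3), 3)]
    return [rows[i] for i in order]

def transpose(rows):
    return ["".join(col) for col in zip(*rows)]

def random_equivalent(rows, rng):
    """Apply a random row/band, column/stack, transpose, and relabeling operation."""
    rows = shuffle_bands_and_rows(rows, rng)
    rows = transpose(shuffle_bands_and_rows(transpose(rows), rng))
    if rng.random() < 0.5:
        rows = transpose(rows)
    digits = "123456789"
    relabel = dict(zip(digits, rng.sample(digits, 9)))
    return ["".join(relabel[d] for d in row) for row in rows]

grid_operations = 2 * factorial(3) ** 8
print(grid_operations)                       # 3359232
print(grid_operations * factorial(9))        # including relabelings
print(is_valid(random_equivalent(BOARD, random.Random(0))))
```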
So, start with 16 givens:
1. Enumerate all sets of 16 givens…
How many such sets are there?
$9^{16} \cdot \binom{81}{16} \approx 6.22 \cdot 10^{31}$
It would be silly to look at all of these, though:
1. We can rule out anything that has two of the same symbol in any
column, row, or box.
2. Once we examine one, we don’t have to look at all the ones equivalent
to it.
Approximate total number of inequivalent configurations of 16 “non-conflicting” givens:
$3.64 \times 10^{23}$
Still way too big.
Even if we could enumerate all of these, and even if we knew how to generate a list of
one representative of each equivalence class (= orbit under the Sudoku group)…
2. Check each one to see if it is solvable.
3. Check the solvable ones to see if they are unique.
For both steps 2 and 3: use backtracking.
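A backtracking check for solvability and uniqueness can be sketched as follows. This is a simple illustration, not the solver used in the work discussed in this talk; the function name `count_solutions`, the 81-character string format, and the use of `.` for an empty cell are all assumptions of the sketch.

```python
def count_solutions(grid, limit=2):
    """Count completions of `grid` (81 chars, non-digits or '0' = empty).

    The search stops early once at least `limit` completions are found,
    so a return value of exactly 1 means the puzzle is fair.
    """
    cells = [int(ch) if ch.isdigit() else 0 for ch in grid]

    def candidates(pos):
        r, c = divmod(pos, 9)
        used = set()
        for i in range(9):
            used.add(cells[9 * r + i])          # same row
            used.add(cells[9 * i + c])          # same column
        br, bc = 3 * (r // 3), 3 * (c // 3)
        for dr in range(3):
            for dc in range(3):
                used.add(cells[9 * (br + dr) + bc + dc])   # same box
        return [d for d in range(1, 10) if d not in used]

    # Reject givens that already conflict with each other.
    for pos in range(81):
        d = cells[pos]
        if d:
            cells[pos] = 0
            ok = d in candidates(pos)
            cells[pos] = d
            if not ok:
                return 0

    def search():
        best, best_cands = None, None
        for pos in range(81):
            if cells[pos] == 0:
                cands = candidates(pos)
                if not cands:
                    return 0                    # dead end
                if best is None or len(cands) < len(best_cands):
                    best, best_cands = pos, cands
        if best is None:
            return 1                            # grid is full
        total = 0
        for d in best_cands:
            cells[best] = d
            total += search()
            cells[best] = 0
            if total >= limit:
                break
        return total

    return search()

BOARD = ("165439728" "783261459" "924857613"
         "479528136" "312946587" "658713294"
         "596184372" "237695841" "841372965")
puzzle = "." * 9 + BOARD[9:]          # blank out the first row
print(count_solutions(puzzle) == 1)   # fair: solvable and unique
```

Choosing the empty cell with the fewest candidates keeps the search tree small; a production solver would use much faster data structures.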
NEWS FLASH!!!
January 1, 2012: McGuire, Tugemann, Civario, University College Dublin
There is no 16-Clue Sudoku: Solving the
Sudoku Minimum Number of Clues Problem
Posted on the arXiv, so it has not been published (i.e., vetted by a referee).
Nonetheless, it looks legit.
Q: How the *$?&!* did they do that!?
A: Some clever mathematics, some very clever programming, and a RIDICULOUS
amount of computing power:
7.1 million core hours on an SGI Altix ICE 8200EX cluster with 320
compute nodes, each of which has two Intel (Westmere) Xeon E5650
hex-core processors and 24GB of RAM = approx 1 year real time
The general strategy:
1. Construct a catalogue of all 5,472,730,538 inequivalent boards.
Done by Glenn Fowler, AT&T labs. Full enumeration, with a very
clever and specialized compression algorithm.
Uncompressed data size: 418 GB.
Compressed data size: 6 GB. (That’s 8.77 bits/board!)
2. Search each board for sub-puzzles with 16 givens, and check
each one to see if it can be uniquely completed to a valid Sudoku
board.
BIG PROBLEM:
$\binom{81}{16} \approx 3.4 \cdot 10^{16} \approx$ number of seconds since life began on Earth
So, McGuire et al were smarter about which sets of cells they looked at.
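Observation:
Every fair puzzle must contain at least one of the red numbers (a set of cells
highlighted on the board below). If a puzzle contains none of them, those cells can
be refilled in a second way that still gives a valid board, so the puzzle has more
than one solution.
Call such a set of cells “unavoidable”.
937856241
562194387
481273569
823647915
615932478
749581623
378469152
196725834
254318796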
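For a concrete instance, the sketch below checks one four-cell unavoidable set on this board. The set was found by inspection and is not necessarily the one highlighted on the slide; `UNAVOIDABLE`, `is_valid`, and `swap_two_digits` are hypothetical names. Swapping the 9s and 5s in those four cells gives a different valid board, so a puzzle containing none of the four cells cannot have a unique solution.

```python
BOARD = ["937856241", "562194387", "481273569",
         "823647915", "615932478", "749581623",
         "378469152", "196725834", "254318796"]

def is_valid(rows):
    """Check that rows, columns, and boxes each contain 1..9 exactly once."""
    digits = set("123456789")
    cols = ["".join(c) for c in zip(*rows)]
    boxes = ["".join(rows[r][c:c + 3] for r in range(b, b + 3))
             for b in (0, 3, 6) for c in (0, 3, 6)]
    return all(set(unit) == digits for unit in rows + cols + boxes)

# Zero-indexed (row, col) cells holding 9 5 / 5 9 in rows 1-2, columns 1 and 5.
UNAVOIDABLE = [(0, 0), (0, 4), (1, 0), (1, 4)]

def swap_two_digits(rows, cells):
    """Exchange the two digits that occur in `cells`, leaving all other cells alone."""
    grid = [list(row) for row in rows]
    a, b = sorted({grid[r][c] for r, c in cells})
    for r, c in cells:
        grid[r][c] = b if grid[r][c] == a else a
    return ["".join(row) for row in grid]

other = swap_two_digits(BOARD, UNAVOIDABLE)
print(is_valid(BOARD), is_valid(other), other != BOARD)
```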
Smarter strategy for searching for 16-cell puzzles:
1. For each completed board, find lots of unavoidable sets.
2. Enumerate all the sets of 16 cells that hit each unavoidable
set at least once.
3. Check each set of 16 cells to see if it is a fair puzzle.
1. For each completed board, find lots of unavoidable sets.
Strategy: Ed Russell compiled a list of 525 “blueprints” for unavoidable sets (which
includes all of those with 11 or fewer cells).
Apply the Sudoku group to these blueprints to obtain a large collection of
them, and then compare them to each board in turn.
Example blueprint:
[Figure: an example blueprint, a pattern of cells filled with the digits 1 through 4.]
2. Enumerate all the sets of 16 cells that hit each unavoidable
set at least once.
This is the so-called “hitting set” problem, well known to be NP-hard.
Definition. Given a collection of subsets of clues (the unavoidable sets), a
hitting set (or transversal) for this collection is a set of clues that intersects
every one of the subsets.
Algorithm:
1. At each step, find the smallest unavoidable set that does not contain any
of the clues picked so far, and then try each element of this unavoidable
set as the next clue.
2. Repeat until 16 clues have been chosen.
3. If the collection of unavoidable sets is exhausted before we get to the
16th clue, simply add the remaining clues needed in all possible ways.
Small but crucial improvement: whenever we add a clue to the hitting set
from an unavoidable set, we consider all smaller clues from that unavoidable
set (those that come earlier in a fixed ordering of the cells) as dead, i.e., we
exclude these smaller clues from the search (in the respective branch of the
search tree only). This way no hitting set is generated twice.
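The algorithm above, including the dead-clue rule, can be sketched generically. This is an illustration under stated assumptions, not the authors' implementation: clues are plain integers, `sets` stands in for the unavoidable sets, and the name `hitting_sets` is hypothetical.

```python
from itertools import combinations

def hitting_sets(sets, universe, k):
    """Yield every k-subset of `universe` that intersects each set in `sets`."""
    sets = [frozenset(s) for s in sets]

    def extend(picked, dead):
        unhit = [s for s in sets if not (s & picked)]
        if not unhit:
            # Everything is hit: add the remaining clues in all possible ways.
            free = sorted(set(universe) - picked - dead)
            for extra in combinations(free, k - len(picked)):
                yield picked | set(extra)
            return
        if len(picked) == k:
            return                      # out of clues, but some set is unhit
        target = min(unhit, key=len)    # smallest set not yet hit
        newly_dead = set()
        for clue in sorted(target):
            if clue not in dead:
                yield from extend(picked | {clue}, dead | newly_dead)
            newly_dead.add(clue)        # smaller clues are dead in later branches

    yield from extend(frozenset(), frozenset())

example_sets = [{0, 1, 2}, {2, 3}, {4, 5, 6}]
found = [tuple(sorted(h)) for h in hitting_sets(example_sets, range(8), 3)]
print(len(found), len(set(found)))      # the two counts agree: no duplicates
```

Each hitting set is produced in exactly one branch: the branch that picks its smallest element in the targeted unavoidable set.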
3. Check each set of 16 cells to see if it is a fair puzzle.
McGuire et al used an open-source Sudoku solver written by Brian
Turner, available online. This solver can check around 50,000 16-clue
puzzles per second for a unique completion.
One “little” issue: is this a proof?
It’s not human-checkable: the computation is too big.
As long as our understanding of physics is sufficiently
accurate to completely predict the behavior of a processor
under the given instruction set, the computation is to be
believed…
… unless there is a bug in their code…
… or there is a bug in the kernel of the OS running the code…
… or a cosmic ray streams in from outer space and knocks
an electron out of place at just the right (wrong?) moment…
… or a radioactive atom in the chip’s substrate material
decays, tossing off an alpha particle…
… or random noise is caused by transient electromagnetic fields,
perhaps from inductive or capacitive “crosstalk”…
… or our understanding of physics isn’t quite good enough…
What a headache!
Are these issues really worth
worrying about, or are they so
rare that they are not a
problem?
Tezzaron Semiconductor’s 2004 whitepaper “Soft Errors in Electronic Memory”
estimates that modern memory is subject to 1000 to 5000 FIT per Mbit of memory
(FIT = failures in time; 1 FIT is one bit flip per billion hours of use).
A yearlong computation probably has lots of these errors, then!
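A back-of-the-envelope estimate supports this. The sketch below plugs in the cluster size quoted earlier (320 nodes with 24 GB each) and one year of wall-clock time; it assumes all of the memory is in use, that 1 GB = 1024 MB, and that nothing such as error-correcting memory intervenes. The function name is hypothetical.

```python
def expected_bit_flips(fit_per_mbit, memory_mbit, hours):
    """Expected number of soft errors; 1 FIT is one failure per 10**9 hours."""
    return fit_per_mbit * memory_mbit * hours / 1e9

memory_mbit = 320 * 24 * 1024 * 8     # 320 nodes x 24 GB, in Mbit
hours = 365 * 24                      # about one year of real time

low = expected_bit_flips(1000, memory_mbit, hours)
high = expected_bit_flips(5000, memory_mbit, hours)
print(f"{low:.3g} to {high:.3g} expected bit flips")
```

Under these assumptions the estimate is in the hundreds of thousands to millions of flipped bits.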
What to do!?
Define a graph Sud on the set of cells with a complete subgraph in each row,
column, and box.
Definition. A graph G is said to be k-colorable if it is possible to assign k colors to the
vertices in such a way that no edge has both its vertices colored the same.
Definition. The chromatic number χ(G) of a graph G is the smallest integer k so
that G is k-colorable.
Definition. For a graph G and a proper vertex coloring c with exactly χ(G) colors,
define a “determining set” to be a set of vertices so that the coloring, restricted to those
vertices, can be completed to a bona fide proper vertex coloring of the graph in exactly
one way.
Definition. For a graph G and a proper vertex coloring c with exactly χ(G) colors,
define a “critical set” to be a determining set so that removing any vertex makes the
set non-determining.
Definition. For a graph G and a proper vertex coloring c with exactly χ(G) colors,
define scs(G;c) to be the size of the smallest critical set for G and c, and lcs(G;c)
to be the size of the largest critical set.
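Definition. For a graph G, define
$$\underline{\mathrm{scs}}(G) = \min_c \mathrm{scs}(G;c), \qquad \underline{\mathrm{lcs}}(G) = \min_c \mathrm{lcs}(G;c),$$
$$\overline{\mathrm{scs}}(G) = \max_c \mathrm{scs}(G;c), \qquad \overline{\mathrm{lcs}}(G) = \max_c \mathrm{lcs}(G;c),$$
where c ranges over the proper vertex colorings of G with exactly χ(G) colors.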
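For very small graphs, these minima and maxima over colorings can be computed by brute force straight from the definitions. The sketch below does this for cycle graphs; it is exponential in the number of vertices and only meant as an illustration. The caller supplies the chromatic number `k` (2 for even cycles, 3 for odd ones), and all function names are hypothetical.

```python
from itertools import combinations, product

def cycle_edges(n):
    return [(i, (i + 1) % n) for i in range(n)]

def proper(coloring, edges):
    return all(coloring[u] != coloring[v] for u, v in edges)

def count_completions(n, edges, k, partial):
    """Count proper k-colorings that agree with `partial` (dict: vertex -> color)."""
    free = [v for v in range(n) if v not in partial]
    count = 0
    for values in product(range(k), repeat=len(free)):
        coloring = dict(partial)
        coloring.update(zip(free, values))
        if proper(coloring, edges):
            count += 1
    return count

def critical_set_sizes(n, edges, k, coloring):
    """Sizes of all critical sets for one proper coloring."""
    determining = set()
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            partial = {v: coloring[v] for v in subset}
            if count_completions(n, edges, k, partial) == 1:
                determining.add(frozenset(subset))
    # Critical: determining, but not after removing any single vertex.
    return [len(s) for s in determining
            if all(s - {v} not in determining for v in s)]

def four_parameters(n, k):
    """Return (min scs, max scs, min lcs, max lcs) over colorings of the n-cycle."""
    edges = cycle_edges(n)
    scs_vals, lcs_vals = [], []
    for values in product(range(k), repeat=n):
        if proper(values, edges) and len(set(values)) == k:
            sizes = critical_set_sizes(n, edges, k, values)
            scs_vals.append(min(sizes))
            lcs_vals.append(max(sizes))
    return min(scs_vals), max(scs_vals), min(lcs_vals), max(lcs_vals)

print(four_parameters(4, 2))   # (1, 1, 1, 1)
print(four_parameters(5, 3))   # (3, 3, 4, 4)
```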
Theorem (McGuire et al. ‘12).
$\underline{\mathrm{scs}}(\mathrm{Sud}) = 17$, i.e., the minimum of scs(Sud; c) over all boards c is 17.
Perhaps by studying these parameters, we can eventually construct a
(human-readable) mathematical proof of this result.
For example…
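Theorem (C., Kirkpatrick ’12+). For n even,
$$\underline{\mathrm{scs}}(C_n) = \overline{\mathrm{scs}}(C_n) = \underline{\mathrm{lcs}}(C_n) = \overline{\mathrm{lcs}}(C_n) = 1.$$
Theorem (C., Kirkpatrick ’12+). For n odd,
$$\underline{\mathrm{scs}}(C_n) = \frac{n+1}{2}, \qquad \overline{\mathrm{lcs}}(C_n) = n - 1, \qquad \overline{\mathrm{scs}}(C_n) = n - 2,$$
$$\underline{\mathrm{lcs}}(C_n) = \begin{cases} \dfrac{n+3}{2} & \text{if } n \equiv 1 \pmod 4, \\[1ex] \dfrac{n+1}{2} & \text{if } n \equiv 3 \pmod 4. \end{cases}$$
Here underlines denote the minimum and overlines the maximum over colorings c.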
These parameters (by other names) have been studied before in other
contexts, particularly for Latin squares.
Definition. A Latin square of order n is an n × n matrix whose cells are filled with
the numbers 1, …, n, so that each column and row contains exactly one of each
symbol.
Definition. The Latin square graph of order n is the Cartesian product
$K_n \square K_n$
of two complete graphs on n vertices, i.e., (a, b) ∈ [n] × [n] is adjacent to a
different vertex (c, d) ∈ [n] × [n] iff a = c or b = d.
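The link between the two definitions is that a Latin square of order n is exactly a proper n-coloring of this graph. A minimal sketch, with hypothetical helper names and a cyclic 3 × 3 square as the example:

```python
from itertools import product

def latin_square_graph(n):
    """Edges of K_n □ K_n: pairs of distinct cells sharing a row or a column."""
    cells = list(product(range(n), repeat=2))
    return [(u, v) for i, u in enumerate(cells) for v in cells[i + 1:]
            if (u[0] == v[0]) != (u[1] == v[1])]

def is_proper_coloring(square, edges):
    """True if adjacent cells of `square` (a list of rows) hold different symbols."""
    return all(square[a][b] != square[c][d] for (a, b), (c, d) in edges)

edges = latin_square_graph(3)
cyclic = [[(i + j) % 3 + 1 for j in range(3)] for i in range(3)]   # rows 123, 231, 312
not_latin = [[1, 1, 2], [2, 3, 1], [3, 2, 3]]

print(len(edges))                              # n^2 (n - 1) = 18 edges for n = 3
print(is_proper_coloring(cyclic, edges))       # True
print(is_proper_coloring(not_latin, edges))    # False
```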
Theorem (Cavenagh ‘07).
$\mathrm{scs}(K_n \square K_n) \ge c\, n (\log n)^{1/3}$.
NB. This is the first superlinear lower bound! The proof uses very special properties
of Latin squares. More generalizable proof?
Theorem (Cooper, Donovan, Seberry ‘91).
$\mathrm{scs}(K_n \square K_n) \le \lfloor n^2/4 \rfloor$.
Theorem (Cavenagh, Donovan, Abdollah ‘05).
$\mathrm{scs}(K_n \square K_n) \ge \lfloor n^2/4 \rfloor$ when n is odd.
Theorem (Gower, ‘00).
$\mathrm{lcs}(K_n \square K_n) \ge n^2 (1 - o(1))$.
Theorem (Dejter, Horak, ‘07).
$\mathrm{lcs}(K_n \square K_n) \le n^2 - 7n/2$.
Theorem (Ghandehari, Hatami, Mahmoodian, ‘05).
$$n^2 - (e + o(1))\, n^{5/3} \le \mathrm{scs}(L) \le n^2 - \frac{\sqrt{\pi}}{2}\, n^{3/2}$$
when
Thanks!
P.S. There are as many open problems about this as there
are graphs. If you are interested in doing some research,
contact me at [email protected].