An interesting and so far seemingly unanswered question is the number of clearly solvable Sudoku puzzles with 17 to 80 clues. While this number cannot be computed directly (without solving all Sudokus - about 1.3 × 10^34), it is fairly easy to approximate at 5.3 × 10^33, if you count only the essentially different Sudoku puzzles (otherwise about 6.4 × 10^45).
German Wikipedia says on their discussion (talk) page for Sudoku: "The number of possible Sudoku puzzles lies somewhere between 10^13 and 10^50". This we can determine more exactly now.
Ed Russell and Frazer Jarvis [Russell, Jarvis 2006] showed that there are 5,472,730,538 (≈ 5.5 × 10^9) essentially different Sudoku solutions, i.e. without exchanges of digits, rotations, or permutations of rows or columns, and permutations of blocks of rows or columns.
Essentially all information on the web about the number of Sudoku puzzles are answered incorrectly with the ("readily" computable) number of completed Sudokus (grids): Either [Felgenhauer, Jarvis 2005]’s count of 6,670,903,752,021,072,936,960 (≈ 6.7 × 10^21) of indistinct solutions, or the previously quoted count of distinct solutions of 5,472,730,538 (Russell and Jarvis), the value without exchanges of digits, rotations and permutations. The really interesting question, however, is the number of Sudoku puzzles, that is, which have not yet been completed and are also clearly solvable. After all, Brian Hayes recognized the problem back in 2008: "It needs to be emphasized that both of these calculations (by Felgenhauer and Jarvis and by Russell and Jarvis) count the number of solutions, not the number of puzzles" - but he could not do the calculations.
Here we examine the total sum of all Sudoku puzzles with 17 to 80 clues with the same requirements as with [Russell, Jarvis 2006]. The following overview table shows our latest results for interesting ranges of clue counts with the respective cumulative number of puzzles:
|17-32:||≈ 1.3 × 10^31|
|17-36:||≈ 2.7 × 10^32|
|17-40:||≈ 1.5 × 10^33|
|17-48:||≈ 4.9 × 10^33|
|17-64:||≈ 5.3 × 10^33|
|17-80:||≈ 5.3 × 10^33|
The approach to estimating the cumulative counts is as follows:
For each of the roughly 5.5 × 10^9 essentially different Sudokus there are for N clues , short "C(81, N)" or "81 choose N", different Sudokus. The product of both numbers gives us the number of all possible Sudokus at N clues, with the sum of about 1.3 × 10^34 for all clues from 17 to 80; but not all of these are clearly solvable. Therefore we multiply each of these products with a percentage for N clues, calculated by numerous stochastic computations. We can then accumulate these across ranges of N as shown in the given table.
For the calculation of the percentages for each N we start with a randomly selected set of fifty or more Sudoku solutions, and generate at least 100,000 Sudoku puzzles by randomly selecting 81-N digits to remove, for values of N from 17 through 80. We then use a trial-and-error-based solver to determine how many of the generated puzzles have distinct solutions. This gives us an approximation for the percentage of solvable Sudoku puzzles with N clues (with some amount of error introduced by the random numbers involved in the generation step). For the actual distribution see the picture of percentages, calculated from about 350 millions of randomly generated Sudokus.
The percentages for the clues 78 to 80 are obviously 100 %, for N = 77 they are 99.999391... %, because 6/80*(2/79*1/8)*1/78 + 2/80*(6/79*1/8)*1/78 Sudokus always have two solutions because of an unique rectangle with the missing numbers in horizontal and vertical boxes. In addition for N = 76, we get exactly 5 times this value, since there are 5 options for forming an unique rectangle - and this results in 99.996957 ... % for the percentage of the clearly solvable sudokus. Similarly, for N = 75, 74, 73, ... the factor "6 choose 4" = 15, "7 choose 4" = 35, "8 choose 4" = 70, etc. applies until other cases/constellations occur that a Sudoku cannot be clearly solved - in each of the examples found, these were generally several 6-unique loops or triple parallel pairs (at N = 74 in 0.12 % of the cases, at N = 73 in 0.30 % of the cases).
We may then smooth these percentages (or their log values, considering their reduced range) using least squares methods or similar algorithms, e.g. with a polynomial of degree 5. A smoothing of the values and an extrapolation in the range below 24 only resulted in a deviation in the 3rd position.
, short "C(81, N)" or "81 choose N", here is the binomial coefficient 81! / (N! × (81-N)!) = (81-N+1)×(81-N+2)×...×79×80×81 / (1×2×3×...×(N-1)×N), determining the number of ways to choose a subset of N elements from a fixed set of 81 elements regardless of their order.
The greatest contribution to the sum of all solvable Sudoku puzzles – about 97 % – stems from Sudoku puzzles with between 35 and 51 clues. The percentages for values of N under 24 were extrapolated since they’re quite expensive to determine (they require at least 10^7 to 10^12 puzzles to be evaluated rather than 100,000) and contribute little to the overall count.
Of all Sudokus with 17 to 80 clues (13,232,250,261,443,826,842,603,337,271,215,336 ≈ 1.3 × 10^34), about 40% are solvable, most of them (about 71 %) of course in the upper range of 41 to 80 clues.
There are roughly 4.3 × 10^32 solvable Sudokus with 40 clues, but two of those have previously been shown not to be reducible to fewer clues. For details, see von Mladen Dobrichev. On the other end, Gordon Royle’s collection of about 50,000 Sudokus with 17 clues are much closer to the 10^12 solvable Sudokus in that category.
If we ignore the restrictions of Russell and Jarvis, we get to work with the higher value of Felgenhauer and Jarvis (≈ 6.7 × 10^21), therefore all numbers are higher by a factor of 1.2 × 10^12: i.e. there are 1.6 × 10^46 possible Sudokus, and 6.4 × 10^45 of them are clearly solvable.