nj3h wrote:
One of the things that I like about SudoCue and Sadman's Sudoku is the ability to select techniques to be considered for a puzzle. Just because you select X-Wing or Empty Rectangle doesn't mean a particular puzzle will have that technique, but it could. It sure as heck won't have skyscrapers, kites, stuvwxyz-wings or whatever.
I guess the generation of Samurai puzzles within a set of possible techniques must be more difficult than I am capable of understanding. However, I hope that this is ultimate goal of any generator program.
I started to embed logical techniques into the generator in JSudoku.
I started with grids with givens, (vanilla and gattai). I've nothing yet for killer.
See the Samurai I posted hereIt wasn't that difficult to implement after all. But I still have quite some work to make it customizable...
Here are some of my notes/ideas related to this topic:
It's rather easy to control the maximum number of times a single technique is used.
But, we should also be able to control the maximum number of times a set of techniques are used. For example: intersections + naked pairs + hidden pairs all together may be used at most N times.
Generating easy to medium puzzle is rather easy, since we need to check only for a limited set of basic techniques, which are fast for a program to check.
As expected, generating hard puzzles is a hard job, because we're looking for puzzles requiring harder techniques, which is more work for a program.
It's even harder if one is looking for a puzzle which requires some specific technique, while excluding others. For example, I tried to generate a Clueless which requires an X-Wing but without turbot, XY-Wings... After an hour or so, it didn't found any puzzle, so I cancelled it.
Here I count the minimum number of times a technique is used. But we should also be able to control the minimum number of times a set of techniques are used. eg there must be at least one of X-Wing or XY-Wing or turbot.
I opted for config files in xml, with some defaults build-in the program. But I still have to desing some UI to let you edit the details, list of techniques with minimum and maximum count, sub-set(s) of techniques with minimum and maximum count...
Not sure this would be sufficient to control the difficulty. I didn't make enough tests yet, but there's probably a good reason why most programs goes through some global rating instead of filtering for particular techniques...
I also found some intersesting ideas in
this paper (PDF). In particular the bottleneck idea for the easier puzzles like those published in newspapers.
I played with bottleneck for the very easy puzzles: at each step of the solve path there must be at least N different hidden singles available. Naked singles are not used here, since the idea it to generate a grid solvable with cross hatching only, but no pencil marks.
I currently test only for one solve path. But to be more accurate, one should find the critical solve path, since someone could follow another path which leads to a slimer bottleneck.
We could also add some control over statistical tests for the distibution of techniques. Like median, mean, standard deviation (or variance)...
See also
the .scl format used by SudoCue.
BTW can someone explain the difference between "Full House" and "Last Digit"? In
the SudoCue solving guide they both refer to the same thing!
I already explained how the generator works in JSudoku. I've "just" added some additional checks using logical techniques while choosing givens.
Since my initial description was unclear, here is an (hopefully) more understandable one:
Every cell is in one of these 3 states:
unvisited/untested given,
required given or
empty/ungiven.
When solving, unvisited or required givens makes no difference, they're just givens/clues.
1. Fill in all cells with a valid solution.
1a. All cells are unvisited givens at first. For overlapping variants, overlapping cells are empty/ungiven if so required.
2. Begin loop to visit all cells at random. For symmetrical puzzles, visiting/choosing a cell also chooses all its symmetrical cell(s). After the visit, the (symmetrical) cell(s) becomes either required given(s) or empty/ungiven cells.
2a. Tag the (symmetrical) cell(s) chosen at 2 as empty/ungiven.
2b. If this leaves too few givens, tag the (symmetrical) cell(s) as required given(s), stop (or continue) the visit loop.
3. Recursively solve the puzzle with all unvisited and required givens as clues. Search for 2 solutions.
3a. My implemetation uses Dancing links (DLX), which makes it easy to count the number of guesses (number of DLX rows - 1) and number of singles (number of DLX columns with 1 DLX row) for each step.
4. If this yields multiple solutions, tag the (symmetrical) cell(s) as required given(s), continue the visit loop.
4a. Dito if this yields too many guesses.
4b. If no guess made at 3a, the grid can be solved with singles only. IWO for each step, DLX found at least one column with a single row.
4c. For the "very easy" difficulty, findout the bottleneck which is the minimum number of singles in the solve path (based on 3a). If too slim/narrow a bottleneck (or used some guess) -> required, next visit.
5. If some guess made at 3a, the grid requires some other technique(s) to solve. But DLX can't tell which particular technique(s) are required. So we need to solve using logical techniques to findout and count the required techniques.
5a. Start to solve using logical techniques among the set of allowed techniques.
5b. If some technique is used more than allowed -> tag as required, next visit (this could be checked within the logical solve loop).
5c. If the grid does not solve using the set of allowed techniques -> tag as required, next visit.
6. Next visit (goto 2.)
7. When all cells have been visited. Final check: discard the grid and re-generate if:
7a. Too many givens
7b. Or too few guesses
7c. Or any required technique was used too few times
Notes: for step 1 I also use DLX, stoping at first solution found. Before DLX starts "solving", all DLX candidate rows are enabled and I shuffle the mandatory DLX columns and the DLX rows within each column. This yields a random valid grid.
For overlapping variants, I now use Børge approach which prove a real time saver, in particular for clueless. It first fills in only the sub-grid with most overlapping cells. This is done by turning the other DLX columns as optional/secondary. They're turned mandatory/primary as soon as the sub-grid gets filled.
For step 2, to avoid clusters of givens, I now use a mix of Børge's and my approaches. It will choose some unvisited cell among those with most number of givens buddies (Børge) + number of adjacent givens (my original approach). There is still a slight tendancy to produce checker patterns, but not as often as before. For diagonal variants, it tends to place very few givens on the diagonals (more buddies), which yields more "interesting" grids.
Comments and ideas welcome.