Logic puzzle game solver using Python, Prolog, scikit-image, a camera and a 3D printer.
The idea of this project is to take a picture of a game on a phone (like Simon Tatham's Puzzles) and solve it using a programming language (like Prolog), then use a capacitive pen attached to a 3D printer to enter the solution on the phone.
The workflow is as follows:
- Take a picture of the game in its initial, unsolved state
- Process this image to get a numerical model of the game that a program can solve without worrying about computer vision
- Solve the game and get a list of steps to get to the solution
- Apply the steps in the real world with a 3D printer
Obviously this is a very simple way of interacting with games, and it discards all games that cannot be solved using a single finger. But this restriction is no problem for logic and puzzle games!
You can find a more detailed explanation of each step below. It's well worth the five-minute read.
To install, just
$ git clone https://github.com/twinone/logsolve
$ pip3 install -r requirements.txt
To use, assuming you want to solve game GAME:
$ export PYTHONPATH=/path/to/logsolve:$PYTHONPATH
$ python3 games/GAME/GAME.py
If you're using logsolve often or developing it, it's recommended to add the PYTHONPATH export to your ~/.bashrc:
echo 'export PYTHONPATH=/path/to/logsolve:$PYTHONPATH' >> ~/.bashrc
Here are the steps the programs carry out, in order, to solve a puzzle.
Capturing and undistorting the image works reliably across multiple angles and multiple games. Here is what the process looks like with the Loopy game:
From left to right:
- Original image
- Local thresholding and Gaussian blur
- Hough line detection
- Largest connected-component convex hull (& detection of warping points)
- Resulting warped image prepared for solving (see the sketch of this warp below)
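For reference, here is a minimal sketch of the final warping step with scikit-image, assuming the four grid corners have already been extracted from the convex hull (the function and parameter names are illustrative, not the actual logsolve API):

```python
import numpy as np
from skimage import transform

def unwarp(image, corners, size=600):
    """Warp the quadrilateral spanned by `corners` onto a square image.

    corners: 4x2 array of (x, y) points in the order
             top-left, top-right, bottom-right, bottom-left.
    """
    square = np.array([[0, 0], [size - 1, 0],
                       [size - 1, size - 1], [0, size - 1]])
    tf = transform.ProjectiveTransform()
    # warp() wants a map from output coordinates to input coordinates,
    # so we estimate square -> corners.
    tf.estimate(square, np.asarray(corners, dtype=float))
    return transform.warp(image, tf, output_shape=(size, size))
```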
Here is an example with other games:
It's a pretty robust approach: it has detected 9 different games without even touching the parameters.
We can accurately detect the size of the grid on an undistorted image. From left to right:
- Original undistorted image
- Local thresholding
- Hough line detection
- Line clustering
The line clustering algorithm first selects the horizontal and vertical lines and groups them by distance. This gives an accurate, redundant map of the grid: if we count the number of groups, we get the size of the grid.
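A rough sketch of the grouping idea, assuming each Hough line has already been reduced to its perpendicular distance from the image origin (the helper below is illustrative, not the actual logsolve code):

```python
def cluster_lines(distances, gap=10):
    """Group line positions that lie within `gap` pixels of each other.

    Redundant Hough detections of the same grid line collapse into one
    group; per the description above, counting the horizontal and
    vertical groups yields the grid size.
    """
    groups = []
    for d in sorted(distances):
        if groups and d - groups[-1][-1] < gap:
            groups[-1].append(d)   # same grid line as the previous one
        else:
            groups.append([d])     # start a new grid line
    return [sum(g) / len(g) for g in groups]  # mean position per group
```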
We test the undistorted images from the capture step. Images are manually labeled as nameN-XxY.jpg, where X and Y are the width and height of the puzzle in grid cells.
Automated testing of all images in the test folder produces the following result:
[PASS] test/fifteen0-4x4.jpg: 4x4
[PASS] test/galaxies0-7x7.jpg: 7x7
...
[PASS] test/unruly0-10x10.jpg: 10x10
[PASS] test/range0-6x9.jpg: 6x9
It's a very flexible approach: grids can be NxM, and the vertical size of a cell does not need to match the horizontal size.
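As an illustration, a test runner along these lines might look as follows; `detect_grid_size` stands in for logsolve's actual detection entry point and is an assumption here:

```python
import re
from pathlib import Path

NAME = re.compile(r'-(\d+)x(\d+)\.jpg$')  # matches the XxY label suffix

def run_tests(folder='test'):
    for path in sorted(Path(folder).glob('*.jpg')):
        m = NAME.search(path.name)
        if not m:
            continue  # not a labeled test image
        expected = (int(m.group(1)), int(m.group(2)))
        got = detect_grid_size(str(path))  # hypothetical logsolve call
        status = 'PASS' if got == expected else 'FAIL'
        print(f'[{status}] {path}: {got[0]}x{got[1]}')
```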
In most games we want to be able to distinguish between dark and light cells. This is an easy problem with thresholding. Thresholding converts a grayscale image to a binary image (black and white only) pixel by pixel, based on whether each pixel meets a certain criterion, like being brighter than the mean of the whole image, or darker than the mean of the 20x20 pixels surrounding it. If the criterion uses all the pixels, it's called global thresholding, and the same criterion is applied to every pixel; this is the fast approach. Using only the neighborhood of a pixel is called local thresholding; it is a bit slower but easily solves shadow problems. A quick comparison shows the problem with global thresholding.
Just being 'brighter' than the mean is a little error-prone, so we use an offset to make sure a pixel is brighter by at least that offset. Since pixel brightness is between 0 and 1 in this example, an offset of 0.1 worked surprisingly well. We'll use local thresholding even though it's a bit slower, because with more aggressive shadows global thresholding could be a problem.
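In scikit-image terms the two approaches look roughly like this (the block size and filename are assumptions; note that `threshold_local` subtracts its offset, so requiring "brighter by 0.1" means passing -0.1):

```python
from skimage import filters, io
from skimage.color import rgb2gray

image = rgb2gray(io.imread('board.jpg'))  # grayscale, values in [0, 1]

# Global: one threshold (Otsu's method here) for the entire image --
# fast, but a shadow can push a whole region to the wrong side.
binary_global = image > filters.threshold_otsu(image)

# Local: each pixel is compared to the (weighted) mean of its own
# neighborhood; the negative offset demands that a pixel be at least
# 0.1 brighter than its surroundings.
binary_local = image > filters.threshold_local(image, block_size=21, offset=-0.1)
```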
Thresholding is nice, but it only classifies pixels in a binary way: a pixel is either in the resulting image or it is not. In the game Unruly, however, there are white, black and empty squares, and we want to detect all three.
The solution is fairly simple: we do two separate binary thresholdings, one for the white squares and one for the black squares:
The little errors are fine, since we will be taking the average color of each cell after thresholding. Just in case, it's probably a good idea to only use the center part of a cell for color detection, to avoid errors around the edges:
With a slightly more complicated game, Light Up, we can use the same approach, without changing any code, and get the following result:
This way we only have to check for numbers on the black cells (and we need to know where they are anyway to solve the game).
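Here's a sketch of that per-cell color sampling, given the two binary masks and the grid size (the names and the 25% margin are assumptions); note that the resulting -1/0/1 encoding matches the problem format the solver consumes below:

```python
import numpy as np

def classify_cells(white_mask, black_mask, rows, cols, margin=0.25):
    """Return a rows x cols array: 1 = white, 0 = black, -1 = empty.

    Only the center of each cell is sampled, so thresholding noise
    near the grid lines is ignored.
    """
    h, w = white_mask.shape
    ch, cw = h / rows, w / cols
    out = np.full((rows, cols), -1, dtype=int)
    for r in range(rows):
        for c in range(cols):
            y0, y1 = int((r + margin) * ch), int((r + 1 - margin) * ch)
            x0, x1 = int((c + margin) * cw), int((c + 1 - margin) * cw)
            if white_mask[y0:y1, x0:x1].mean() > 0.5:
                out[r, c] = 1
            elif black_mask[y0:y1, x0:x1].mean() > 0.5:
                out[r, c] = 0
    return out
```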
For the Unruly game, I've written a small Prolog script that calculates the solution for a given problem. Given the following problem:
10 10
-1 -1 -1 -1 0 -1 -1 -1 0 -1
-1 -1 -1 -1 -1 1 -1 -1 -1 1
-1 0 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 -1 0 -1 -1 -1 1 -1
-1 0 -1 -1 -1 -1 -1 0 -1 1
-1 -1 1 -1 -1 -1 -1 -1 -1 1
-1 -1 -1 0 -1 -1 1 -1 1 -1
-1 -1 0 -1 -1 -1 1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 1
1 0 -1 -1 -1 -1 -1 0 -1 1
It prints the following solution and steps:
$ ./solve < input.txt
1 1 0 1 0 0 1 1 0 0
0 1 0 0 1 1 0 0 1 1
1 0 1 0 1 1 0 1 0 0
1 1 0 1 0 0 1 0 1 0
0 0 1 0 1 0 1 0 1 1
0 0 1 1 0 1 0 1 0 1
1 1 0 0 1 0 1 0 1 0
0 1 0 1 1 0 1 1 0 0
0 0 1 1 0 1 0 1 0 1
1 0 1 0 0 1 0 0 1 1
dclick 1 1
dclick 1 2
click 1 3
[...]
dclick 10 6
click 10 7
dclick 10 9
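The click/dclick list can be derived by diffing the solution against the initial grid. Here's a sketch of that idea: pre-filled cells are skipped, and (matching the output above) a single tap makes a cell black (0) while a double tap makes it white (1):

```python
def parse_grid(text):
    """Parse the whitespace-separated -1/0/1 grid format shown above."""
    return [[int(v) for v in line.split()] for line in text.strip().splitlines()]

def steps(initial, solved):
    """Yield (action, row, col) for every cell the player must touch."""
    for r, (irow, srow) in enumerate(zip(initial, solved), start=1):
        for c, (before, after) in enumerate(zip(irow, srow), start=1):
            if before != -1:  # pre-filled cell, nothing to do
                continue
            yield ('click' if after == 0 else 'dclick', r, c)
```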
Now the solution only has to be sent to the 3D printer over the serial bus (or network). To do this we use pyserial (we may elaborate on other ways to communicate later).
If you take a look at core/comm, you'll find the minimum commands required to get this project working, plus a G-code interpreter. (This will probably become a separate project in the future.)
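For illustration, talking to the printer with pyserial can be as simple as the sketch below; the port name, baud rate, coordinates and the "ok" handshake are assumptions that differ between printers and firmwares:

```python
import serial

with serial.Serial('/dev/ttyUSB0', 115200, timeout=5) as printer:
    for command in (b'G28\n',             # home all axes
                    b'G0 X70 Y40 Z15\n',  # move above a cell (made-up coords)
                    b'G0 Z5\n',           # lower the tip to tap
                    b'G0 Z15\n'):         # lift again
        printer.write(command)
        while True:
            reply = printer.readline().strip()
            if reply in (b'ok', b''):     # b'' means timeout; don't hang
                break
```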
Before actually making clicks we need to know where the phone is on the printer bed. We could use computer vision for this, but it would require a specific angle of the camera, and to keep it simple we are going to calibrate manually.
To calibrate, we use $ python3 calibrate.py </dev/printer>
Establishing connection to printer
Home all (G28)
Centering (50mm up for safety)
Align the touch tip to the current edge.
Enter mm to move in axis, 0 to end
Move Z down until it touches the screen
> -30
> -5
> 0
Center vertically on the game grid
> -10
> -30
> -30
> 0
Center on the LEFT edge of the grid
> -40
> 0
Center on the RIGHT edge of the grid
> 0
Calibrated: (70, 70, 40, 15)
Now the printer is calibrated ((70, 70, 40, 15) in the example). These values represent the left edge, right edge, vertical center and Z height, in mm from the printer's origin.
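From these four values a (row, col) step can be mapped to printer coordinates. The sketch below is only the idea, assuming square cells and a grid centered on the calibrated vertical position; core/comm may compute this differently:

```python
def cell_to_xyz(row, col, rows, cols, calib):
    """Map a 1-based (row, col) cell to printer coordinates in mm."""
    left, right, y_center, z = calib
    cell = (right - left) / cols                  # square cells assumed
    x = left + (col - 0.5) * cell                 # center of the column
    y = y_center + (row - 0.5 - rows / 2) * cell  # rows centered on y_center
    return x, y, z
```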
The 3D printer will execute the orders sent to it. You would expect a capacitive pen attached to the printer to do its job, but it doesn't: capacitive pens actually rely on your body's capacitance, and they don't work unless you hold them. The result is that most touches simply don't register.
To overcome this, we use aluminum foil as the capacitive tip instead of a dedicated pen. This allows us to connect the other end to the phone's charger, so the tip is grounded together with the phone and works reliably. See HARDWARE.md for details.
Here is how the (very ugly but functional) prototype looks:
I'm currently learning TensorFlow, and the MNIST dataset seems like the perfect fit for detecting numbers. We still need to be able to distinguish between cells with and without numbers, so this part may take some time to complete. Meanwhile, simple games like Unruly can already be solved, so TensorFlow and the solving will be worked on in parallel.
- Image capturer
- Detect grid size
- Cell classification and color detection
- Implement number detection (In progress)
- Design game representation
- Solver
- Solution - 3D printer representation
- Computer - 3D printer interface
- Android app
- Web server
- Glue everything together and make a nice API (WIP)