Sudoku with CNNs
Overview
The purpose of this assignment is to learn how CNNs work, and in
particular to learn about the structure and hyperparameters of CNNs.
You will also learn to use common machine learning libraries! In both
research and industry, frameworks such as TensorFlow, PyTorch, and
Keras are used to build neural network architectures. This assignment
introduces you to this practice by using Keras to set up a
convolutional neural network (CNN) that solves sudoku.
Setup
Set up an environment for developing and running Python. I was able to
get everything set up in VSCode, but if you prefer another editor, that
is fine.
Install Keras: https://keras.io/getting_started/
(Optional) Set up a virtual environment to isolate Python versions and
the various libraries needed to run the project. Because I have several
Python projects on my computer, I have run into issues with too many
dependencies. You can learn more about this here:
https://www.geeksforgeeks.org/set-up-virtual-environment-for-python-using-anaconda/
Part 1 - Building and running a Sudoku Solver model
To verify that everything is set up correctly, we will test run a
sudoku solver that already exists! Credit for this part goes to this
developer:
https://www.kaggle.com/code/mustisid/solving-sudoku-using-cnn/notebook
- Download this zip file
There should be a CSV file with all the training/test data. This
dataset contains 1 million 9x9 sudoku games, with blank cells
represented as 0s. Documentation on how the dataset was produced is
available here:
https://www.kaggle.com/datasets/bryanpark/sudoku
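To make the data format concrete, here is a minimal sketch of turning one CSV row into 9x9 grids. It assumes the file has a header row with columns named quizzes and solutions, each holding an 81-character digit string (check the Kaggle dataset page to confirm). The helper names parse_grid and load_pairs are hypothetical and are not part of build_cnn_model.py.

```python
import csv
import io


def parse_grid(digits):
    """Turn an 81-character string of digits into a 9x9 list of ints (0 = blank)."""
    if len(digits) != 81 or not digits.isdigit():
        raise ValueError("expected exactly 81 digits")
    values = [int(ch) for ch in digits]
    # Slice the flat, row-major list into 9 rows of 9 cells.
    return [values[row * 9:(row + 1) * 9] for row in range(9)]


def load_pairs(csv_file, limit=None):
    """Read (puzzle, solution) grid pairs from an open file object.

    Assumes a header row with 'quizzes' and 'solutions' columns (assumed layout).
    """
    pairs = []
    for i, row in enumerate(csv.DictReader(csv_file)):
        if limit is not None and i >= limit:
            break
        pairs.append((parse_grid(row["quizzes"]), parse_grid(row["solutions"])))
    return pairs


# Tiny in-memory example standing in for the real CSV file.
solution = "123456789456789123789123456234567891567891234891234567345678912678912345912345678"
sample = io.StringIO("quizzes,solutions\n" + "00" + solution[2:] + "," + solution + "\n")
pairs = load_pairs(sample)
```

With the real file you would pass open("sudoku.csv", newline="") instead of the StringIO object, and a small limit is handy for quick experiments since the full file holds a million rows.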
- Open and examine build_cnn_model.py
Pay particular attention to the get_model() method and how the layers
of the model are defined. You can read about how Keras models are
instantiated, and about their methods, here:
https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict
This process will be especially important for Part 2.
- Run build_cnn_model.py
This will verify that you’ve installed the relevant
libraries/dependencies for running this code. The model should take
~20-40 minutes to train. When the code is finished running you should
have a file named ‘solverModel.keras’ saved to your working
directory.
- Open and examine test_cnn_model.py
Some interesting things in this file are model.predict() and how this
project solves the sudoku game one cell at a time. Look through the
Keras documentation to understand the gist of how this code works.
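The one-cell-at-a-time idea can be sketched without Keras. In this sketch, predict_probs is a hypothetical stand-in for a wrapper around model.predict that returns, for each of the 81 cells in row-major order, 9 probabilities (index k meaning digit k + 1). Whether test_cnn_model.py picks the most confident cell exactly this way is an assumption to check against the file itself.

```python
def solve_one_cell_at_a_time(puzzle, predict_probs):
    """Fill blanks (0s) one at a time, always committing the most confident guess.

    puzzle: 9x9 list of ints, 0 = blank.
    predict_probs: hypothetical stand-in for model.predict; takes a 9x9 grid and
        returns 81 lists of 9 probabilities (index k means digit k + 1), row-major.
    """
    grid = [row[:] for row in puzzle]  # copy so the caller's puzzle is untouched
    while True:
        blanks = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
        if not blanks:
            return grid
        probs = predict_probs(grid)
        best_cell, best_digit, best_p = None, None, -1.0
        for r, c in blanks:
            cell_probs = probs[r * 9 + c]
            k = max(range(9), key=cell_probs.__getitem__)  # most likely digit index
            if cell_probs[k] > best_p:
                best_cell, best_digit, best_p = (r, c), k + 1, cell_probs[k]
        r, c = best_cell
        # Commit only this one cell, then re-predict with it filled in.
        grid[r][c] = best_digit
```

The design choice is that each new prediction can condition on the cells already committed, at the cost of one model call per blank cell.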
- Run test_cnn_model.py on a game, and see if it’s right!
At the bottom of the file, you can modify the game that the model is
tested on. Run the code and see whether the output ‘solved’ game is
correct! You can find some authentic games here: http://1sudoku.com/
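Checking a ‘solved’ grid by eye is error-prone, so a small checker helps. This is a minimal sketch; is_valid_solution is a hypothetical helper, not something provided by test_cnn_model.py. It verifies the sudoku rules and, optionally, that the model did not overwrite any of the original clues.

```python
def is_valid_solution(grid, puzzle=None):
    """Return True if grid is a completed, rule-abiding 9x9 sudoku.

    If puzzle is given (0 = blank), also require every given clue to be unchanged.
    """
    digits = set(range(1, 10))
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column, and 3x3 box must contain exactly the digits 1-9.
    if any(group != digits for group in rows + cols + boxes):
        return False
    if puzzle is not None:
        for r in range(9):
            for c in range(9):
                if puzzle[r][c] != 0 and puzzle[r][c] != grid[r][c]:
                    return False
    return True
```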
- Lab manual: Assess the accuracy of the trained model.
- [20 pts] Making the model work.
- [10 pts] What percentage of the training and testing sets does the
model solve correctly? Please provide a screenshot of your data.
- [5 pts] Which of the problems from the Sudoku assignment does the
trained model solve correctly?
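"Solves correctly" can be measured strictly (the whole grid matches the solution) or per cell (how many blanks were filled correctly); reporting both makes the numbers easier to interpret. The sketch below is hypothetical (accuracy_report is not part of the provided code) and assumes you already have lists of predicted grids, solution grids, and the original puzzles. Its whole-grid check compares against the stored solution, which assumes each game has a unique solution. Scoring the full million games may be slow, so evaluating a random sample is a reasonable shortcut.

```python
def accuracy_report(predicted, solutions, puzzles):
    """Compute two accuracy figures over matching lists of 9x9 grids.

    puzzle_acc: fraction of games whose entire predicted grid matches the solution.
    blank_cell_acc: fraction of originally blank cells (0 in the puzzle) filled correctly.
    """
    if not (len(predicted) == len(solutions) == len(puzzles)) or not predicted:
        raise ValueError("need equal-length, non-empty lists")
    solved = 0
    blank_total = blank_correct = 0
    for pred, sol, puz in zip(predicted, solutions, puzzles):
        if pred == sol:
            solved += 1
        for r in range(9):
            for c in range(9):
                if puz[r][c] == 0:
                    blank_total += 1
                    blank_correct += pred[r][c] == sol[r][c]  # bool counts as 0/1
    return {
        "puzzle_acc": solved / len(predicted),
        "blank_cell_acc": blank_correct / blank_total if blank_total else 1.0,
    }
```

Multiplying either value by 100 gives the percentage the lab manual asks for.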
Part 2: Modifying the Model to Replicate Research
We will modify our current Sudoku solver to try to improve the results
so far. This is an exercise both in engaging with some machine learning
research and in getting familiar with the documentation for making
changes to neural network architectures.
- Have a look at the Stanford paper:
https://cs230.stanford.edu/files_winter_2018/projects/6939771.pdf
There are some notes about the hyperparameters used along with the
structure of the architecture. Attempt to replicate some of those
architectures. However, also consider the use of pooling layers and
other network changes that grab your attention.
Save the model as LASTNAME_model.keras
- Lab Manual: Assessment of the new model
- [25 pts] Experiment with three different network architectures to
see their effect on accuracy. Provide each network architecture and its
hyperparameters, as well as screenshots of the accuracy data.
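When documenting three architectures, it helps to know each one's output shapes and trainable-parameter counts before spending 20+ minutes training it. Keras reports this for a real model via model.summary(); the pure-Python sketch below tallies the same quantities for a hypothetical layer-spec format. Its conventions are assumptions chosen to match Keras defaults: Conv2D with padding="same" and a bias, MaxPooling2D with strides equal to pool_size and "valid" padding, and a Dense head applied after flattening. The three candidate architectures are illustrative only and are not taken from the Stanford paper.

```python
def summarize(layers, height=9, width=9, channels=1):
    """Track output shape and trainable-parameter count through a simple layer list.

    Hypothetical layer specs:
      ("conv", filters, kernel)  - Conv2D, padding='same', stride 1, with bias
      ("pool", size)             - MaxPooling2D, pool_size = stride = size, 'valid'
      ("dense", units)           - Dense after an implicit Flatten, with bias
    Returns (list of (spec, output_shape, params), total_params).
    """
    rows, total = [], 0
    flat = None  # number of features once the tensor is flattened for dense layers
    for spec in layers:
        kind = spec[0]
        if kind in ("conv", "pool") and flat is not None:
            raise ValueError("conv/pool layers must come before dense layers")
        if kind == "conv":
            _, filters, k = spec
            params = k * k * channels * filters + filters  # weights + biases
            channels = filters  # padding='same' keeps height and width unchanged
            shape = (height, width, channels)
        elif kind == "pool":
            size = spec[1]
            height = (height - size) // size + 1  # 'valid' pooling, stride = size
            width = (width - size) // size + 1
            params = 0
            shape = (height, width, channels)
        elif kind == "dense":
            units = spec[1]
            if flat is None:
                flat = height * width * channels  # implicit Flatten
            params = flat * units + units
            flat = units
            shape = (units,)
        else:
            raise ValueError(f"unknown layer kind: {kind!r}")
        total += params
        rows.append((spec, shape, params))
    return rows, total


# Three hypothetical architectures to compare; 81 * 9 = 9 digit scores per cell.
candidates = {
    "three_conv": [("conv", 64, 3), ("conv", 64, 3), ("conv", 128, 1), ("dense", 81 * 9)],
    "wide_conv": [("conv", 128, 3), ("conv", 128, 3), ("dense", 81 * 9)],
    "with_pool": [("conv", 64, 3), ("pool", 3), ("dense", 81 * 9)],
}
for name, layers in candidates.items():
    _, total = summarize(layers)
    print(f"{name}: {total:,} trainable parameters")
```

One thing the tally makes visible: a 3x3 pooling layer shrinks the 9x9 board to 3x3, so per-cell spatial detail is discarded before the output layer. That is worth keeping in mind when deciding where (or whether) pooling fits a sudoku model.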
Resources
Other repositories that use a CNN to solve sudoku; some attempt to
match the results of another paper:
https://github.com/charlesakin/sudoku
and https://github.com/Kyubyong/sudoku
Extra credit 1
For extra credit, train the CNN to play 16x16 Sudokus. A key challenge
of this assignment is generating the data. You may have to write code
to produce valid training and test data, unless you find a database
with a lot of 16x16 sudokus.
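One possible way to produce data, sketched below under stated assumptions, is to start from a known valid grid pattern and apply shuffles that preserve validity: rows within a band, whole bands, columns within a stack, whole stacks, and a relabeling of the digits. The helpers generate_solution and make_puzzle are hypothetical, and encoding the 16 symbols as 1-16 with 0 for blanks is a choice made here to mirror the 9x9 dataset. Note that randomly blanking cells does not guarantee a unique solution; checking uniqueness would require running a solver on each puzzle.

```python
import random


def generate_solution(base=4, rng=None):
    """Return a random completed sudoku of side base*base (16x16 when base=4).

    Starts from a known valid pattern and applies validity-preserving shuffles:
    rows within a band, bands, columns within a stack, stacks, and digit relabeling.
    """
    rng = rng or random.Random()
    side = base * base

    def shuffled(seq):
        seq = list(seq)
        rng.shuffle(seq)
        return seq

    # The inner shuffled(...) is re-evaluated for each band/stack, so each gets its own order.
    rows = [b * base + r for b in shuffled(range(base)) for r in shuffled(range(base))]
    cols = [s * base + c for s in shuffled(range(base)) for c in shuffled(range(base))]
    digits = shuffled(range(1, side + 1))

    def pattern(r, c):
        # Baseline valid grid: shifting each row keeps rows, columns, and boxes distinct.
        return (base * (r % base) + r // base + c) % side

    return [[digits[pattern(r, c)] for c in cols] for r in rows]


def make_puzzle(solution, blank_fraction=0.5, rng=None):
    """Blank out a random subset of cells (0 = blank). Uniqueness is NOT checked."""
    rng = rng or random.Random()
    side = len(solution)
    puzzle = [row[:] for row in solution]
    cells = [(r, c) for r in range(side) for c in range(side)]
    for r, c in rng.sample(cells, int(blank_fraction * len(cells))):
        puzzle[r][c] = 0
    return puzzle
```

Passing a seeded random.Random(...) makes the generated dataset reproducible.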
Points are negotiable
Small groups are acceptable, but ask me first.
Extra credit 2
For extra credit, reproduce LeNet-5, the CNN developed by Yann LeCun
for handwritten digit recognition on MNIST.
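Before writing any Keras code, it can help to trace the layer shapes of the classic LeNet-5 layout, which takes a 32x32 input (MNIST's 28x28 digits padded). The sketch below tracks only spatial sizes and channel counts. It deliberately ignores details of the original network, such as trainable subsampling layers, the partial connection table in C3, and RBF output units, which modern reimplementations usually replace with pooling, full connections, and a softmax. The function name lenet5_shapes is hypothetical.

```python
def lenet5_shapes(size=32):
    """Trace output shapes through the classic LeNet-5 layout.

    C1: 6 filters 5x5, S2: 2x2 subsampling, C3: 16 filters 5x5,
    S4: 2x2 subsampling, C5: 120 filters 5x5, F6: 84 units, output: 10 classes.
    """

    def conv(n, k):
        # 'valid' convolution with stride 1 shrinks each side by k - 1.
        return n - k + 1

    def pool(n):
        # 2x2 window with stride 2 halves each side.
        return n // 2

    trace = []
    n = conv(size, 5)
    trace.append(("C1", (n, n, 6)))
    n = pool(n)
    trace.append(("S2", (n, n, 6)))
    n = conv(n, 5)
    trace.append(("C3", (n, n, 16)))
    n = pool(n)
    trace.append(("S4", (n, n, 16)))
    n = conv(n, 5)
    trace.append(("C5", (n, n, 120)))
    trace.append(("F6", (84,)))
    trace.append(("output", (10,)))
    return trace


for name, shape in lenet5_shapes():
    print(name, shape)
```

If your Keras reproduction's model.summary() shows different shapes, the padding or input size likely differs from this layout.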
Here are some resources:
Submission
Please submit a zipped copy of the following items to the appropriate
drop-box on Moodle.
- The lab manual.
- The code for the trained and improved models.