Sudoku with CNNs

Overview

The purpose of this assignment is to learn how CNNs work, in particular their structure and hyperparameters. You will also learn to use common machine learning libraries. In both research and industry settings, frameworks like TensorFlow, PyTorch, and Keras are used to build neural network architectures. This assignment introduces you to this activity by setting up a convolutional neural network (CNN) in Keras to solve sudoku.

Setup

Create a place to develop/run Python. I was able to get everything set up in VSCode, but if you prefer something else, that is fine.

Install keras: https://keras.io/getting_started/

(Optional) Set up a virtual environment to isolate the Python version and the libraries needed to run the project. Because I have several Python projects on my computer, I have run into issues with too many dependencies. You can learn more about this here: https://www.geeksforgeeks.org/set-up-virtual-environment-for-python-using-anaconda/

Part 1 - Building and running a Sudoku Solver model

To verify that everything is set up correctly, we will test run a sudoku solver that already exists! Credit for this part goes to this developer: https://www.kaggle.com/code/mustisid/solving-sudoku-using-cnn/notebook

Download this zip file
The zip file should contain a CSV with all the test/training data. This dataset has 1 million 9x9 sudoku games with blank spaces represented as 0’s. You can find some documentation on how the dataset was produced here: https://www.kaggle.com/datasets/bryanpark/sudoku
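To get a feel for the data format, here is a minimal sketch that converts one puzzle string into a 9x9 array. It assumes each puzzle is stored as a string of 81 digits read row by row, with 0 for a blank; the column names and file name mentioned in the comments are assumptions, so check them against the CSV you downloaded.

```python
import numpy as np

def string_to_grid(s):
    """Turn an 81-character digit string into a 9x9 integer array."""
    if len(s) != 81 or not s.isdigit():
        raise ValueError("expected a string of exactly 81 digits")
    return np.array([int(ch) for ch in s], dtype=np.int64).reshape(9, 9)

# An example puzzle in the assumed string format, written one row per line.
quiz = (
    "004300209"
    "005009001"
    "070060043"
    "006002087"
    "190007400"
    "050083000"
    "600000105"
    "003508690"
    "042910300"
)

grid = string_to_grid(quiz)
num_blanks = int((grid == 0).sum())  # number of cells the solver must fill in

# With pandas installed, the whole file could be read in a similar way
# (file and column names are assumptions, not taken from the assignment):
#   import pandas as pd
#   df = pd.read_csv("sudoku.csv")
#   grids = [string_to_grid(s) for s in df["quizzes"][:1000]]
```

Printing grid shows the puzzle in its familiar square layout, which makes it easier to compare against the solver's output later.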
Open and examine build_cnn_model.py
What is particularly interesting is the get_model() method and how layers of the model are defined. You can read docs about how Keras models are instantiated and their methods here: https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict
You will want to pay attention to this process, especially for Part 2.
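As a point of comparison while you read get_model(), here is a minimal sketch of how a small sudoku CNN could be defined in Keras. It is not the code from build_cnn_model.py: the function name build_sketch_model, the layer sizes, and the choice of loss are all assumptions made for illustration.

```python
import keras
from keras import layers

def build_sketch_model():
    """Small CNN mapping a 9x9x1 puzzle to 81 softmax distributions over 9 digits."""
    model = keras.Sequential([
        layers.Input(shape=(9, 9, 1)),
        # padding="same" keeps the 9x9 grid size after each convolution.
        layers.Conv2D(64, kernel_size=3, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.Conv2D(64, kernel_size=3, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.Conv2D(128, kernel_size=1, activation="relu", padding="same"),
        layers.Flatten(),
        # One score per (cell, digit) pair: 81 cells times 9 digits.
        layers.Dense(81 * 9),
        layers.Reshape((81, 9)),
        # Softmax over the last axis gives a probability per digit for each cell.
        layers.Activation("softmax"),
    ])
    # Assumed labels: integers 0..8 (digit minus 1), one per cell.
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

model = build_sketch_model()
model.summary()
```

The summary lists each layer with its output shape and parameter count, which is a quick way to see how a change to a layer affects the rest of the network.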
Run build_cnn_model.py
This will verify that you’ve installed the relevant libraries/dependencies for running this code. The model should take ~20-40 minutes to train. When the code is finished running you should have a file named ‘solverModel.keras’ saved to your working directory.
Open and examine test_cnn_model.py
Some interesting things in this file are model.predict() and how this project solves the sudoku game one step at a time. You should look into the Keras documentation to understand the gist of how this code works.
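One common way to solve a puzzle step by step is to fill in only the blank cell the model is most confident about, then predict again on the updated grid. The sketch below shows that loop under stated assumptions: predict_probs is a hypothetical callable that returns an array of shape (9, 9, 9) with one probability per digit per cell, and the oracle function is only a stand-in for a trained model. The actual loop in test_cnn_model.py may differ.

```python
import numpy as np

def solve_step_by_step(puzzle, predict_probs):
    """Fill one blank per iteration, always choosing the most confident cell."""
    grid = np.array(puzzle, copy=True)
    while (grid == 0).any():
        probs = predict_probs(grid)              # shape (9, 9, 9)
        best_digit = probs.argmax(axis=2) + 1    # most likely digit per cell
        confidence = probs.max(axis=2)           # its probability per cell
        confidence[grid != 0] = -1.0             # never overwrite a filled cell
        r, c = np.unravel_index(confidence.argmax(), confidence.shape)
        grid[r, c] = best_digit[r, c]
    return grid

# Demonstration with a stand-in predictor that already knows the answer.
# With a Keras model, predict_probs could wrap model.predict and reshape its
# output to (9, 9, 9); the preprocessing depends on how the model was trained.
solution = np.array(
    [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
)
puzzle = solution.copy()
puzzle[::2, ::2] = 0  # blank out every other cell in every other row

def oracle(grid):
    return np.eye(9)[solution - 1]  # one-hot probabilities of the true digits

solved = solve_step_by_step(puzzle, oracle)
```

Re-predicting after each filled cell lets later guesses benefit from earlier ones, at the cost of one model call per blank.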
Run test_cnn_model.py on a game, and see if it’s right!
If you scroll to the bottom, you can modify a game to test the model on. Run the code and see if the output ‘solved’ game is correct! You can find some authentic games here: http://1sudoku.com/
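Checking a solved game by eye is error-prone, so a small helper can do it for you. This is a minimal sketch with hypothetical function names: one function checks that every row, column, and 3x3 box contains the digits 1 to 9 exactly once, and the other checks that the solver did not change any of the given clues.

```python
import numpy as np

def is_valid_solution(grid):
    """Return True if every row, column, and 3x3 box holds 1..9 exactly once."""
    grid = np.asarray(grid)
    if grid.shape != (9, 9):
        return False
    target = set(range(1, 10))
    rows_ok = all(set(row) == target for row in grid.tolist())
    cols_ok = all(set(col) == target for col in grid.T.tolist())
    boxes_ok = all(
        set(grid[r:r + 3, c:c + 3].ravel().tolist()) == target
        for r in range(0, 9, 3)
        for c in range(0, 9, 3)
    )
    return rows_ok and cols_ok and boxes_ok

def respects_clues(puzzle, solved):
    """Return True if every non-blank cell of the puzzle is unchanged."""
    puzzle, solved = np.asarray(puzzle), np.asarray(solved)
    clues = puzzle != 0
    return bool((solved[clues] == puzzle[clues]).all())

# Example: a known-valid grid, and a copy with one duplicated digit.
good = np.array(
    [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
)
bad = good.copy()
bad[0, 0] = bad[0, 1]

print(is_valid_solution(good), is_valid_solution(bad))
```

A game counts as solved only if both checks pass: a grid can be a valid sudoku and still not be a solution to the puzzle you started with.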
Lab manual: Assess the accuracy of the trained model.
1. [20 pts] Making the model work.
2. [10 pts] What percentage of the training and testing sets does the model solve correctly? Please provide a screenshot of your data.
3. [5 pts] Which of the problems from the Sudoku assignment does the trained model solve correctly?
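For the accuracy question, it helps to separate two measures: the fraction of games solved completely and the fraction of individual cells that are correct. The sketch below computes both; the function name accuracy_report is hypothetical, and comparing against the stored solution assumes each puzzle has exactly one solution.

```python
import numpy as np

def accuracy_report(predicted, solutions):
    """Return (fraction of games fully correct, fraction of cells correct)."""
    predicted = np.asarray(predicted)
    solutions = np.asarray(solutions)
    n = len(solutions)
    matches = predicted.reshape(n, -1) == solutions.reshape(n, -1)
    game_accuracy = matches.all(axis=1).mean()  # every cell must match
    cell_accuracy = matches.mean()              # average over all cells
    return float(game_accuracy), float(cell_accuracy)

# Tiny example: two games, the second with a single wrong cell.
base = np.array(
    [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
)
solutions = np.stack([base, base])
predicted = solutions.copy()
predicted[1, 0, 0] = 2  # the true value in this cell is 1

game_acc, cell_acc = accuracy_report(predicted, solutions)
print(f"games solved: {game_acc:.1%}, cells correct: {cell_acc:.1%}")
```

The two numbers can differ a lot: a model can get nearly every cell right and still solve few games, because a single wrong cell makes the whole game incorrect.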

Part 2 - Modifying the Model to Replicate Research

We will modify our current Sudoku solver to try and improve the results so far. This is an exercise both in engaging with some machine learning research and getting familiar with the documentation for making changes to neural network architectures.

Have a look at the Stanford paper: https://cs230.stanford.edu/files_winter_2018/projects/6939771.pdf
There are some notes about the hyperparameters used, along with the structure of the architecture. Attempt to replicate some of those architectures. Also consider the use of pooling layers and other network changes that grab your attention.
Save the model as LASTNAME_model.keras
Lab Manual: Assessment of the new model
1. [25 pts] Experiment with three different network architectures to see the effect on accuracy. Provide the network architecture and hyperparameters as well as screenshots of the accuracy data.
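When you experiment with pooling layers, keep in mind how they change the size of the grid. The sketch below works out the output size along one spatial dimension using the usual conventions for "same" and "valid" padding; the function name output_size is hypothetical, and you should confirm the shapes of your own network with model.summary().

```python
import math

def output_size(n, window, stride=1, padding="valid"):
    """Output length along one dimension for a convolution or pooling layer."""
    if padding == "same":
        # Input is padded so that only the stride reduces the size.
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the window must fit entirely inside the input.
        return (n - window) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

conv_same = output_size(9, window=3, padding="same")    # 3x3 conv, grid stays 9 wide
conv_valid = output_size(9, window=3, padding="valid")  # 3x3 conv, grid shrinks to 7
pool_2 = output_size(9, window=2, stride=2)             # 2x2 pooling, grid shrinks to 4
pool_3 = output_size(9, window=3, stride=3)             # 3x3 pooling, one value per box

print(conv_same, conv_valid, pool_2, pool_3)
```

Because pooling shrinks the 9x9 grid, a network that uses it still needs some way to produce one prediction per cell, for example a dense layer before the final reshape. A 3x3 pooling window with stride 3 happens to line up with the sudoku boxes, which may be worth trying as one of your three architectures.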

Resources

Other repositories that use a CNN to solve sudoku, including some attempts to match the results of another paper: https://github.com/charlesakin/sudoku and https://github.com/Kyubyong/sudoku

Extra credit 1

For extra credit, train the CNN to play 16x16 Sudokus. A key challenge of this assignment is generating the test data. You may have to write code to produce valid training data, unless you find a database with a lot of 16x16 sudokus.
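If you end up writing your own generator, one simple approach is to start from a fixed valid pattern and shuffle it in ways that keep it valid. The sketch below does this for 16x16 grids; the function names are hypothetical. Two limitations are worth stating up front: this method only reaches a subset of all valid grids, and blanking cells at random does not guarantee that a puzzle has a unique solution.

```python
import random

def generate_solution(base=4, rng=None):
    """Return a valid (base*base) x (base*base) sudoku grid as a list of lists."""
    rng = rng or random.Random()
    side = base * base

    def pattern(r, c):
        # A fixed arrangement that satisfies the row, column, and box rules.
        return (base * (r % base) + r // base + c) % side

    def shuffled(seq):
        seq = list(seq)
        rng.shuffle(seq)
        return seq

    groups = range(base)
    # Shuffle bands and the rows inside each band; same for columns.
    rows = [b * base + r for b in shuffled(groups) for r in shuffled(groups)]
    cols = [s * base + c for s in shuffled(groups) for c in shuffled(groups)]
    digits = shuffled(range(1, side + 1))  # random relabeling of the symbols
    return [[digits[pattern(r, c)] for c in cols] for r in rows]

def make_puzzle(solution, blanks, rng=None):
    """Copy the solution and set `blanks` randomly chosen cells to 0."""
    rng = rng or random.Random()
    side = len(solution)
    puzzle = [row[:] for row in solution]
    for idx in rng.sample(range(side * side), blanks):
        puzzle[idx // side][idx % side] = 0
    return puzzle

rng = random.Random(0)  # fixed seed so the example is repeatable
solution = generate_solution(base=4, rng=rng)
puzzle = make_puzzle(solution, blanks=100, rng=rng)
```

Calling generate_solution with base=3 produces ordinary 9x9 grids, which is a convenient way to test the generator against the 9x9 tools you already have.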

Points are negotiable

Small groups are acceptable. Ask me first though.

Extra credit 2

For extra credit, reproduce LeNet-5, the CNN developed by Yann LeCun to do MNIST character recognition.

Here are some resources:

Wikipedia: Architecture of LeNet-5
LeCun's article on using CNNs for MNIST recognition. It contains more information on LeNet-5.
A video about training CNNs.
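Before building LeNet-5, it can help to work out how many trainable parameters each layer should have, so you can compare against model.summary(). The sketch below does this arithmetic for a simplified, commonly used modern variant with a 32x32 input, fully connected feature maps, and pooling layers without weights. That variant is an assumption: the original network used trainable subsampling layers and partial connections between some feature maps, so the counts in LeCun's article differ.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square convolution kernel."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

# (layer description, output shape, trainable parameters)
lenet5 = [
    ("C1: 5x5 conv, 6 filters",   (28, 28, 6),  conv_params(5, 1, 6)),
    ("S2: 2x2 pooling",           (14, 14, 6),  0),
    ("C3: 5x5 conv, 16 filters",  (10, 10, 16), conv_params(5, 6, 16)),
    ("S4: 2x2 pooling",           (5, 5, 16),   0),
    ("C5: dense, 120 units",      (120,),       dense_params(5 * 5 * 16, 120)),
    ("F6: dense, 84 units",       (84,),        dense_params(120, 84)),
    ("Output: dense, 10 classes", (10,),        dense_params(84, 10)),
]

for name, shape, params in lenet5:
    print(f"{name:28s} {str(shape):14s} {params:6d}")

total_params = sum(params for _, _, params in lenet5)
print("total trainable parameters:", total_params)
```

Note that MNIST images are 28x28, so they need to be padded to 32x32 (or the first convolution given "same" padding) for the shapes above to hold.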

Submission

Please submit a zipped copy of the following items to the appropriate drop-box on Moodle.

The lab manual.
The code for trained and improved models.