ReLU FFnet

Overview

The purpose of this assignment is to learn how different activation functions influence the performance of a network.

Assignment

  1. FFnetRelu [35 pts] Create an FFnetRelu class; feel free to copy and paste. The class should allow an arbitrary number of hidden layers and an arbitrary number of output nodes.
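    One possible constructor shape that meets these requirements (a sketch only; the parameter names are illustrative, not prescribed):

      // Illustrative constructor: hiddenSizes.length hidden layers,
      // hiddenSizes[i] nodes in hidden layer i, and outputSize output nodes.
      public FFnetRelu(int inputSize, int[] hiddenSizes, int outputSize) {
        // allocate weight matrices, biases, and activation arrays here
      }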
  2. Please use ReLU activation functions for all hidden layers and softmax for the output layer.
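    For reference, a minimal sketch of ReLU and its derivative (the method names are illustrative):

      private double relu(double x) {
        return Math.max(0.0, x);
      }

      // Derivative of ReLU: 1 for positive inputs, 0 otherwise.
      private double reluDerivative(double x) {
        return x > 0.0 ? 1.0 : 0.0;
      }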
  3. Please test your network by implementing the XOR network from the following worksheet solution file and comparing your network's outputs to the solutions: Solution to Relu FFnet class worksheet. Please note that the worksheet uses a sigmoid activation function for testing. Be sure to fix the weights and use the 0.7 learning rate as indicated. The solution sheet also contains the algorithm you want to implement, although it is very close to what you should already have.
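    For the worksheet test, the standard logistic sigmoid and its derivative can be sketched as follows (method names illustrative):

      private double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
      }

      // Derivative written in terms of the activation a = sigmoid(x),
      // which is convenient during backpropagation.
      private double sigmoidDerivative(double a) {
        return a * (1.0 - a);
      }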
  4. Once your revised network works, please ensure that the output layer can have more than one output node, and please use softmax activation. Below is sample code you may use.
    private double[] softmax(double[] inputs) {
      double[] output = new double[inputs.length];
      double max = inputs[0];
      // Find max for numerical stability
      for (int i = 1; i < inputs.length; i++) {
        if (inputs[i] > max) max = inputs[i];
      }
      // Compute exp and sum
      double sum = 0;
      for (int i = 0; i < inputs.length; i++) {
        output[i] = Math.exp(inputs[i] - max);
        sum += output[i];
      }
      // Normalize
      for (int i = 0; i < inputs.length; i++) {
        output[i] /= sum;
      }
      return output;
    }
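    For example, calling softmax(new double[]{1.0, 2.0, 3.0}) returns approximately {0.09, 0.24, 0.67}: all outputs are positive and sum to 1. Note also that if you pair softmax with a cross-entropy loss, the output-layer error term simplifies to the network output minus the one-hot target.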
    
  5. Please use a very small learning rate, around 0.01.
  6. Please use very small initial weights, in the half-open interval [-0.05, 0.05).
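    One way to draw such weights (a sketch, assuming a java.util.Random field named rng):

      // rng.nextDouble() returns a value in [0, 1), so each weight
      // falls in the half-open interval [-0.05, 0.05).
      double weight = -0.05 + 0.10 * rng.nextDouble();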
  7. Please use the following code to determine the number of errors.
      // The predicted class is the index of the largest output
      // activation (argmax over the output layer).
      int predictedClass = 0;
      for (int j = 0; j < outputLayerSize; j++) {
        if (activationOutputLayer[j] > activationOutputLayer[predictedClass]) {
          predictedClass = j;
        }
      }
      // Count an error whenever the one-hot desired output does not
      // mark the predicted class as the correct one.
      if (this.desiredOutput[example][predictedClass] != 1.0) {
        numEssentialErrors++;
      }
    
  8. Please experiment with MNIST. Use the above hyperparameters and a single hidden layer, and experiment with the size of the hidden layer and the number of epochs. Aim for 97% correct predictions (accuracy) on the test set. What is the smallest number of epochs needed to reach this accuracy? For your final answer, ensure that the network reaches 97% test accuracy on five independent runs. Please create a document that lists the hyperparameter values as well as any observations about the relationship between accuracy on the training set and accuracy on the test set.
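    A sketch of how per-epoch test accuracy might be tracked (the method and variable names, such as trainOneEpoch and countErrors, are hypothetical; adapt them to your own class):

      for (int epoch = 1; epoch <= maxEpochs; epoch++) {
        net.trainOneEpoch(trainImages, trainLabels);  // one pass over the training set
        int errors = net.countErrors(testImages, testLabels);
        double accuracy = 1.0 - (double) errors / testImages.length;
        System.out.printf("epoch %d: test accuracy %.4f%n", epoch, accuracy);
        if (accuracy >= 0.97) break;  // target reached
      }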

Submission

Please submit a zipped copy of your testing and FFnetRelu Java files, as well as the document with your responses.