The starting code for this section looks like this:
import torch.nn as nn

class SimpleCNNDepth(nn.Module):
    def __init__(self):
        super(SimpleCNNDepth, self).__init__()
        # TODO: your code here

    def forward(self, x):
        # TODO: your code here
        return x
PyTorch uses nn.Module to represent one or more layers of a neural network. To define your own PyTorch model, you subclass nn.Module, create its layers in __init__, and write a forward method that describes how an input tensor is transformed into the output.
Here’s a short example of how you could make a tiny neural network:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Encoder (Feature Extraction)
        self.layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1)
        )

    def forward(self, x):
        x = self.layers(x)
        return x
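This defines a network that performs the following sequence of operations:
1. A 3×3 convolution that maps the 3 input channels to 32 channels
2. A ReLU activation
3. A 3×3 convolution that maps 32 channels to 64 channels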
You’ll also notice parameters that set the number of channels in each layer. The input images are three-channel; after the first convolution the tensor has 32 channels, and after the final convolution it has 64. Because each convolution uses kernel_size=3 with padding=1 (and the default stride of 1), the height and width stay the same. So this example model takes an input tensor of shape (N, 3, H, W) and outputs a tensor of shape (N, 64, H, W).
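A quick way to confirm the shape claims above is to push a dummy batch through the model. This is a minimal sketch; the batch size and image size (4 images of 48×64 pixels) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 3 -> 32 channels, same H, W
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32 -> 64 channels, same H, W
        )

    def forward(self, x):
        return self.layers(x)

model = SimpleCNN()
x = torch.randn(4, 3, 48, 64)  # arbitrary dummy batch: N=4 RGB images, 48x64 pixels
y = model(x)
print(tuple(x.shape), "->", tuple(y.shape))  # (4, 3, 48, 64) -> (4, 64, 48, 64)
```

Feeding a single-channel tensor of shape (N, 1, H, W) instead makes the first Conv2d raise a RuntimeError, because its weights expect 3 input channels.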
Many modern models follow an encoder-decoder architecture. Roughly, this means the network has a front half, called the encoder, that shrinks the feature maps spatially while increasing the number of channels. The back half, called the decoder, does the opposite: it grows the feature maps back toward the input resolution while reducing the number of channels.
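To make the "shrink spatially, deepen in channels" idea concrete, here is a small shape trace of one encoder-style step followed by one decoder-style step. It assumes the decoder upsamples with a stride-2 nn.ConvTranspose2d; that is one common choice, not necessarily the one used later in this project.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 64, 64)  # arbitrary feature map: 32 channels, 64x64

# Encoder-style step: widen channels, then halve height and width.
down = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
# Decoder-style step (assumed: stride-2 transposed convolution):
# narrow channels, double height and width.
up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)

h = down(x)
out = up(h)
print(tuple(h.shape))    # (1, 64, 32, 32)
print(tuple(out.shape))  # (1, 32, 64, 64)
```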
The code could be organized like this:
class SimpleCNNDepth(nn.Module):
    def __init__(self):
        super(SimpleCNNDepth, self).__init__()
        self.encoder = nn.Sequential(
            # TODO
        )
        self.decoder = nn.Sequential(
            # TODO
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
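One useful property of this skeleton: an empty nn.Sequential simply returns its input, so before any layers are filled in, the model should behave as an identity function. The sketch below uses that as a sanity check that forward is wired up correctly; the input size is arbitrary.

```python
import torch
import torch.nn as nn

class SimpleCNNDepth(nn.Module):
    def __init__(self):
        super(SimpleCNNDepth, self).__init__()
        self.encoder = nn.Sequential()  # TODO: encoder layers go here
        self.decoder = nn.Sequential()  # TODO: decoder layers go here

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

model = SimpleCNNDepth()
x = torch.randn(2, 3, 32, 32)
print(torch.equal(model(x), x))  # True: empty Sequentials pass the input through
```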
So, what goes in the encoder and decoder?
For a good starting point on this project, try this architecture:
Encoder channels: 3 (input), 32, 64, 128
Decoder channels: 128, 64, 32, 1 (output)
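One way to turn those channel lists into code is to build each half from consecutive channel pairs, using encoder widths 3, 32, 64, 128 mirrored by decoder widths 128, 64, 32, 1. This is a sketch under several assumptions: the enc_channels/dec_channels constructor arguments are hypothetical, each decoder block is assumed to upsample with a stride-2 nn.ConvTranspose2d, and no activation is applied to the final 1-channel depth output.

```python
import torch
import torch.nn as nn

class SimpleCNNDepth(nn.Module):
    # enc_channels / dec_channels are hypothetical parameters for illustration.
    def __init__(self, enc_channels=(3, 32, 64, 128), dec_channels=(128, 64, 32, 1)):
        super(SimpleCNNDepth, self).__init__()
        # Encoder: each block is conv -> ReLU -> 2x max-pool (halves H and W).
        enc_layers = []
        for c_in, c_out in zip(enc_channels[:-1], enc_channels[1:]):
            enc_layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            ]
        self.encoder = nn.Sequential(*enc_layers)

        # Decoder (assumed design): stride-2 transposed convs double H and W,
        # with ReLU between blocks but none after the final 1-channel output.
        dec_layers = []
        pairs = list(zip(dec_channels[:-1], dec_channels[1:]))
        for i, (c_in, c_out) in enumerate(pairs):
            dec_layers.append(nn.ConvTranspose2d(c_in, c_out, kernel_size=2, stride=2))
            if i < len(pairs) - 1:
                dec_layers.append(nn.ReLU())
        self.decoder = nn.Sequential(*dec_layers)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

model = SimpleCNNDepth()
x = torch.randn(2, 3, 64, 96)  # H and W divisible by 8 (three 2x pools)
print(tuple(model(x).shape))   # (2, 1, 64, 96)
```

With three 2× pools in the encoder and three 2× upsamplings in the decoder, the output depth map matches the input resolution only when H and W are multiples of 8; other sizes would need padding or cropping.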
If you’d like more coding help, read on.
Here’s what it could look like if the encoder only had one block:
self.encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2)  # Downsample by 2x
)
and here would be the corresponding one-block decoder: