The starting code for this section looks like this:
class SimpleCNNDepth(nn.Module):
    def __init__(self):
        super(SimpleCNNDepth, self).__init__()
        # TODO: your code here

    def forward(self, x):
        # TODO: your code here
        return x

PyTorch uses nn.Module to represent one or more layers of a neural network. To define your own PyTorch model, you write code that specifies how an input tensor is processed into an output tensor.
Here’s a short example of how you could make a tiny neural network:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Encoder (Feature Extraction)
        self.layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1)
        )

    def forward(self, x):
        x = self.layers(x)
        return x

This defines a network that applies, in order: a 3x3 convolution, a ReLU activation, and a second 3x3 convolution.
You'll also notice parameters that set the number of channels in each layer. The input images are three-channel (RGB). After the first convolution the tensor has 32 channels, and after the final convolution it has 64 channels. So this example model takes an input tensor of shape (N, 3, H, W) and outputs a tensor of shape (N, 64, H, W).
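You can verify the shape claim yourself by pushing a random tensor through the model. This is a minimal sketch (the batch size and 16x16 spatial size are arbitrary choices for the check):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1)
        )

    def forward(self, x):
        return self.layers(x)

x = torch.randn(2, 3, 16, 16)   # batch of 2 RGB images, 16x16 pixels
y = SimpleCNN()(x)
print(y.shape)                  # torch.Size([2, 64, 16, 16])
```

Because every convolution uses kernel_size=3 with padding=1, the spatial dimensions H and W pass through unchanged; only the channel count changes.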
Many modern models follow an encoder-decoder architecture. Roughly, this means that the network has a front half, called the encoder, that shrinks images spatially but deepens in the number of channels. Then the back half (decoder) does the opposite.
The code could be organized like this:
class SimpleCNNDepth(nn.Module):
    def __init__(self):
        super(SimpleCNNDepth, self).__init__()
        self.encoder = nn.Sequential(
            # TODO
        )
        self.decoder = nn.Sequential(
            # TODO
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

So, what goes in the encoder and decoder?
For a good starting point on this project, try this architecture:
Encoder channels: 3 (input), 32, 64, 128
Decoder channels: 128, 64, 32, 1 (output)
If you’d like more coding help, read on.
Here’s what it could look like if the encoder only had one block:

self.encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2)  # Downsample by 2x
)

and here would be the corresponding one-block decoder:

# Decoder (Upsampling)
self.decoder = nn.Sequential(
    nn.Conv2d(32, 1, kernel_size=3, padding=1),  # Output 1-channel depth map
    nn.ReLU(),  # but skip this in the final block
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
)
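Stacking three such blocks gives one possible full model. This is a hedged sketch, not the only correct answer: the channel widths (32, 64, 128) and the placement of the pooling/upsampling layers are one reasonable choice, and you are free to rearrange them. Note that the input H and W must be divisible by 8 here, since the encoder halves the resolution three times:

```python
import torch
import torch.nn as nn

class SimpleCNNDepth(nn.Module):
    def __init__(self):
        super(SimpleCNNDepth, self).__init__()
        # Encoder: shrink spatially, grow the channel count (3 -> 32 -> 64 -> 128)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Decoder: grow spatially, shrink the channel count (128 -> 64 -> 32 -> 1)
        self.decoder = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),  # final block: no ReLU
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

x = torch.randn(1, 3, 64, 64)   # one RGB image, 64x64 pixels
y = SimpleCNNDepth()(x)
print(y.shape)                  # torch.Size([1, 1, 64, 64])
```

The three MaxPool2d layers reduce 64x64 down to 8x8, and the three Upsample layers restore it to 64x64, so the output is a one-channel depth map at the input resolution.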