In this assignment, you'll implement a (slightly simplified) version of the technique described in the influential 2020 paper NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis by Mildenhall et al. NeRF is an elegant technique that, given a collection of images taken from multiple viewpoints along with pose information for those cameras, encodes a 3D volumetric representation of the scene in a neural network.
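Conceptually, the core of NeRF is just a multilayer perceptron that maps a 3D position and viewing direction to an RGB color and a volume density. The sketch below is only an illustration of that idea; the layer sizes and structure are assumptions, not the architecture you will build in the notebook.

import torch
import torch.nn as nn

# Illustrative sketch only: an MLP mapping a 3D point plus a viewing direction
# to an RGB color and a volume density (sigma). Layer sizes are arbitrary.
class TinyNeRF(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),   # (x, y, z) + view direction
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                  # RGB + density
        )

    def forward(self, xyz, viewdir):
        out = self.net(torch.cat([xyz, viewdir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])   # colors constrained to [0, 1]
        sigma = torch.relu(out[..., 3])     # non-negative density
        return rgb, sigma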
Accept the Project 4 assignment on GitHub Classroom and clone your repository. The repository contains the assignment's single Jupyter notebook, which includes detailed instructions as well as skeleton code and TODOs for you to complete.
This project requires the use of a GPU to train NeRF models using the PyTorch framework. The recommended approach is to run on one of the departmental GPU servers, as you did for Project 3.
Alternative approaches that should work, but for which I cannot guarantee support, include:
This project has similar software requirements to the prior projects, but also requires a few additional packages. The necessary Python requirements are given in requirements.txt, included in your repo. To set up an environment specifically for p4, go to your repo directory and run:
python3 -m venv p4env
source p4env/bin/activate
pip install -r requirements.txt
It may take a few minutes to install everything. If you want to augment your existing environment instead, just activate that environment and run pip install -r requirements.txt there.
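Once the environment is active, it's worth sanity-checking that PyTorch can actually see a GPU before starting a long training run (these are standard PyTorch calls, nothing assignment-specific):

import torch

# Training NeRF on a CPU is impractically slow, so check for a CUDA device first.
if torch.cuda.is_available():
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU found; make sure you are running on one of the GPU servers.")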
The input and test data for this project will be downloaded (and cached) by the notebook as needed. I recommend running the notebook from the same location each time when possible (or moving the downloaded files along with it) to avoid re-downloading them. Do not commit the data files to your GitHub repository.
There are several pieces we ask you to implement in this assignment:
Detailed instructions for the individual TODOs are given in the notebook.
To help you verify the correctness of your solutions, we provide tests at the end of each TODO block, as well as qualitative and quantitative evaluation at the end of the notebook.
With no positional encoding:
With 3 frequencies:
With 6 frequencies:
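The frequency counts above refer to NeRF's sinusoidal positional encoding of the input coordinates. As a rough sketch of the idea (conventions, such as whether the raw input is kept, vary, so follow the notebook's TODO rather than this):

import torch

def positional_encoding(x, num_freqs):
    # Sketch of a NeRF-style encoding: append sin and cos of the input at
    # num_freqs octave-spaced frequencies. Exact conventions differ by implementation.
    enc = [x]
    for i in range(num_freqs):
        freq = (2.0 ** i) * torch.pi
        enc.append(torch.sin(freq * x))
        enc.append(torch.cos(freq * x))
    return torch.cat(enc, dim=-1)

# Example: 3D points with 6 frequencies -> 3 + 3 * 2 * 6 = 39 output dimensions
pts = torch.rand(1024, 3)
print(positional_encoding(pts, num_freqs=6).shape)   # torch.Size([1024, 39])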
The optimization for this scene takes around 1000 to 3000 iterations to converge on a single GPU (roughly 10 to 30 minutes). We use peak signal-to-noise ratio (PSNR) to measure how closely the predicted image matches the target image; we expect a converged model to reach a PSNR above 20 and produce a reasonable depth map. Here's an output after 1000 iterations of training with default settings:
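For reference, the PSNR score mentioned above is just a function of the mean squared error between the predicted and target images; here is a generic sketch assuming pixel values in [0, 1] (not the notebook's exact evaluation code):

import torch

def psnr(pred, target):
    # PSNR in dB for images with values in [0, 1]; higher means a closer match.
    mse = torch.mean((pred - target) ** 2)
    return -10.0 * torch.log10(mse)

# Example: noise with standard deviation ~0.1 around the target gives roughly 20 dB.
target = torch.rand(100, 100, 3)
pred = (target + 0.1 * torch.randn_like(target)).clamp(0.0, 1.0)
print(psnr(pred, target))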
Here’s the resulting 360 video:
Execute your completed notebook in Jupyter and commit the resulting version of your .ipynb file to your GitHub repository; make sure the cell outputs are contained in the notebook. Also commit the 360 video output (lego_spiral_001000_rgb.mp4) to your repository, and push to GitHub by the deadline.
Points are awarded for correctness and efficiency, and deducted for issues with clarity or submission mechanics:
Clarity: deductions may be made for poor coding style. Please see the syllabus for general coding guidelines. Points may be deducted for any of the following:
This assignment is based on an assignment from Noah Snavely; thanks to Noah and numerous underappreciated TAs.