
Structure from Motion Lab

CSSE 461 - Computer Vision

Overview

“Structure from Motion” is the problem of reconstructing the geometry of a 3D scene given many 2D images. As we’ll cover in class, this is a hard math problem, and solving it is an area of active research.

Most SfM solvers are written for researchers, by researchers. The closest thing we have to an end-user codebase is COLMAP. It clocks in at roughly 100,000 lines of code! So we won’t be writing our own SfM solvers in class.

You’ll be creating a report as you go. At the end of the lab, upload your report to Gradescope.

You may work solo or in pairs, as you choose.

Objectives

Part 1: Get COLMAP

  1. The COLMAP project page is here. Look for the link to “Download pre-release binaries”, scroll to the bottom of that page, and download the zip file. For simplicity I chose “colmap-x64-windows-nocuda.zip”.
  2. Extract the zip file you just downloaded. This may take a few minutes.
  3. COLMAP doesn’t “install” itself in the usual way. It ships as executable binaries. Simply go to the folder where you extracted the files and double-click COLMAP.bat.

If everything worked, you should see something like this:

Instructions for other operating systems

If you aren’t completing this lab on a Windows computer, read COLMAP’s install docs. Look first for a way to install a pre-built binary. (For example, macOS users can run brew install colmap, and some Linux distributions have packages available.) If none of those apply, read further down the page for instructions on building the software from source. Please reach out to me if you hit snags.

Part 2: Run COLMAP on a standard dataset

This GitHub page has a nice list of standard datasets used in computer vision. The first category is “SfM and MVS”. These datasets are suitable for us (although some will be too large to solve in a reasonable amount of time).

Grab the “Small object dataset” called Bunny, hosted by the vision lab at TUM. Download the .tar.gz file. In Windows, right-click to “Extract all”.

Go back to COLMAP. We’ll run SfM in a series of steps:

Image selection

  1. File -> New Project.
  2. COLMAP stores a series of intermediate calculations in an SQLite database. Click “New” next to “Database”. Choose a file location you’ll remember. (I did Downloads/bunny_sfm.db.) Click “Save”.
  3. Next to “Images” click “Select”. Navigate to the folder containing the dataset images and “Select Folder”. For me, this was Downloads/bunny_data/bunny_data/images.
  4. Click “Save”.

Detecting Keypoint Features

Next we’ll run algorithms that identify keypoints in the images. Click “Processing -> Feature Extraction”. Most of these options are fine at their default values, but two are worth setting:

Choose SIMPLE_RADIAL as the camera model, and select “Shared for all images”.

Click “Extract” and wait a moment. When it finishes computing, you can close the Feature Extraction window. COLMAP has just run SIFT, a top-notch feature extractor, and written the extracted keypoint locations to the database we created. You can see these detected keypoints by clicking “Processing -> Database Management”, selecting an image, and clicking the “Show Image” button.
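If you’re curious, you can also peek at that database directly with Python’s built-in sqlite3 module. Here’s a rough sketch; the path is just the example name from the step above, and the layout of the keypoints table reflects my understanding of COLMAP’s schema, so verify both against your own file:

```python
import sqlite3

# Example path from the "Image selection" step above -- adjust to your own.
# (Note: sqlite3.connect creates an empty file if the path is wrong.)
db_path = "bunny_sfm.db"

conn = sqlite3.connect(db_path)

# List every table COLMAP created in the database.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)

# If feature extraction has run, the keypoints table should hold one row
# per image; its "rows" column is the number of keypoints detected.
if "keypoints" in tables:
    for image_id, n in conn.execute(
            "SELECT image_id, rows FROM keypoints LIMIT 5"):
        print(f"image {image_id}: {n} keypoints")

conn.close()
```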

Feature Matching

The next step of processing is to match keypoints across different views. We call a set of keypoints that all refer to the same 3D point a “track”. Here’s a conceptual view of what we’re trying to compute next: we’ve matched the same 3D point across multiple views. We know the pixel locations of this one point as it appears in each image.
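Conceptually, pairwise matching compares descriptor vectors between two images and keeps only unambiguous nearest neighbors. Here’s a toy, self-contained sketch of that idea using Lowe’s ratio test on random stand-in descriptors; this is not COLMAP’s actual matcher, which is more sophisticated and also geometrically verifies matches:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in 128-dimensional descriptors (SIFT's descriptor length).
desc_a = rng.normal(size=(50, 128))            # 50 descriptors from image A
desc_b = rng.normal(size=(60, 128))            # 60 descriptors from image B
desc_b[:50] = desc_a + 0.05 * rng.normal(size=(50, 128))  # plant 50 true matches

matches = []
for i, d in enumerate(desc_a):
    dists = np.linalg.norm(desc_b - d, axis=1)
    best, second = np.partition(dists, 1)[:2]  # two smallest distances
    j = int(np.argmin(dists))
    if best < 0.8 * second:                    # ratio test rejects ambiguous matches
        matches.append((i, j))

print(f"{len(matches)} tentative matches")
```

Each surviving pair (i, j) says “keypoint i in image A and keypoint j in image B probably see the same 3D point” — exactly the observations that get chained into tracks.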

Click “Processing -> Feature matching”.

The tabs across the top of this window are different algorithms for choosing which image pairs to try to match. Our dataset is small, so stick with the first tab (“Exhaustive”) and click “Run”.

You can see what happened by clicking “Processing -> Database Management”. Select an image, and click “Overlapping images”. Now select one overlapping image from the list and click “Show Matches”.

Reconstruction

Click “Reconstruction -> Start Reconstruction”. Let it run. Look at the stats in the bottom bar of the COLMAP window, the log messages to the side (and also in the terminal that opened when you launched COLMAP), and the 3D visualization.

Look under “Render -> Render options” for ways to tweak the 3D view. I found “Image connections” interesting: it shows which cameras were able to match some scene content with each other.

Writeup questions

Add the following to your report:

Part 3: Analyze COLMAP output

Click on “File -> Export model as text”. This will write a bunch of .txt files, so consider making a folder for them. Open frames.txt in a text editor. This file contains the computed extrinsics for each image in the reconstruction.

Find the first line of data. I want you to compute the 3D coordinates of the center of projection of this camera. Add this computation to your report. Show your work. You’ll probably write a few lines of code. Show those in the report too.

Note: Quaternion rotations

COLMAP stores 3D rotations as quaternions. Fortunately, we can convert these to rotation matrices easily:

from scipy.spatial.transform import Rotation as R
quat = [w, x, y, z]   # your numbers here
rotmat = R.from_quat(quat, scalar_first=True).as_matrix()
print(rotmat)  # 3x3 rotation matrix

Note: camera centers

Recall the equation that defined \(\mathbf{R}\) and \(\mathbf{t}\):

\[ X_c = \mathbf{R} X_w + \mathbf{t} \]

The camera’s center of projection is the point where \(X_c = \mathbf{0}\). So to find the camera’s position in world coordinates, set \(X_c\) to zero and solve for \(X_w\). (Hint: rotation matrices are special: \(\mathbf{R}^{-1} = \mathbf{R}^\top\).)
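Putting these two notes together, here is a self-contained sketch of the whole computation. The numbers are made up (a 90° rotation about the z-axis and \(t = (1, 2, 3)\), not values from a real reconstruction); it uses the standard scalar-first quaternion-to-matrix formula directly, which is equivalent to the scipy snippet above:

```python
import numpy as np

# Made-up example values -- substitute the quaternion and translation
# fields from the first data line of your exported file.
s = np.sqrt(0.5)
w, x, y, z = s, 0.0, 0.0, s          # 90 degrees about the z-axis
t = np.array([1.0, 2.0, 3.0])

# Standard quaternion -> rotation matrix formula (scalar-first convention).
R = np.array([
    [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
    [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
    [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
])

# Setting X_c = 0 in X_c = R X_w + t and solving gives C = -R^T t,
# using the fact that R^{-1} = R^T for rotation matrices.
C = -R.T @ t
print(C)

# Sanity check: the center really maps to the origin in camera coordinates.
assert np.allclose(R @ C + t, 0, atol=1e-6)
```

For your report, swap in the quaternion and translation from your own reconstruction and show the resulting center.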

Part 4: Reconstruct a new scene

For part four, create your own dataset, run SfM, and inspect the reconstruction. Include these in your report:

Dataset collection tips

Extra credit

I’ll offer a small amount of extra credit if you can collect a dataset where the reconstruction fails in an interesting way. I’ll be stingy on this: it needs to be really interesting! Of course you’ll fail to reconstruct a scene where the cameras can’t see any content in common, or where everything looks like a plain white wall.

Submission

Upload your report (with answers for Parts 2, 3, and 4 of this lab) as a pdf to Gradescope. If you worked in a pair, use the Team feature on the Gradescope assignment to upload a single report for your team.