
Project 2: Autostitch

Overview and Logistics

Teamwork

You may work on this assignment solo or in groups of two. If you would like to work in a pair, you need to complete the following steps:

  1. Find your partner.
  2. The first member of the pair to accept the Github Classroom invite should create a new team (it doesn’t matter which group member does this).
  3. The second member of the pair to accept the Github Classroom invite should find the team created by the first member and join it.

Note that if your team uses late days on this assignment, both members must each spend a late day.

Synopsis

In this project, you will implement a system to combine a series of horizontally overlapping photographs into a single panoramic image. We’ll use the built-in ORB feature detector and descriptor from the opencv library. Given the feature correspondences, you will automatically align the photographs (determine their overlap and relative positions) using RANSAC to find an outlier-robust motion model and then blend the resulting images into a single seamless panorama.
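For reference, the feature detection and matching stage (which the skeleton handles for you) might look roughly like the following sketch using OpenCV's ORB; the function name is hypothetical and the skeleton's actual code may differ.

    import cv2

    def detect_and_match(img1, img2):
        # ORB keypoints and binary descriptors for each image.
        orb = cv2.ORB_create()
        kps1, des1 = orb.detectAndCompute(img1, None)
        kps2, des2 = orb.detectAndCompute(img2, None)
        # Brute-force Hamming matching; cross-checking discards one-sided matches.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des1, des2)
        return kps1, kps2, matches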

You are provided with a GUI that lets you test and visualize the functionality and intermediate results of the various stages of the pipeline that ultimately produces the final panorama output. We have also provided you with some test images and unit tests to help you debug.

The high-level steps required to create a panorama are listed below. You will implement panorama stitching using translation and homography motion models.

  1. Take a sequence of photos with horizontal overlap

  2. Extract features from each image

  3. Match features among neighboring pairs of images

  4. Align neighboring pairs using RANSAC

  5. Write out list of transformations that relate each image to a single coordinate system

  6. Warp the images into the output panorama and blend them together

  7. Crop the panorama and admire the beautiful result

Getting Started

Skeleton code is provided in the repository created by Github Classroom. The invitation link is on Moodle.

Test sets: See the resources subdirectory in your repo. You will find three datasets: yosemite, melbourne, and melbourne_small.

Software environment: See the Environment Setup page for detailed installation instructions for Linux, macOS, and Windows. You’ll need Python 3.10+ with numpy, opencv-python, and pillow. The GUI uses Tkinter, which comes with most Python installations.

Your Tasks

All of the code you need to write for this project goes in either alignment.py or blend.py. The places you need to write code are marked with TODO tags.

Aligning the images

First you’ll write a sequence of functions to compute alignment transformations between images. You’ll work in alignment.py, and you’ll edit the functions alignPair, getInliers, computeHomography, and leastSquaresFit.

The first two TODOs are about the computeHomography function. It takes two feature sets from image 1 and image 2 (f1 and f2) and a list of feature matches (containing pairs of indices into f1 and f2) and estimates a homography from image 1 to image 2.

[TODO 1] Set up the \(A\) matrix that defines the system \(A\mathbf{h}\), which computes the residuals for a given homography unrolled into a vector \(\mathbf{h}\). (Refer to the in-class notes!)

[TODO 2] Call minimizeAx on the matrix you set up in TODO 1 and use its result to fill in the 3x3 homography matrix \(H\). The minimizeAx function finds the unit-length vector \(\mathbf{x}\) that minimizes \(||A\mathbf{x}||\) for a given \(A\). Don’t forget to return the homography in its normalized form, with a 1 as the bottom right entry.
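As a reference for TODOs 1 and 2, here is a minimal sketch of the standard direct linear transform setup. It assumes each feature exposes a .pt coordinate and each match carries queryIdx/trainIdx indices (as with OpenCV keypoints and DMatch objects); adapt it to however the skeleton actually represents features and matches. The provided minimizeAx helper is stood in for by an SVD call that does the same job.

    import numpy as np

    def compute_homography_sketch(f1, f2, matches):
        # Two residual rows per correspondence (x, y) -> (xp, yp).
        A = np.zeros((2 * len(matches), 9))
        for i, m in enumerate(matches):
            x, y = f1[m.queryIdx].pt
            xp, yp = f2[m.trainIdx].pt
            A[2 * i]     = [x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp]
            A[2 * i + 1] = [0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp]
        # minimizeAx returns the unit vector minimizing ||Ah||; the right singular
        # vector for the smallest singular value of A is exactly that vector.
        h = np.linalg.svd(A)[2][-1]
        H = h.reshape(3, 3)
        return H / H[2, 2]   # normalize so the bottom-right entry is 1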

[TODO 3] alignPair is where you will implement RANSAC. It takes two feature sets, f1 and f2, the list of feature matches, and a motion model m (described below) as parameters. For this project, we support two motion models, represented by the two possible values of the enum MotionModel: eTranslate and eHomography. alignPair estimates and returns the inter-image transform matrix \(M\). Each RANSAC trial proceeds as follows:

  1. Randomly choose a minimal set of feature matches (one match for the case of translations, four for homographies)
  2. Estimate the corresponding motion model (alignment)
  3. Invoke getInliers to get the indices of inlier feature matches (i.e., indices into matches) that agree with the current motion estimate.

After repeated trials, the full inlier set from the trial whose estimate has the largest number of inliers is used to compute a final least squares estimate of the motion, which is returned as the matrix \(M\).
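To make the trial loop concrete, here is a minimal sketch of the RANSAC procedure described above. The iteration-count parameter nRANSAC and the exact signatures of getInliers and leastSquaresFit are assumptions; follow whatever the skeleton actually defines.

    import random

    def align_pair_sketch(f1, f2, matches, m, nRANSAC, RANSACthresh):
        best_inliers = []
        for _ in range(nRANSAC):
            # 1. Draw a minimal sample: one match for a translation, four for a homography.
            k = 1 if m == MotionModel.eTranslate else 4
            sample = random.sample(range(len(matches)), k)
            # 2. Fit a candidate transform to the sample.
            M = leastSquaresFit(f1, f2, matches, m, sample)
            # 3. Collect the indices of all matches consistent with the candidate.
            inliers = getInliers(f1, f2, matches, M, RANSACthresh)
            if len(inliers) > len(best_inliers):
                best_inliers = inliers
        # Refit on the full inlier set of the best trial and return that M.
        return leastSquaresFit(f1, f2, matches, m, best_inliers)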

[TODO 4] getInliers takes features f1 and f2 from images 1 and 2 and an inter-image transformation matrix from image 1 to image 2, and computes the indices of the matches whose image-1 feature, after transformation, lies within a Euclidean distance of RANSACthresh of its matched image-2 feature.
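A minimal sketch of the inlier test, under the same feature/match assumptions as above and assuming \(M\) maps homogeneous image-1 coordinates into image 2:

    import numpy as np

    def get_inliers_sketch(f1, f2, matches, M, RANSACthresh):
        inlier_indices = []
        for i, m in enumerate(matches):
            x, y = f1[m.queryIdx].pt
            p = M @ np.array([x, y, 1.0])
            px, py = p[0] / p[2], p[1] / p[2]   # back to Cartesian coordinates
            qx, qy = f2[m.trainIdx].pt
            if np.hypot(px - qx, py - qy) < RANSACthresh:
                inlier_indices.append(i)
        return inlier_indices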

[TODO 5, 6] leastSquaresFit computes a least squares estimate for the translation or homography using all of the matches previously estimated as inliers. It returns the resulting translation or homography output transform M. For translation estimation, I recommend simply averaging the translations rather than taking the heavy-handed linear algebra approach. For homographies, you’ve already implemented computeHomography to do the heavy lifting.
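For the translation case, the recommended averaging might look like this sketch (same assumed feature/match conventions as above):

    import numpy as np

    def translation_fit_sketch(f1, f2, matches, inlier_indices):
        dxs, dys = [], []
        for i in inlier_indices:
            m = matches[i]
            (x1, y1), (x2, y2) = f1[m.queryIdx].pt, f2[m.trainIdx].pt
            dxs.append(x2 - x1)
            dys.append(y2 - y1)
        M = np.eye(3)
        M[0, 2] = np.mean(dxs)   # average x translation
        M[1, 2] = np.mean(dys)   # average y translation
        return M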

Blending the Images

Next you’ll warp and blend the aligned image pairs into a single output image to create the final panorama. All of the TODOs are in blend.py, in the functions imageBoundingBox, blendImages, accumulateBlend, and normalizeBlend.

[TODO 7] imageBoundingBox: Given an image and a homography, figure out the box bounding the image after applying the homography.
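One way to think about this: push the four image corners through the homography and take the min and max of the results (a sketch; the return convention is an assumption):

    import numpy as np

    def image_bounding_box_sketch(img, M):
        h, w = img.shape[:2]
        # Homogeneous coordinates of the four corners, one per column.
        corners = np.array([[0, w - 1, 0, w - 1],
                            [0, 0, h - 1, h - 1],
                            [1, 1, 1, 1]], dtype=np.float64)
        warped = M @ corners
        warped /= warped[2]   # normalize homogeneous coordinates
        minX, minY = warped[0].min(), warped[1].min()
        maxX, maxY = warped[0].max(), warped[1].max()
        return int(np.floor(minX)), int(np.floor(minY)), int(np.ceil(maxX)), int(np.ceil(maxY))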

[TODO 8] getAccSize: Given the warped images and their relative displacements, figure out how large the final stitched image needs to be in order to fit all of the warped images. This method also augments each per-image transformation with a translation that moves the output image coordinate system into a numpy-array-friendly world where (0, 0) is at the top left.
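A sketch of that bookkeeping, assuming a list of (image, transform) pairs and the bounding-box convention from the previous sketch:

    import numpy as np

    def get_acc_size_sketch(images_and_transforms):
        # Union of all warped bounding boxes, then a shift so (0, 0) is the top left.
        minX = minY = float('inf')
        maxX = maxY = float('-inf')
        for img, M in images_and_transforms:
            x0, y0, x1, y1 = image_bounding_box_sketch(img, M)
            minX, minY = min(minX, x0), min(minY, y0)
            maxX, maxY = max(maxX, x1), max(maxY, y1)
        accWidth, accHeight = int(maxX - minX), int(maxY - minY)
        shift = np.array([[1, 0, -minX], [0, 1, -minY], [0, 0, 1]], dtype=np.float64)
        shifted = [(img, shift @ M) for img, M in images_and_transforms]
        return accWidth, accHeight, shifted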

[TODO 9] accumulateBlend: Warp each image into the output image’s coordinate system and add its pixel content into the accumulator. You will need to use inverse warping to calculate values at integer output pixel coordinates. To allow the images to blend smoothly, use the fourth channel to represent the weight of the contribution of a pixel. Using the linear blending scheme described in lecture, the weight varies linearly from 0 to 1 from the left side of the image over a distance of blendWidth pixels, then ramps down correspondingly on the right side of the image. Other, fancier blending schemes are possible.

This TODO is really long, so here are some tips:

  1. When working with homogeneous coordinates, don’t forget to normalize when converting them back to Cartesian coordinates.
  2. Watch out for black pixels in the source image when inverse warping. You don’t want to include these in the accumulation.
  3. When doing inverse warping, use bilinear interpolation for the source image pixels. First try to work out the code by looping over each pixel. Later you can optimize your code using array instructions and numpy tricks to be much faster. My approach does vectorized bilinear interpolation using array operations. You may find numpy.meshgrid useful. Optimizing this function is worth only a couple points, so prioritize this lowest.
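As a concrete example of the feathering weight described in TODO 9, here is a minimal per-column sketch; the helper name and exact ramp endpoints are assumptions.

    import numpy as np

    def feather_weights_sketch(width, blendWidth):
        # Weight ramps 0 -> 1 over blendWidth pixels from the left edge, stays at 1
        # in the middle, and ramps 1 -> 0 over blendWidth pixels at the right edge.
        x = np.arange(width, dtype=np.float64)
        ramp_up = np.clip((x + 1) / blendWidth, 0.0, 1.0)
        ramp_down = np.clip((width - x) / blendWidth, 0.0, 1.0)
        return np.minimum(ramp_up, ramp_down)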

[TODO 10] normalizeBlend: Having accumulated weighted pixels from all the source images, this function normalizes the image so each pixel has unit weight by dividing by the weight at each pixel. Be careful not to divide by zero. Remember to make sure the alpha (fourth) channel of the resulting panorama is opaque (1)!
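A minimal sketch of the normalization, assuming a float H x W x 4 accumulator whose fourth channel holds the summed weights (if the skeleton stores images as 8-bit arrays, opaque alpha would be 255 instead of 1):

    import numpy as np

    def normalize_blend_sketch(acc):
        w = acc[:, :, 3:4]
        safe = np.where(w > 0, w, 1.0)   # avoid dividing by zero where nothing accumulated
        img = acc / safe
        img[:, :, 3] = 1.0               # force the alpha channel to opaque
        return img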

Using the GUI

The skeleton code that we provide comes with a graphical interface, implemented in the module gui.py, that makes it easy for you to do the following:

  1. Visualize a Homography: The first tab in the UI provides you a way to load an image and apply an arbitrary homography to the image. This can be useful while debugging when, for example, you want to visualize the results of both manually and programmatically generated transformation matrices. This tab does not call any of the code that you write: it is purely a reference to help you debug your project.
  2. Align Images: The second tab lets you select two images with overlap and uses RANSAC to compute a homography or translation that maps the right image onto the left image. (Note that this tab does not call your blending code.)
  3. Generating a Panorama: The third tab in the UI lets you generate a panorama. You will need to specify a folder of images named so that sorting the filenames alphabetically gives the order in which they appear in the panorama from left to right (or from right to left). This ensures that the mappings between all neighboring pairs are computed. Our current code assumes that all images in the panorama have the same width. The translational motion model should look fairly bad on most datasets.

Debugging Guidelines

You can use the GUI visualizations to check whether your program is running correctly.

  1. Testing the alignment routines:

    The yosemite images are suitable for both motion models (translation and homography). To test alignPair, load two images in the alignment tab of the GUI. Clicking ‘Align Images’ displays the two images, with the right image transformed according to the inter-image transformation matrix and overlaid on the left image. This lets you visually assess the accuracy of the transformation matrix. Note that blending is not performed at this stage.

    Try both the translation and homography motion models on the yosemite images. You should notice that the translation model produces visible misalignment (the images don’t quite line up), while the homography model produces a much better alignment. This is because the translation model isn’t flexible enough to describe the true transformation between these images.

  2. Testing the blending routines:

    When debugging your blending routines, you may find it helpful for the sake of efficiency to use the melbourne_small dataset, which is simply a downsampled version of the Melbourne dataset. Example panoramas are included in the yosemite directory. Compare your resulting panorama with these reference images.

Artifact

Make one panorama using your project code running on your own data. Commit it to your git repository as artifact.jpg.

Submission

Submit your git repo to Gradescope when you are done.

Rubric

Your project will be graded based on the quality of the panoramas generated. An approximate point breakdown is given below. Keep in mind that later code depends on earlier code, so partial credit may be hard to assign if something early on is broken. If you’re short on time, optimize for having working code for image alignment with homographies.

Correctness:

Efficiency:

Artifact:

Clarity: Deductions for poor coding style may be made. Up to two points may be deducted for each of the following:

Acknowledgments

Many thanks are due to those who developed and refined prior versions of this assignment, including Scott Wehrwein, Steve Seitz, Kavita Bala, Noah Snavely, and many underappreciated TAs.