
You may work on this assignment solo or in groups of two. If you would like to work in a pair, you will need to register your group before starting.
In this project, you will implement a system to combine a series of
horizontally overlapping photographs into a single panoramic image.
We’ll use the built-in ORB feature detector and descriptor from the
OpenCV library. Given the feature correspondences, you will
automatically align the photographs (determine their overlap and
relative positions) using RANSAC to find an outlier-robust motion model
and then blend the resulting images into a single seamless panorama.
You are provided with a GUI that lets you test and visualize the functionality and intermediate results of the various stages of the pipeline that ultimately produces the final panorama output. We have also provided you with some test images and unit tests to help you debug.
The high-level steps required to create a panorama are listed below. You will implement panorama stitching using translation and homography motion models.
Take a sequence of photos with horizontal overlap
Extract features from each image
Match features among neighboring pairs of images
Align neighboring pairs using RANSAC
Write out list of transformations that relate each image to a single coordinate system
Warp the images into the output panorama and blend them together
Crop the panorama and admire the beautiful result
Skeleton code is provided in the repository created by Github Classroom. The invitation link is on Moodle.
Test sets: See the resources
subdirectory in your repo. You will find three datasets:
yosemite, melbourne, and
melbourne_small.
Software environment: See the Environment Setup page for detailed installation
instructions for Linux, macOS, and Windows. You’ll need Python 3.10+
with numpy, opencv-python, and
pillow. The GUI uses Tkinter, which comes with most Python
installations.
All of the code you need to write for this project goes in either
alignment.py or blend.py. The places you need
to write code are marked with TODO tags.
First you’ll write a sequence of functions to compute alignment
transformations between images. You’ll work in
alignment.py, and you’ll edit the functions
alignPair, getInliers,
computeHomography, and leastSquaresFit.
The first two TODOs are about the computeHomography
function. It takes two feature sets from image 1 and image 2
(f1 and f2) and a list of feature matches
(containing pairs of indices into f1 and f2)
and estimates a homography from image 1 to image 2.
[TODO 1] Set up the \(A\) matrix that defines the system \(A\mathbf{h}\), whose entries are the residuals for a given homography unrolled into a vector \(\mathbf{h}\). (Refer to the in-class notes!)
[TODO 2] Call minimizeAx on the matrix
you set up in TODO 1 and use its result to fill in the 3x3 homography
matrix \(H\). The
minimizeAx function finds the unit-length vector \(\mathbf{x}\) that minimizes \(||A\mathbf{x}||\) for a given \(A\). Don’t forget to return the homography
in its normalized form, with a 1 as the bottom right entry.
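For reference, in the standard direct linear transform setup (your in-class notes may use a different sign convention or row ordering), a correspondence \((x, y) \leftrightarrow (x', y')\) between image 1 and image 2 contributes two rows to \(A\):
\[
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -x'x & -x'y & -x' \\
0 & 0 & 0 & x & y & 1 & -y'x & -y'y & -y'
\end{bmatrix}
\mathbf{h} = \mathbf{0},
\qquad
\mathbf{h} = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33})^\top,
\]
so \(A\) has \(2n\) rows for \(n\) matches. The sketch below shows both steps in plain numpy, using numpy's SVD in place of the provided minimizeAx helper. It assumes f1 and f2 behave like lists of cv2.KeyPoint and that each match exposes queryIdx/trainIdx indices into f1 and f2 (as cv2.DMatch does); adjust the accessors to whatever the skeleton actually passes in.

    import numpy as np

    def compute_homography_sketch(f1, f2, matches):
        """Illustrative DLT estimate of H mapping image 1 to image 2."""
        A = np.zeros((2 * len(matches), 9))
        for i, m in enumerate(matches):
            x, y = f1[m.queryIdx].pt        # feature location in image 1
            xp, yp = f2[m.trainIdx].pt      # matched location in image 2
            # Each correspondence contributes two rows of the system A h = 0.
            A[2 * i]     = [x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp]
            A[2 * i + 1] = [0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp]
        # The unit vector minimizing ||A h|| is the right singular vector of A
        # with the smallest singular value (this is what minimizeAx computes).
        _, _, Vt = np.linalg.svd(A)
        H = Vt[-1].reshape(3, 3)
        return H / H[2, 2]                  # normalize so the bottom-right entry is 1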
[TODO 3] alignPair is where you will
implement RANSAC. It takes two feature sets, f1 and
f2, the list of feature matches, and a motion
model, m (described below) as parameters. For this
project, we support two motion models, represented by the two possible
values of the enum MotionModel: eTranslate and
eHomography. alignPair estimates and returns
the inter-image transform matrix \(M\)
as follows: repeat for a fixed number of RANSAC iterations, each time choosing a minimal random subset of the matches (a single match for a translation, four matches for a homography), computing the motion estimate implied by that subset, and calling getInliers to get the indices of inlier feature matches (i.e., indices into matches) that agree with the current motion estimate. After the repeated trials, the entire inlier set from the \(M\) with the largest number of inliers is used to compute a final least squares estimate for the motion, which is returned as the matrix \(M\).
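To make the control flow concrete, here is a minimal outline of the loop. The helpers fit_minimal_model, get_inliers, and least_squares_fit are stand-ins for computeHomography (or the translation equivalent), your getInliers, and your leastSquaresFit, and the iteration count and threshold are placeholder values, not the skeleton's actual parameter names.

    import random

    def align_pair_sketch(f1, f2, matches, m, nRANSAC=500, RANSACthresh=5.0):
        """Illustrative RANSAC outline; helper names are placeholders."""
        best_inliers = []
        # One match determines a translation; four are needed for a homography.
        min_sample = 1 if m == MotionModel.eTranslate else 4
        for _ in range(nRANSAC):
            sample = random.sample(matches, min_sample)
            M = fit_minimal_model(f1, f2, sample, m)       # e.g. computeHomography
            inliers = get_inliers(f1, f2, matches, M, RANSACthresh)
            if len(inliers) > len(best_inliers):
                best_inliers = inliers
        # Refit using every inlier of the best trial for the final estimate.
        return least_squares_fit(f1, f2, matches, m, best_inliers)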
[TODO 4] getInliers computes the indices of the matches that agree with a given inter-image transformation matrix from image 1 to image 2: given features f1 and f2 from image 1 and image 2, a match is an inlier if the Euclidean distance between the transformed image-1 feature and its matched image-2 feature is below RANSACthresh.
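Under the same assumptions about features and matches as in the computeHomography sketch above, a minimal version of this test might look like:

    import numpy as np

    def get_inliers_sketch(f1, f2, matches, M, RANSACthresh):
        """Return indices of matches consistent with the transform M."""
        inlier_indices = []
        for i, m in enumerate(matches):
            x1, y1 = f1[m.queryIdx].pt
            x2, y2 = f2[m.trainIdx].pt
            # Map the image-1 point into image 2 in homogeneous coordinates.
            p = M @ np.array([x1, y1, 1.0])
            px, py = p[0] / p[2], p[1] / p[2]
            if np.hypot(px - x2, py - y2) < RANSACthresh:
                inlier_indices.append(i)
        return inlier_indices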
[TODO 5, 6] leastSquaresFit computes a
least squares estimate for the translation or homography using all of
the matches previously estimated as inliers. It returns the resulting
translation or homography output transform M. For
translation estimation, I recommend simply averaging the translations
rather than taking the heavy-handed linear algebra approach. For
homographies, you’ve already implemented computeHomography
to do the heavy lifting.
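For the translation case, averaging amounts to something like the following sketch (same assumptions about features and matches as above; the inlier indices index into matches):

    import numpy as np

    def translation_fit_sketch(f1, f2, matches, inlier_indices):
        """Average per-match displacements over the inliers (assumed non-empty)."""
        dx = dy = 0.0
        for i in inlier_indices:
            m = matches[i]
            x1, y1 = f1[m.queryIdx].pt
            x2, y2 = f2[m.trainIdx].pt
            dx += x2 - x1
            dy += y2 - y1
        M = np.eye(3)
        M[0, 2] = dx / len(inlier_indices)
        M[1, 2] = dy / len(inlier_indices)
        return M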
Next you’ll warp and blend the aligned image pairs into a single
output image to create the final panorama. All of the TODOs are in
blend.py, in the functions imageBoundingBox,
blendImages, accumulateBlend, and
normalizeBlend.
[TODO 7] imageBoundingBox: Given an
image and a homography, figure out the box bounding the image after
applying the homography.
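One way to compute it, sketched here, is to push the four corners through the transform (remembering the homogeneous divide) and take componentwise minima and maxima; the exact corner convention and return format expected by the skeleton may differ.

    import numpy as np

    def image_bounding_box_sketch(img, M):
        """Bounding box of an image after applying the 3x3 transform M."""
        h, w = img.shape[:2]
        corners = np.array([[0, 0, 1],
                            [w - 1, 0, 1],
                            [0, h - 1, 1],
                            [w - 1, h - 1, 1]], dtype=float).T   # 3 x 4
        warped = M @ corners
        warped /= warped[2]          # homogeneous divide, column by column
        minX, minY = warped[0].min(), warped[1].min()
        maxX, maxY = warped[0].max(), warped[1].max()
        return minX, minY, maxX, maxY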
[TODO 8] getAccSize: Given the warped
images and their relative displacements, figure out how large the final
stitched image needs to be in order to fit all the warped images. This
method also augments each per-image transformation with a translation
that moves the output image coordinate system into a
numpy-array-friendly world where (0, 0) is at the top left.
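That augmenting translation is just a 3x3 shift prepended to each per-image transform. For example, if minX and minY are the smallest coordinates over all the warped bounding boxes (hypothetical variable names), something like:

    import numpy as np

    # Shift so the smallest warped coordinate maps to (0, 0) in the output array.
    shift = np.array([[1.0, 0.0, -minX],
                      [0.0, 1.0, -minY],
                      [0.0, 0.0, 1.0]])
    M_shifted = shift @ M      # prepend the translation to a per-image transform M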
[TODO 9] accumulateBlend: Warp each
image into the output image’s coordinate system and add its pixel
content into the accumulator. You will need to use inverse warping to
calculate values at integer output pixel coordinates. To allow the
images to blend smoothly, use the fourth channel to represent the weight
of the contribution of a pixel. Using the linear blending scheme
described in lecture, the weight varies linearly from 0 to 1 from the
left side of the image over a distance of blendWidth
pixels, then ramps down correspondingly on the right side of the image.
Other, fancier blending schemes are possible.
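A simplified version of this step is sketched below; it assumes a float RGB source image, an H x W x 4 float accumulator, a positive blendWidth, and uses cv2.warpPerspective (which internally samples the source at the inverse-mapped location of each integer output pixel) instead of hand-rolled inverse warping.

    import cv2
    import numpy as np

    def accumulate_blend_sketch(img, acc, M, blendWidth):
        """Add one warped, weighted image into the accumulator (illustrative)."""
        h, w = img.shape[:2]
        # Linear ramp: 0 -> 1 over blendWidth pixels on the left, constant 1 in
        # the middle, 1 -> 0 over blendWidth pixels on the right.
        x = np.arange(w, dtype=np.float64)
        ramp = np.minimum(1.0, np.minimum(x / blendWidth, (w - 1 - x) / blendWidth))
        weight = np.tile(ramp, (h, 1))

        accH, accW = acc.shape[:2]
        # Premultiply the colors by the weight, then warp image and weight alike.
        warped_rgb = cv2.warpPerspective(img * weight[..., None], M, (accW, accH))
        warped_w = cv2.warpPerspective(weight, M, (accW, accH))
        acc[..., :3] += warped_rgb
        acc[..., 3] += warped_w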
This TODO is really long, so here are some tips: you may find numpy.meshgrid useful, and optimizing this function is worth only a couple of points, so prioritize it last.
[TODO 10] normalizeBlend: Having
accumulated weighted pixels from all the source images, this function
normalizes the image so each pixel has unit weight by dividing by the
weight at each pixel. Be careful not to divide by zero. Remember to make
sure the alpha (fourth) channel of the resulting panorama is opaque
(1)!
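Assuming the same H x W x 4 float accumulator as above, one compact way to express this is:

    import numpy as np

    def normalize_blend_sketch(acc):
        """Divide accumulated color by accumulated weight, guarding zeros."""
        weight = acc[..., 3]
        safe = np.where(weight > 0, weight, 1.0)    # avoid dividing by zero
        out = np.empty_like(acc)
        out[..., :3] = acc[..., :3] / safe[..., None]
        out[..., 3] = 1.0                           # final alpha is fully opaque
        return out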
The skeleton code that we provide comes with a graphical interface, in the module gui.py, which makes it easy for you to test each stage of the pipeline and visualize its intermediate results. You can use the GUI visualizations to check whether your program is running correctly.
Testing the alignment routines:
The yosemite images are suitable for both motion models (translation
and homography). To test alignPair, load two images in the
alignment tab of the GUI. Clicking ‘Align Images’ displays the pair with the right image transformed by the estimated inter-image transformation matrix and overlaid on the left image, letting you visually assess the accuracy of the transformation.
Note that blending is not performed at this stage.
Try both the translation and homography motion models on the yosemite images. You should notice that the translation model produces visible misalignment (the images don’t quite line up), while the homography model produces a much better alignment. This is because the translation model isn’t flexible enough to describe the true transformation between these images.
Testing the blending routines:
When debugging your blending routines, you may find it helpful for the sake of efficiency to use the melbourne_small dataset, which is simply a downsampled version of the Melbourne dataset. Example panoramas are included in the yosemite directory. Compare your resulting panorama with these reference images.
Make one panorama using your project code running on your own data.
Commit it to your git repository as artifact.jpg.
Submit your git repo to Gradescope when you are done.
Your project will be graded based on the quality of the panoramas generated. An approximate point breakdown is given below. Keep in mind that later code depends on earlier code, so partial credit may be hard to assign if something early on is broken. If you’re short on time, optimize for having working code for image alignment with homographies.
Correctness:
Efficiency:
Artifact:
Clarity: Deductions for poor coding style may be made. Up to two points may be deducted for each of the following:
Many thanks are due to those who developed and refined prior versions of this assignment, including Scott Wehrwein, Steve Seitz, Kavita Bala, Noah Snavely, and many underappreciated TAs.