
Homework 2 Solutions

CSSE 461 - Computer Vision

Problem 1: Parse Camera Matrices

Focal length

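Extract the \(f_x\) and \(f_y\) elements of \(\mathbf{K}\). These are focal lengths in pixels, so convert each one to millimeters using the sensor size and image resolution. Each one yields an estimate for \(f\), so compute both ways and average.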

import numpy as np

# specs about a Nikon D500:
# sensor:
w_mm = 23.5
h_mm = 15.7
# image size:
w_px = 5568
h_px = 3712

# parse K (focal lengths sit on the diagonal, in pixels):
fx = K[0,0]
fy = K[1,1]
# convert each to mm using the pixel size; both estimate f, so average them:
f = (fx * w_mm / w_px + fy * h_mm / h_px) / 2
print('Estimated f:', f, 'mm')

Output:

Estimated f: 50.0 mm
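
As a sanity check on the unit conversion, here is a minimal sketch that goes the other way: it builds a made-up intrinsics matrix for an ideal 50 mm lens on the same sensor and then recovers \(f\) with the same formula. The names `K_demo` and `f_true_mm` and the centered principal point are assumptions for illustration only; this is not the homework's \(\mathbf{K}\).

```python
import numpy as np

# Sensor and image size for the Nikon D500 (same specs as above)
w_mm, h_mm = 23.5, 15.7
w_px, h_px = 5568, 3712

# Hypothetical intrinsics for an ideal 50 mm lens (assumption, not the homework's K)
f_true_mm = 50.0
K_demo = np.array([
    [f_true_mm * w_px / w_mm, 0.0, w_px / 2],
    [0.0, f_true_mm * h_px / h_mm, h_px / 2],
    [0.0, 0.0, 1.0],
])

# Pixels -> mm: multiply by the size of one pixel (mm per pixel) along each axis
fx_mm = K_demo[0, 0] * w_mm / w_px
fy_mm = K_demo[1, 1] * h_mm / h_px
f_est = (fx_mm + fy_mm) / 2
print('Recovered f:', f_est, 'mm')
```

If the two per-axis estimates disagree noticeably on real data, that usually points at non-square pixels or a wrong sensor spec rather than at the averaging step.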

Camera location

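# camera center: C maps to the camera-frame origin, so R @ C + t = 0 and C = -R.T @ t
rot = E[:,0:3]  # rotation R (world-to-camera)
t = E[:,3].reshape((3,1)) # numpy is picky about column vecs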

cam_center_w_est = -rot.T @ t
print('Estimated cam center (world coords):', cam_center_w_est.flatten())

Output:

Estimated cam center (world coords): [ 8.21 -0.64  1.42]
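
To see why \(-\mathbf{R}^\top \mathbf{t}\) is the camera center, the sketch below builds extrinsics from a known, made-up camera pose, recovers the center with the same formula, and checks that the center lands on the origin of camera coordinates. `R_demo`, `C_true`, and `E_demo` are hypothetical values chosen for illustration, not the homework's matrices.

```python
import numpy as np

# Hypothetical pose (assumption): rotation of 30 degrees about the world z axis
theta = np.deg2rad(30)
R_demo = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
C_true = np.array([[2.0], [-1.0], [0.5]])  # made-up camera center in world coords

# World-to-camera extrinsics E = [R | t] with t = -R C
t_demo = -R_demo @ C_true
E_demo = np.hstack((R_demo, t_demo))

# Recover the center with the same formula as the solution
C_est = -E_demo[:, 0:3].T @ E_demo[:, 3].reshape((3, 1))

# The center should map to the camera-frame origin
origin_cam = E_demo @ np.vstack((C_est, [[1.0]]))
print('Recovered center:', C_est.flatten())
print('Center in camera coords:', origin_cam.flatten())
```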

Camera viewing direction

The most elegant answer I know is to take the viewing direction in camera coordinates, which is the \(+z\) axis \([0, 0, 1]^\top\), and rotate it into world coordinates. Our \(\mathbf{R}\) maps world-to-camera, so we need the inverse rotation, \(\mathbf{R}^\top\).

# camera viewing direction:
z_cam = rot.T @ np.array([[0], [0], [1]])
print('Estimated camera viewing direction (world coords):', z_cam.flatten())

Output:

Estimated camera viewing direction (world coords): [ 0.93934743 -0.30521248  0.15643447]
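
A quick way to double-check this result: multiplying \(\mathbf{R}^\top\) by \([0, 0, 1]^\top\) just picks out the third row of \(\mathbf{R}\), and that row should have unit length. The sketch below confirms both facts on a made-up rotation; `R_demo` is an assumption for illustration, not the homework's rotation.

```python
import numpy as np

# Hypothetical world-to-camera rotation (assumption): 40 deg about x, then 25 deg about z
a, b = np.deg2rad(40), np.deg2rad(25)
Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
Rz = np.array([[np.cos(b), -np.sin(b), 0], [np.sin(b), np.cos(b), 0], [0, 0, 1]])
R_demo = Rz @ Rx

# Rotate the camera's +z axis into world coordinates
view_dir = (R_demo.T @ np.array([[0], [0], [1]])).flatten()

# Same vector, read directly from the third row of R
third_row = R_demo[2, :]
view_len = np.linalg.norm(view_dir)
print(view_dir, third_row, view_len)
```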

Problem 2: Reprojection error

Start with a function to perform pinhole projection:

def project(E, K, Pw):
    """
    Projects a 3D point Pw (3x1) into the image using extrinsics E (3x4) and intrinsics K (3x3).
    Returns the projected 2D point in pixel coordinates (2x1).
    """
    Pw_hom = np.vstack((Pw, 1))  # Convert to homogeneous coordinates
    Pc = E @ Pw_hom  # Camera coordinates
    p_homog = K @ Pc  # Image coordinates in homogeneous form
    p = p_homog[:2] / p_homog[2]  # Convert to pixel coordinates
    return p

Now project the 3D point into each image. Compute errors between the reprojection and the actual observations:

err1 = px_cam1 - project(E1, K, Pw)
err2 = px_cam2 - project(E2, K, Pw)

One way to find the length of a vector is np.linalg.norm. Add up the squared lengths of the two error vectors:

reprojection_error = np.square(np.linalg.norm(err1)) + np.square(np.linalg.norm(err2))
print("Squared reprojection error:", reprojection_error)

My output was:

Squared reprojection error: 12.01971237805994
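
To check the whole pipeline on numbers you can verify by hand, here is a small sketch with made-up data. It uses a compact restatement of the projection (`project_demo`, a hypothetical helper) and fake observations built by adding known pixel offsets to the true projections. All matrices and points here are assumptions for illustration, not the homework's data. It also shows that summing the squared components directly gives the same result as squaring the norm.

```python
import numpy as np

def project_demo(E, K, Pw):
    """Pinhole projection of a 3x1 world point (same steps as project above)."""
    p_homog = K @ E @ np.vstack((Pw, [[1.0]]))
    return p_homog[:2] / p_homog[2]

# Made-up setup (assumption): shared intrinsics, two cameras offset along x
K_demo = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 40.0], [0.0, 0.0, 1.0]])
E1_demo = np.hstack((np.eye(3), np.zeros((3, 1))))
E2_demo = np.hstack((np.eye(3), np.array([[-1.0], [0.0], [0.0]])))
Pw_demo = np.array([[1.0], [2.0], [10.0]])

# Fake observations: true projections plus known pixel offsets
px1 = project_demo(E1_demo, K_demo, Pw_demo) + np.array([[3.0], [4.0]])
px2 = project_demo(E2_demo, K_demo, Pw_demo) + np.array([[0.0], [1.0]])

err1 = px1 - project_demo(E1_demo, K_demo, Pw_demo)
err2 = px2 - project_demo(E2_demo, K_demo, Pw_demo)

# Two equivalent ways to get the summed squared error
via_norm = np.square(np.linalg.norm(err1)) + np.square(np.linalg.norm(err2))
via_sum = np.sum(err1**2) + np.sum(err2**2)
print(via_norm, via_sum)  # offsets (3,4) and (0,1) should give about 25 + 1 = 26
```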