Project 1

Naive Approach

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques such as image pyramids, automatically produce a color image with as few visual artifacts as possible. I first align the red channel and the green channel to the blue channel separately, then stack the three channels back together to create the color image.

Naive Image Alignment Approach:

  1. Metric: I use Normalized Cross-Correlation (NCC) as the alignment metric, since it is more robust to brightness differences across color channels. NCC is simply a dot product between two normalized vectors: (image1./||image1|| and image2./||image2||).
  2. Search Window: I define a fixed window size of (30, 30). This means searching shifts in the range of -30 to +30 pixels along both the x and y axes.
  3. Preprocessing: To reduce border artifacts and improve alignment accuracy, I crop 20 pixels from each edge of the input images.
  4. Exhaustive Search: I perform a nested loop over all possible shifts within the window:
    • The outer loop iterates over vertical (y-axis) shifts.
    • The inner loop iterates over horizontal (x-axis) shifts.
  5. Cropping: As noted in step 3, I manually crop 20 pixels from each side of the image so that noise from the border does not affect the alignment.
  6. Overlap Handling: For each candidate shift, I roll the image using np.roll, compute the overlapping region between the two channels, and then calculate the NCC metric on this overlap to minimize noise.
  7. Optimal Shift Application: After identifying the best shift within the search window, I apply it to the target channel. For example, when aligning the red channel to the blue channel, the computed shift is applied directly to the red channel.
  8. Repeat for All Channels: I repeat the same procedure to align the green channel with the blue channel.
  9. Image Reconstruction: Once both red and green channels are aligned, I combine the three channels using np.dstack. Finally, I convert the result to JPEG-compatible format with img_as_ubyte and save the reconstructed color image.
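
The search described in the steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the names ncc, overlap, and align_naive are hypothetical, and it assumes single-channel 2-D float arrays of equal shape. NCC is computed as the source describes it, as a dot product of the two normalized images, without mean subtraction.

```python
import numpy as np


def ncc(a, b):
    """Dot product of the two images after each is scaled to unit norm."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def overlap(ref, moved, dy, dx):
    """Roll `moved` by (dy, dx) and keep only the pixels that did not wrap around."""
    h, w = ref.shape
    y0, y1 = max(dy, 0), h + min(dy, 0)
    x0, x1 = max(dx, 0), w + min(dx, 0)
    rolled = np.roll(moved, (dy, dx), axis=(0, 1))
    return ref[y0:y1, x0:x1], rolled[y0:y1, x0:x1]


def align_naive(ref, img, window=(30, 30), border=20):
    """Exhaustively search integer shifts of `img` against `ref` (hypothetical helper)."""
    # Crop the borders before scoring so plate edges do not dominate the metric.
    ref_c = ref[border:-border, border:-border]
    img_c = img[border:-border, border:-border]
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-window[0], window[0] + 1):        # vertical shifts
        for dx in range(-window[1], window[1] + 1):    # horizontal shifts
            a, b = overlap(ref_c, img_c, dy, dx)
            score = ncc(a, b)
            if score > best_score:                     # higher NCC is better
                best_score, best_shift = score, (dy, dx)
    # Apply the winning shift to the full, uncropped channel.
    return np.roll(img, best_shift, axis=(0, 1)), best_shift
```

With b, g, and r as the three channels, the color image could then be assembled as np.dstack([align_naive(b, r)[0], align_naive(b, g)[0], b]).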

Multi-Scale Alignment with a Gaussian Image Pyramid

For large .tif files, I speed up alignment using a Gaussian image pyramid. I align the red and green channels to the blue channel independently.

  1. Build the pyramid. I apply a (5×5) Gaussian kernel at each level (implemented with np.lib.stride_tricks.sliding_window_view and np.einsum) to blur the image, then downsample by a factor of 2 along each axis to create the next level. The number of levels is controlled by a level parameter. To avoid excessive shrinking, I stop building the pyramid if either image dimension is ≤ 100 pixels.
  2. Search window. I use a fixed window of (20, 20), i.e., integer shifts from −20 to +20 in both axes.
  3. Exhaustive search at each level. For a given level, I loop over all possible shifts within the window:
    • Outer loop: vertical (y-axis) shifts
    • Inner loop: horizontal (x-axis) shifts
    During metric calculation, I compute only the overlapping region of the two images. To reduce noise, I also crop borders by ~6% on each side to avoid edge artifacts: top/bottom by 0.06 × H, left/right by 0.06 × W.

    For small images such as the .jpg files, the percentage-based crop does not remove enough of the border, so I crop a fixed 65 pixels on each side whenever the percentage calculation yields less than 65 pixels.

  4. Coarsest-to-finest refinement. I recurse down to the coarsest level, find the best shift there, then move one level finer, pre-shifting by the previous best shift and refining with a smaller window (e.g., (5, 5)). I repeat this until I reach the original resolution.
  5. Scoring metric. At every level I use Normalized Cross-Correlation (NCC) to score candidate shifts (higher is better), evaluating all shifts via a nested loop.
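
The pyramid construction in step 1 might look roughly like the sketch below. Several details are assumptions: the report does not give the kernel weights, so a binomial approximation of a Gaussian is used, borders are handled with reflect padding, and the size check is applied to the current level before creating the next one. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel_5x5():
    """5x5 binomial kernel, a common Gaussian approximation (assumed weights)."""
    k1d = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
    kernel = np.outer(k1d, k1d)
    return kernel / kernel.sum()


def blur(img, kernel):
    """Convolve by viewing every 5x5 neighborhood and contracting it with the kernel."""
    pad = kernel.shape[0] // 2
    padded = np.pad(img, pad, mode="reflect")             # assumed border handling
    windows = sliding_window_view(padded, kernel.shape)   # shape (H, W, 5, 5)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def build_pyramid(img, levels, min_size=100):
    """Return [full resolution, half, quarter, ...] with at most `levels` entries."""
    pyramid = [img.astype(np.float64)]
    kernel = gaussian_kernel_5x5()
    for _ in range(levels - 1):
        current = pyramid[-1]
        if min(current.shape) <= min_size:   # stop before the image gets too small
            break
        pyramid.append(blur(current, kernel)[::2, ::2])   # blur, then keep every 2nd pixel
    return pyramid
```

Because the kernel is symmetric, the einsum contraction is the same as a true convolution. Blurring before subsampling is what keeps fine detail from aliasing into the coarser levels.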

This coarse-to-fine strategy finds a good global shift at low resolution and then fine-tunes it as the resolution increases.
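
A compact sketch of this coarse-to-fine recursion is given below, under simplifying assumptions: downsampling uses plain 2×2 block averaging as a stand-in for the Gaussian blur, border cropping is omitted, and the names search, downsample, and align_pyramid are hypothetical. Since each level halves the resolution, a shift found at a coarser level is doubled before it is used as the search center at the next finer level.

```python
import numpy as np


def ncc(a, b):
    """Dot product of the two images after each is scaled to unit norm."""
    a = a.ravel()
    b = b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def search(ref, img, center, window):
    """Score every integer shift within `window` of `center` on the overlap only."""
    h, w = ref.shape
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - window[0], center[0] + window[0] + 1):
        for dx in range(center[1] - window[1], center[1] + window[1] + 1):
            y0, y1 = max(dy, 0), h + min(dy, 0)
            x0, x1 = max(dx, 0), w + min(dx, 0)
            if y1 <= y0 or x1 <= x0:
                continue                                  # no overlap left
            rolled = np.roll(img, (dy, dx), axis=(0, 1))
            score = ncc(ref[y0:y1, x0:x1], rolled[y0:y1, x0:x1])
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


def downsample(img):
    """Halve each axis by 2x2 block averaging (simplified stand-in for blur + subsample)."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def align_pyramid(ref, img, levels, window=(20, 20), refine_window=(5, 5), min_size=100):
    """Return the (dy, dx) shift that best aligns `img` to `ref`."""
    if levels <= 1 or min(ref.shape) <= min_size:
        return search(ref, img, (0, 0), window)           # coarsest level: full window
    coarse = align_pyramid(downsample(ref), downsample(img),
                           levels - 1, window, refine_window, min_size)
    center = (2 * coarse[0], 2 * coarse[1])               # rescale shift to this level
    return search(ref, img, center, refine_window)        # refine with a small window
```

The full window is searched only at the smallest image. Every finer level evaluates just the shifts inside the refinement window, so the cost at full resolution stays small even when the true displacement is large.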

Performance Notes & Tweaks

  • Overlap-only scoring: Computing NCC on just the overlapping area reduces cost and avoids border padding effects.
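
One way to obtain the overlapping region without copying or wrapping any pixels is to slice both images directly. The helper below is a hypothetical sketch of that idea (the name echoes the overlap_views flag used in the calls later in this report, but it is not taken from the actual implementation). It assumes both images have the same shape and that img is conceptually shifted by (dy, dx) relative to ref.

```python
import numpy as np


def overlap_views(ref, img, dy, dx):
    """Return views of the region shared by `ref` and `img` shifted by (dy, dx)."""
    h, w = ref.shape
    # Rows/columns of ref that the shifted image still covers.
    ref_view = ref[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
    # The matching rows/columns of img before shifting.
    img_view = img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return ref_view, img_view
```

Basic slicing returns views rather than copies, so each candidate shift costs only the metric computation on the shared pixels. The second view holds the same pixels that np.roll followed by cropping would produce.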

Failure Analysis

In the pyramid-aligned images above, emir.tif is aligned poorly. The main reason is that I crop too much of the image during alignment, so too much information is lost at the coarsest level, which leads to a bad alignment.

align image
best shift (y,x) is ag: (3, 2), ar: (10, 3)
align image
best shift (y,x) is ag: (49, 23), ar: (102, 40)

By cropping less of the image, I can achieve almost perfect alignment on emir.tif with the following parameters:

ag, ag_shift = align((20, 20), b, g, 6, metric_func=ncc, overlap_views=True, default_window=(5,5), percent=0.024)

ar, ar_shift = align((20, 20), b, r, 6, metric_func=ncc, overlap_views=True, default_window=(5,5), percent=0.024)

which means cropping only 2.4% on each side.
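
To make the effect of the percent parameter concrete, the border rule described earlier (a percentage of each dimension, with a 65-pixel floor) can be written as a small helper. This is an illustrative sketch only: crop_margins is a hypothetical name, rounding to the nearest pixel is an assumption, and the image sizes in the usage lines are made up for illustration rather than taken from emir.tif.

```python
def crop_margins(height, width, percent=0.06, min_pixels=65):
    """Pixels to crop from (top/bottom, left/right) before scoring a shift."""
    # Percentage-based margin, but never less than the fixed floor.
    top_bottom = max(round(percent * height), min_pixels)
    left_right = max(round(percent * width), min_pixels)
    return top_bottom, left_right


# Hypothetical sizes, for illustration only.
large_default = crop_margins(3000, 3500, percent=0.06)
large_reduced = crop_margins(3000, 3500, percent=0.024)
small_default = crop_margins(1000, 800, percent=0.06)
```

For a large image, lowering percent from 0.06 to 0.024 shrinks the cropped border considerably, while for a small image the 65-pixel floor takes over. Whether the floor is also applied at every pyramid level depends on the actual implementation, which this sketch does not claim to reproduce.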

align image
align image
emir best shift (y,x) is ag: (49, 23), ar: (102, 40)