Augmented Reality — Lab — Marker Tracking (OpenCV)
Requirements:
pip install opencv-python
pip install numpy
Introduction
The goal of this lab session is to introduce you to several fundamental concepts in computer vision through a concrete application: detecting and localizing a planar marker in a video stream.
You will work on:
- detecting feature points (interest points),
- describing these points (local descriptors),
- matching points detected in two images,
- computing a homography (projective transformation),
- augmenting a video stream based on marker localization.
By the end of the lab, you will have a program that, given:
- a reference image (the marker),
- a video stream (webcam or video file),
automatically detects the presence of the marker in the scene and masks the marker area by replacing it with a white polygon (to visually validate detection).
Application Principle
The overall idea of the system is the following:
Load the marker image (reference image).
Open a video stream (webcam or video file).
Detect a set of feature points in the marker image (FP_M).
Compute descriptors for these points (FD_M).
For each frame I_t of the video stream:
- detect a set of feature points (FP_t),
- compute descriptors (FD_t),
- match FD_M and FD_t to obtain correspondences,
- filter these correspondences to retain only “good matches”.
Estimate the geometric transformation relating the marker plane to the frame: a homography H_{M->t} robustly computed using RANSAC.
Use H_{M->t} to:
- project the 4 corners of the marker into image I_t,
- mask the detected region by filling it in white.
Important note: this lab performs localization via “re-detection + matching” at each frame. It is not a temporal tracking approach (e.g., KLT/optical flow), but rather a method that is robust and simple to implement.
Feature Detection
The objective of feature detection (or more generally “interest regions”) is to automatically select image elements that exhibit distinctive properties: corners, textured regions, high-contrast areas, etc.
A feature detector takes an image as input and outputs a set of pixel coordinates corresponding to points considered “interesting” by the algorithm.
Important: a detector provides positions (and sometimes scale/orientation) but not a descriptive signature.
Examples of detectors: Harris, FAST, Shi-Tomasi, ORB, AKAZE, SIFT, etc.
Resources (to read after the lab):
- https://fr.wikipedia.org/wiki/D%C3%A9tection_de_zones_d%27int%C3%A9r%C3%AAt
- https://en.wikipedia.org/wiki/Feature_detection_(computer_vision)
Feature Description
A feature descriptor aims to numerically characterize the local appearance around an interest point.
It takes as input:
- an image,
- a list of feature points,
and outputs a vector (or matrix) of descriptors, one per point.
These descriptors represent a local “fingerprint” that enables comparison of points detected in different images.
Desirable properties of a descriptor:
- invariance (or robustness) to rotation,
- invariance (or robustness) to scale changes,
- robustness to photometric variations (lighting),
- robustness to moderate geometric transformations.
Examples: ORB, AKAZE, BRISK, SIFT (detector + descriptor), etc.
Resources (to read after the lab):
- https://fr.wikipedia.org/wiki/Extraction_de_caract%C3%A9ristique_en_vision_par_ordinateur
- https://en.wikipedia.org/wiki/Visual_descriptor
- https://en.wikipedia.org/wiki/Scale-invariant_feature_transform
Matching
Matching addresses the following question:
Given a feature in the marker image, how can we find the most similar feature in the current video frame?
The general idea is:
- Define a distance or similarity measure between descriptors.
- For each descriptor of the marker, find the closest descriptor in the frame.
Possible distance measures: L1, L2, Hamming (for binary descriptors such as ORB). On top of the distance, a filtering rule such as Lowe's ratio test can be applied to discard ambiguous matches.
Matching methods:
- Brute force: compares all descriptors pairwise (simple, potentially costly).
- FLANN: approximate nearest neighbors (faster for large datasets).
After matching, a filtering rule must be defined to retain only relevant correspondences. A classical approach (used in the provided base code) consists of using a threshold based on the minimum observed distance:
threshold = alpha * minDist
where alpha is a coefficient (typically between 3 and 20 depending on the method and expected quality).
Resources (to read after the lab):
- https://docs.google.com/presentation/d/1_HFh3SdmdyZ_j-sFS4Tw17DmhKfjmYZqvSp7TmfdD_M/edit?slide=id.p#slide=id.p
- https://www.cs.toronto.edu/~urtasun/courses/CV/lecture04.pdf
Homography and Projection
Once correspondences between the marker (planar object) and the frame (image) have been established, we estimate the geometric transformation relating these two sets of points.
For a planar object, this transformation is a homography (3x3 matrix). It models perspective effects and relates two views of the same plane.
We estimate a matrix H from the good matches (after filtering). The estimation must be robust to outliers: we use RANSAC.
Resources (to read after the lab):
- https://en.wikipedia.org/wiki/Homography_(computer_vision)
- https://fr.wikipedia.org/wiki/Application_projective
Implementation in OpenCV
The objective of the lab is to implement the previous steps in OpenCV starting from a provided code skeleton.
Provided Base Code
You are given the following files:
- main.py
- MyFeatureDetector.py
- MyDescriptorExtractor.py
- MyDescriptorMatcher.py
These files contain TODO sections guiding your implementation.
Important: the current base code loads two static images. You must adapt it to handle:
- a reference image marker.jpg,
- a video stream (webcam or file).
You will remove the final warped image generation part and replace it with an augmentation step (masking the marker) based on projected corners.
Questions
Question 1: Reading and Understanding the Provided Code
Read the provided code and identify:
- where images are loaded,
- where the detector, extractor, and matcher are instantiated,
- where detected points and descriptors are stored,
- how the list of best matches is built,
- how the homography is computed.
Explain in a few lines the role of each file.
Question 2: Create a Feature Detector (ORB)
In MyFeatureDetector.py, complete the changeFeatureDetector function to create an ORB detector.
Objective: fill the myFeatureDetector attribute by calling the appropriate OpenCV constructor (e.g., cv.ORB_create(...)).
Question 3: Display Detected Points
In main.py, display detected feature points:
- on the image marker.jpg,
- on a video frame.
Hint: use the displayFeatures method and the existing TODO sections in the skeleton.
To do: add a screenshot showing results on the marker and on a frame.
Optional: add parameters to cv.ORB_create(...) (number of points, threshold, etc.) and comment on their impact.
Question 4: Instantiate a Descriptor Extractor (ORB)
In MyDescriptorExtractor.py, complete changeDescriptorExtractor to create an ORB descriptor extractor.
Question 5: Compute Descriptors
Complete computeDescriptors() in MyDescriptorExtractor.py and complete the corresponding calls in main.py to:
- compute descriptors for the marker,
- compute descriptors for a frame.
Verify that descriptors are properly computed (consistent dimensions, non-empty).
Question 6: Perform Matching
In MyDescriptorMatcher.py, complete the match method to retain only correspondences whose distance is below a threshold:
threshold = alpha * minDist
Selected matches must be stored in bestMatches.
Question 7: Display Matching Results
In main.py, call drawMatchingResults() and display the resulting image.
To do: illustrate this result with a screenshot in your report.
Question 8: Understand and Compute the Homography
From the best matches, build two lists of 2D points:
- points in the marker image,
- points in the video frame.
Then compute the homography using cv.findHomography(..., cv.RANSAC, epsilon).
Explain briefly what this computation does and why RANSAC is necessary.
Question 9: Augmentation — Mask the Marker
Once homography H is estimated, project the 4 marker corners into the frame.
Expected steps:
- Define the 4 marker corners in its image coordinate system.
- Project them using cv.perspectiveTransform.
- Draw the contour (recommended for debugging).
- Fill the projected polygon in white using cv.fillConvexPoly.
Condition: perform augmentation only if H is valid and if the number of inliers is sufficient (threshold to define and justify).
To do: provide a screenshot where the marker is effectively masked.
Question 10: Test on Other Sequences
Test your program with:
- another video (or another webcam scene),
- another marker (different reference image).
To do: illustrate at least one additional test and comment on robustness.
Question 11: Change Detector/Descriptor (Optional)
Try another detector/descriptor combination (e.g., AKAZE) and compare:
- number of detected points,
- matching stability,
- number of inliers,
- robustness to scale/lighting variations,
- computational cost.
To do: illustrate and comment.
Deliverables
You must submit:
The completed code.
A short report (PDF or Markdown) containing:
- the requested screenshots,
- your parameter choices (alpha, inlier threshold, etc.),
- a robustness analysis (tests, failure cases).