Augmented Reality — Lab — Marker Tracking (OpenCV)


Marker Tracking

Base Code

Requirements:

pip install opencv-python

pip install numpy

Introduction

The goal of this lab session is to introduce you to several fundamental concepts in computer vision through a concrete application: detecting and localizing a planar marker in a video stream.

You will work on:

  • detecting feature points (interest points),
  • describing these points (local descriptors),
  • matching points detected in two images,
  • computing a homography (projective transformation),
  • augmenting a video stream based on marker localization.

By the end of the lab, you will have a program that, given:

  • a reference image (the marker),
  • a video stream (webcam or video file),

automatically detects the presence of the marker in the scene and masks the marker area by replacing it with a white polygon (to visually validate detection).


Application Principle

The overall idea of the system is the following:

  1. Load the marker image (reference image).

  2. Open a video stream (webcam or video file).

  3. Detect a set of feature points in the marker image (FP_M).

  4. Compute descriptors for these points (FD_M).

  5. For each frame I_t of the video stream:

    • detect a set of feature points (FP_t),
    • compute descriptors (FD_t),
    • match FD_M and FD_t to obtain correspondences,
    • filter these correspondences to retain only “good matches”.
  6. Still within the per-frame loop, estimate the geometric transformation relating the marker plane to the frame: a homography H_{M->t}, computed robustly using RANSAC.

  7. Use H_{M->t} to:

    • project the 4 corners of the marker into image I_t,
    • mask the detected region by filling it in white.

Important note: this lab localizes the marker by re-detection + matching in every frame. It is not a temporal tracking approach (e.g., KLT/optical flow), but it is robust and simple to implement.


Feature Detection

The objective of feature detection (or more generally “interest regions”) is to automatically select image elements that exhibit distinctive properties: corners, textured regions, high-contrast areas, etc.

A feature detector takes an image as input and outputs a set of pixel coordinates corresponding to points considered “interesting” by the algorithm.

Important: a detector provides positions (and sometimes scale/orientation) but not a descriptive signature.

Examples of detectors: Harris, FAST, Shi-Tomasi, ORB, AKAZE, SIFT, etc.

Resources (to read after the lab):


Feature Description

A feature descriptor aims to numerically characterize the local appearance around an interest point.

It takes as input:

  • an image,
  • a list of feature points,

and outputs a vector (or matrix) of descriptors, one per point.

These descriptors represent a local “fingerprint” that enables comparison of points detected in different images.

Desirable properties of a descriptor:

  • invariance (or robustness) to rotation,
  • invariance (or robustness) to scale changes,
  • robustness to photometric variations (lighting),
  • robustness to moderate geometric transformations.

Examples: ORB, AKAZE, BRISK, SIFT (each of these provides both a detector and a descriptor), etc.

Resources (to read after the lab):


Matching

Matching addresses the following question:

Given a feature in the marker image, how can we find the most similar feature in the current video frame?

The general idea is:

  1. Define a distance or similarity measure between descriptors.
  2. For each descriptor of the marker, find the closest descriptor in the frame.

Possible measures: L1, L2, Hamming (for binary descriptors such as ORB's). Filtering rules such as Lowe's ratio test can then be applied on top of the raw distances.

Matching methods:

  • Brute force: compares all descriptors pairwise (simple, potentially costly).
  • FLANN: approximate nearest neighbors (faster for large datasets).

After matching, a filtering rule must be defined to retain only relevant correspondences. A classical approach (used in the provided base code) consists of using a threshold based on the minimum observed distance:

threshold = alpha * minDist

where alpha is a coefficient (typically between 3 and 20 depending on the method and expected quality).

Resources (to read after the lab):


Homography and Projection

Once correspondences between the marker (planar object) and the frame (image) have been established, we estimate the geometric transformation relating these two sets of points.

For a planar object, this transformation is a homography: a 3x3 matrix, defined up to scale, that models perspective effects and relates two views of the same plane. Four point correspondences suffice to determine it; in practice many more are used.

We estimate a matrix H from the good matches (after filtering). The estimation must be robust to outliers: we use RANSAC.

Resources (to read after the lab):


Implementation in OpenCV

The objective of the lab is to implement the previous steps in OpenCV starting from a provided code skeleton.

Provided Base Code

You are given the following files:

  • main.py
  • MyFeatureDetector.py
  • MyDescriptorExtractor.py
  • MyDescriptorMatcher.py

These files contain TODO sections guiding your implementation.

Important: the current base code loads two static images. You must adapt it to handle:

  • a reference image marker.jpg,
  • a video stream (webcam or file).

You will remove the final warped image generation part and replace it with an augmentation step (masking the marker) based on projected corners.


Questions

Question 1: Reading and Understanding the Provided Code

Read the provided code and identify:

  • where images are loaded,
  • where the detector, extractor, and matcher are instantiated,
  • where detected points and descriptors are stored,
  • how the list of best matches is built,
  • how the homography is computed.

Explain in a few lines the role of each file.


Question 2: Create a Feature Detector (ORB)

In MyFeatureDetector.py, complete the changeFeatureDetector function to create an ORB detector.

Objective: fill the myFeatureDetector attribute by calling the appropriate OpenCV constructor (e.g., cv.ORB_create(...)).


Question 3: Display Detected Points

In main.py, display detected feature points:

  • on the image marker.jpg,
  • on a video frame.

Hint: use the displayFeatures method and the existing TODO sections in the skeleton.

To do: add a screenshot showing results on the marker and on a frame.

Optional: add parameters to cv.ORB_create(...) (number of points, threshold, etc.) and comment on their impact.


Question 4: Instantiate a Descriptor Extractor (ORB)

In MyDescriptorExtractor.py, complete changeDescriptorExtractor to create an ORB descriptor extractor.


Question 5: Compute Descriptors

Complete computeDescriptors() in MyDescriptorExtractor.py and complete the corresponding calls in main.py to:

  • compute descriptors for the marker,
  • compute descriptors for a frame.

Verify that descriptors are properly computed (consistent dimensions, non-empty).


Question 6: Perform Matching

In MyDescriptorMatcher.py, complete the match method to retain only correspondences whose distance is below a threshold:

threshold = alpha * minDist

Selected matches must be stored in bestMatches.


Question 7: Display Matching Results

In main.py, call drawMatchingResults() and display the resulting image.

To do: illustrate this result with a screenshot in your report.


Question 8: Understand and Compute the Homography

From the best matches, build two lists of 2D points:

  • points in the marker image,
  • points in the video frame.

Then compute the homography using cv.findHomography(..., cv.RANSAC, epsilon), where epsilon is the maximum reprojection error (in pixels) for a correspondence to be counted as an inlier.

Explain briefly what this computation does and why RANSAC is necessary.


Question 9: Augmentation — Mask the Marker

Once homography H is estimated, project the 4 marker corners into the frame.

Expected steps:

  1. Define the 4 marker corners in its image coordinate system.
  2. Project them using cv.perspectiveTransform.
  3. Draw the contour (recommended for debugging).
  4. Fill the projected polygon in white using cv.fillConvexPoly.

Condition: perform augmentation only if H is valid and if the number of inliers is sufficient (threshold to define and justify).

To do: provide a screenshot where the marker is effectively masked.


Question 10: Test on Other Sequences

Test your program with:

  • another video (or another webcam scene),
  • another marker (different reference image).

To do: illustrate at least one additional test and comment on robustness.


Question 11: Change Detector/Descriptor (Optional)

Try another detector/descriptor combination (e.g., AKAZE) and compare:

  • number of detected points,
  • matching stability,
  • number of inliers,
  • robustness to scale/lighting variations,
  • computational cost.

To do: illustrate and comment.


Deliverables

You must submit:

  1. The completed code.

  2. A short report (PDF or Markdown) containing:

    • the requested screenshots,
    • your parameter choices (alpha, inlier threshold, etc.),
    • a robustness analysis (tests, failure cases).