Visual Odometry on KITTI Dataset

A visual odometry setup on the KITTI dataset, implemented in Python.

Project: Visual Odometry Report Summary

Visual Odometry Overview

Introduction

Visual Odometry (VO) is a technique used to estimate a robot’s position and orientation by analyzing the motion of visual features captured by a camera over time. VO plays a crucial role in autonomous navigation systems, enabling robots to understand their environment without relying on external references like GPS.

For this project, a complete visual odometry pipeline was created and run on the KITTI dataset, which is used for benchmarking tasks such as visual odometry, 3D object detection, and scene understanding. The code was developed in Python using primarily basic packages and OpenCV. While Python is typically not used for real-time applications due to its slower processing speed, it excels at visualization and offers a vast array of powerful libraries.


Methods

The visual odometry pipeline was performed on the grayscale images from the dataset. The entire pipeline consisted of the following steps:

Disparity Map

Images from the left and right cameras are compared to measure how far each pixel shifts between the two views. Two methods were used: Stereo BM (Block Matching) and Stereo SGBM (Semi-Global Block Matching). Both convert an image pair from the stereo setup into a disparity map.
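
A minimal sketch of how both maps can be computed with OpenCV (the parameter values here are illustrative, not the exact ones used in this project):

```python
import cv2

# Load a rectified grayscale stereo pair (paths are placeholders)
left = cv2.imread("image_0/000000.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("image_1/000000.png", cv2.IMREAD_GRAYSCALE)

# Stereo BM: fast, but produces a noisier disparity map
bm = cv2.StereoBM_create(numDisparities=96, blockSize=11)
disp_bm = bm.compute(left, right).astype(float) / 16.0  # OpenCV returns fixed-point disparities

# Stereo SGBM: slower, but smoother thanks to semi-global smoothness penalties
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=96,
    blockSize=11,
    P1=8 * 11 ** 2,    # penalty for small disparity changes between neighbours
    P2=32 * 11 ** 2,   # penalty for large disparity jumps
)
disp_sgbm = sgbm.compute(left, right).astype(float) / 16.0
```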

BM Disparity Map
SGBM Disparity Map
Depth Map

One of the important quantities that needs to be recovered is depth. This is done using the stereo camera setup and the disparity map generated above. For each pixel, the depth is calculated using the formula Z = fb/d.

  • Here Z = depth, f = focal length of the camera, b = baseline distance between the two cameras, and d = disparity between corresponding pixels. The formula follows from simple similar-triangle geometry of the stereo setup.

This formula is applied to both types of disparity map to generate depth maps. The center of the image can be seen to have the smallest disparity and therefore the largest depth values.
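
A sketch of the depth computation, assuming the focal length f (in pixels) and baseline b (in metres) come from the KITTI calibration files; the numbers below are approximate KITTI values, not necessarily the ones used in this project:

```python
import numpy as np

def disparity_to_depth(disparity, f, b):
    """Convert a disparity map to a depth map via Z = f * b / d."""
    d = disparity.copy()
    d[d <= 0.0] = 0.1  # guard against division by zero where matching failed
    return f * b / d

# Approximate KITTI grayscale-pair calibration: f ~ 718.856 px, b ~ 0.54 m
depth = disparity_to_depth(disp_sgbm, f=718.856, b=0.54)
```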

BM Depth Map
SGBM Depth Map
Feature Extraction

The information from the depth map and the disparity map is used to extract important features. These features should be invariant to changes between frames so that they can be matched with the features extracted from the next frame.

The two ways I implemented this were SIFT and ORB feature detection. More information about them can be found in the attached report.
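
Both detectors are available in OpenCV; a minimal sketch (detector settings are illustrative):

```python
import cv2

img = cv2.imread("image_0/000000.png", cv2.IMREAD_GRAYSCALE)

# SIFT: float descriptors, scale- and rotation-invariant, slower
sift = cv2.SIFT_create()
kp_sift, des_sift = sift.detectAndCompute(img, None)

# ORB: binary descriptors, much faster to compute and match
orb = cv2.ORB_create(nfeatures=3000)
kp_orb, des_orb = orb.detectAndCompute(img, None)
```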

SIFT Feature Detection
ORB Feature Detection
Feature Matching

From the detected features, we take two consecutive frames and match their features. These matches are then used to estimate the change in position.
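
A sketch of one common way to do this, using a brute-force matcher with Lowe's ratio test (des_prev and des_curr stand for descriptors from consecutive frames; the exact matcher settings used in the project may differ):

```python
import cv2

# L2 norm suits SIFT's float descriptors; use cv2.NORM_HAMMING for ORB
matcher = cv2.BFMatcher(cv2.NORM_L2)

# For each previous-frame descriptor, find its two best candidates in the current frame
knn_matches = matcher.knnMatch(des_prev, des_curr, k=2)

# Lowe's ratio test: keep a match only if it clearly beats the runner-up
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]
```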

SIFT Feature Matching
ORB Feature Matching
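
The report does not spell out the motion-estimation step itself, but a common way to turn matches plus depth into a pose update is a Perspective-n-Point (PnP) solve: back-project the previous frame's matched keypoints to 3D using its depth map, then solve for the rotation and translation that reproject them onto the current frame's keypoints. A hedged sketch (K is the 3x3 camera intrinsic matrix; names and thresholds are illustrative):

```python
import cv2
import numpy as np

def estimate_motion(good, kp_prev, kp_curr, depth_prev, K, max_depth=100.0):
    """Estimate frame-to-frame camera motion from matches and the previous depth map."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts3d, pts2d = [], []
    for m in good:
        u, v = kp_prev[m.queryIdx].pt
        z = depth_prev[int(v), int(u)]
        if z <= 0 or z > max_depth:  # skip pixels with unreliable depth
            continue
        # Back-project the previous keypoint into 3D camera coordinates
        pts3d.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
        pts2d.append(kp_curr[m.trainIdx].pt)
    # Robustly solve for the current camera pose relative to those 3D points
    _, rvec, tvec, _ = cv2.solvePnPRansac(
        np.array(pts3d, dtype=np.float32),
        np.array(pts2d, dtype=np.float32),
        K, None)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    return R, tvec
```

Chaining these per-frame transforms gives the camera trajectory that the results below are evaluated on.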

Results

In my results, the SIFT matching worked very well, while the ORB error kept accumulating. The results are as follows.

              SIFT (Dataset 01)   ORB (Dataset 01)
  MAE         911.82              2891.26
  Time        3 min 0.32 s        1 min 37.23 s
  Time/Frame  0.167 s             0.08831 s

The final plots are shown below:

SIFT
ORB

Challenges

Several challenges are associated with Visual Odometry:

  • Scale Ambiguity: Without additional sensors, VO struggles with absolute scale, making it difficult to distinguish between a small object close to the camera and a large object far away.
  • Illumination Changes: Variations in lighting can impact feature detection and tracking.
  • Motion Blur: Fast movement can cause blur, leading to inaccurate feature detection.

Possible Improvements

  • Integrate LiDAR data with a filtering approach to better estimate the pose
  • Use mapping techniques to improve the trajectory estimate

Applications

Visual Odometry is applied in various domains, including:

  • Autonomous Vehicles: Used for navigation in environments where GPS signals are unreliable.
  • Robotics: Enables mobile robots to navigate complex environments.
  • Augmented Reality: Helps align virtual objects with the real world by tracking the user’s movement.

Conclusion

While Visual Odometry is a powerful tool for estimating motion, it is often complemented by other sensors, such as IMUs or LiDAR, to address its limitations. Integrating Visual Odometry with other SLAM (Simultaneous Localization and Mapping) techniques enhances robustness, especially in dynamic and unstructured environments.

Acknowledgments

GitHub - Computer Vision Project Report - Visual Odometry Report