Presentation and Lab Briefing
Autonomous Race Car: Visual Servoing
Diego Contreras, Kevin Huang, Nathaniel Morgan, Weiming Zhou, Soe Wai Yan
Overview
Visual servoing enables real-time cone parking and line following on an autonomous racecar. The goal of this lab was to park in front of a cone and follow a line using camera-based perception.
The system is composed of four modules:
- Cone detection via color segmentation
- Object detection algorithms (SIFT, Template Matching, YOLO)
- Pixel-to-ground coordinate transformation via homography
- A parking controller for steering and stopping at a target distance
These modules are then combined in a line-following application.
Module 1: Cone Detection via Color Segmentation
Color segmentation can effectively detect orange cones. The pipeline processes the camera image through the following steps:
- Original image is captured from the ZED camera
- Gaussian Blur is applied to reduce noise
- Mask is generated by filtering for orange color in HSV space
- Bounding Box is drawn around the largest detected contour
Our color segmentation achieves a median IOU of 0.79 (IQR = 0.18) on the cone dataset. Ground-truth bounding boxes (green) were compared against predicted bounding boxes (red). Test 1 achieved an IOU score of 0.97, while Test 7 (with a smaller, more distant cone) achieved an IOU score of 0.63.
Module 2: Object Detection Algorithms
Part 1: SIFT & Template Matching
Two classical object detection algorithms were evaluated: SIFT (Scale-Invariant Feature Transform) and Template Matching.
SIFT Detection Results
SIFT was tested on two datasets:
- Citgo dataset — Works well. SIFT successfully matched features across different views of the Citgo sign with high IOU scores (up to 0.91).
- Stata Map dataset — Works poorly. SIFT failed to find enough matches on the map images, returning 0.0 IOU across all test images.
Template Matching Results
Template matching was tested on the Stata Map dataset, where it performed well with IOU scores ranging from 0.48 to 0.91.
| Method | Best Use Case |
|---|---|
| SIFT | Landmark Localization |
| Template Matching | Map Localization |
Part 2: YOLO Object Detection
YOLO detects objects on the live ZED camera feed with tunable confidence and IOU thresholds. We experimented with different threshold values:
- Confidence threshold = 0.2: More detections but with lower precision
- Confidence threshold = 0.9: Fewer detections but higher precision
- IOU threshold = 0.9: More overlapping boxes retained
- IOU threshold = 0.2: Aggressive non-maximum suppression removes overlapping boxes
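The IOU-threshold behavior above comes from non-maximum suppression (NMS). A minimal pure-NumPy sketch of greedy NMS, not the Ultralytics implementation (`iou` and `nms` are illustrative helpers; boxes are `(x1, y1, x2, y2)`):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, conf_thresh, iou_thresh):
    """Drop boxes below conf_thresh, then greedily suppress any box whose
    IOU with an already-kept, higher-scoring box exceeds iou_thresh."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= conf_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

A low `iou_thresh` suppresses aggressively (fewer overlapping boxes survive); a high one keeps near-duplicates, matching the behavior observed above.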
Module 3: Pixel-to-Plane via Homography
Homography transforms pixel coordinates to robot frame coordinates with a mean error of 1.5 cm (standard deviation: 1.7 cm). The error is mainly in the forward direction.
Calibration
Manual calibration was performed using rqt_image_view to collect pixel-to-ground correspondences. The cone tip was identified in the camera image and its corresponding real-world position was measured.
Homography Computation
Four calibration point pairs were used to compute the homography matrix via cv2.findHomography():
| Point | Pixel (u, v) | Ground (x cm, y cm) |
|---|---|---|
| 1 | (211, 162) | (30.48, 7.62) |
| 2 | (415, 154) | (46.99, -12.70) |
| 3 | (351, 145) | (109.22, -13.97) |
| 4 | (402, 167) | (22.86, -6.35) |
The homography matrix $H$ maps pixel coordinates $(u, v)$ to ground-plane coordinates $(x, y)$ via the relation:
$$s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
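With $h_{33}$ fixed to 1, the four point pairs from the table give an exactly determined $8 \times 8$ linear system for the remaining entries of $H$. The lab used `cv2.findHomography()`; the NumPy sketch below (hypothetical helpers `fit_homography` and `pixel_to_ground`) is equivalent for exactly four non-degenerate correspondences:

```python
import numpy as np

def fit_homography(pixels, ground):
    """Solve for H (with h33 = 1) from four pixel->ground point pairs.

    Each pair contributes two rows derived from
    x = (h11 u + h12 v + h13) / (h31 u + h32 v + 1), and similarly for y.
    """
    A, b = [], []
    for (u, v), (x, y) in zip(pixels, ground):
        A.append([u, v, 1, 0, 0, 0, -u * x, -v * x]); b.append(x)
        A.append([0, 0, 0, u, v, 1, -u * y, -v * y]); b.append(y)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def pixel_to_ground(H, u, v):
    """Apply H to a pixel and divide by the scale s to get (x, y)."""
    x, y, s = H @ np.array([u, v, 1.0])
    return x / s, y / s

# Calibration pairs from the table above (pixels -> ground plane, cm)
pixels = [(211, 162), (415, 154), (351, 145), (402, 167)]
ground = [(30.48, 7.62), (46.99, -12.70), (109.22, -13.97), (22.86, -6.35)]
H = fit_homography(pixels, ground)
```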
Module 4: Parking Controller
Our parking controller converges to the target distance across multiple trials.
Controller Design
- If the cone is far: drive forward toward it
- If the cone is too close: reverse away from it
- If the cone is off-center: steer to align
- Desired parking distance: 0.75 meters
- Parameters: $K_p = 1.0$, $K_d = 0.1$
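A minimal sketch of the control law implied by the rules above, using the stated gains and target distance. The `parking_control` signature, the frame convention ($x$ forward, $y$ left), and the mirrored steering while reversing are assumptions for illustration, not the lab's exact implementation:

```python
import math

DESIRED_DIST = 0.75  # target parking distance (m)
KP, KD = 1.0, 0.1    # gains from the parameters above

def parking_control(x, y, prev_error, dt):
    """PD sketch: (x, y) is the cone in the robot frame (x forward, y left).

    Returns (speed, steering_angle, distance_error); positive speed drives
    forward, negative speed reverses.
    """
    dist = math.hypot(x, y)
    error = dist - DESIRED_DIST          # >0: too far, <0: too close
    d_error = (error - prev_error) / dt  # finite-difference derivative term
    speed = KP * error + KD * d_error
    angle = math.atan2(y, x)             # steer toward the cone
    if speed < 0:
        angle = -angle                   # mirror steering while reversing
    return speed, angle, error
```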
Performance
Over $N = 9$ trials, the controller achieved:
- Final $x$ distance to the cone: 0.75 m (the target distance)
- Final distance error: 0 m
Simulation Tests
The parking controller was evaluated in simulation across three scenarios:
Sim Test 1: Cone in Front
When the cone is placed directly in front of the robot, the controller drives forward and converges to the desired parking distance. The distance error and y-error converge to zero, while the x-error settles to the target distance of 0.75 m.
Sim Test 2: Cone to the Side
When the cone is placed to the side, the controller first steers to align with the cone and then drives to the target distance. The y-error gradually decreases as the robot aligns itself, and the distance error converges to the parking distance.
Sim Test 3: Cone Behind
When the cone is placed behind the robot, the controller reverses and maneuvers to face the cone, then drives forward to park at the target distance. This scenario shows the largest initial transient as the robot must execute a more complex trajectory.
Robot Issues
During real-robot testing, we encountered hardware issues including connectivity problems with the racecar's onboard computer (SSH via WiFi) and a damaged USB cable for the ZED camera. These issues prevented deployment of the parking controller and line following on the physical robot.
Conclusion
Visual servoing enables reliable cone parking in simulation. The key results are:
- Color segmentation achieved median 0.79 IOU on the cone dataset
- Homography achieved 1.5 cm mean error for pixel-to-ground transformation
- Parking controller converged to 0 m distance error in simulation
Lab Goals Status
| Goal | Status |
|---|---|
| Object detection algorithms implemented and evaluated | Complete |
| Homography computed and validated with error metric | Complete |
| Parking controller deployed on real robot | Not Complete |
| Line following demonstrated on track | Not Complete |
Future Work: Deploy the parking controller and line following system on the physical robot once hardware issues are resolved.
Citations
- OpenCV Documentation. cv2.findHomography. https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html
- Ultralytics. YOLOv8 Documentation. https://docs.ultralytics.com
- MIT RSS Lab 4: Visual Servoing. https://github.com/mit-rss/visual_servoing
- A. Shwaiheen, "Line Follower Robot - Very Fast Using Port Manipulation," Hackster.io, Jan. 3, 2020.