Autonomous Strawberry Picking Using a ROS2 Mobile Manipulator and YOLOv11-Based Depth-Aware Grasping
🍓 🤖 Background
Small robots (or mobile manipulators) are extremely helpful in compact, space-constrained settings such as greenhouses. Strawberries grown in greenhouses are often harvested manually, and that is exactly where these small robots could make a real difference. Imagine several of them working in coordination (like a swarm), moving autonomously through narrow greenhouse rows, identifying only the berries that are ripe, and picking them one by one without disturbing the plant or the fruit around it. No fatigue, no missed berries, no damage!
This work imitated, in a small way, a situation in which ripe strawberries could be detected and picked autonomously and the whole process could be automated. Using YOLOv11 for visual detection (strawberry_pick_ik.launch.py), a depth camera for 3D depth estimation, and inverse kinematics to physically actuate the robotic arm (strawberry_pick_ik.py), the HiWonder LanderPi demonstrated that even a compact, low-cost mobile manipulator running ROS2 Humble inside Docker on a Raspberry Pi 5 can perform a task as nuanced and delicate as selective fruit picking.
The broader implication is not just about strawberries. It is about demonstrating that precision agriculture, the kind that requires selective, gentle, and intelligent interaction with individual plants, no longer has to depend entirely on human hands.
🍓 Overview
This write-up is for anyone working with a HiWonder LanderPi robot who wants to implement autonomous strawberry detection and picking using computer vision and robotic arm control. However, the major focus of this tutorial is optimized arm control and the end-effector gripper rather than vision-based detection. If you already know ROS and computer vision, you can clone this repo LanderPi onto your Raspberry Pi, as I have already added all the configuration files along with the trained strawberry model architecture and the relevant .py files, strawberry_pick_ik.py and strawberry_pick_ik.launch.py, to jump-start detection and picking.
All you will have to do is move the relevant files into the respective Docker container folders and run a build so the strawberry detection and picking package is registered in the ROS2 setup.py file. This process won't take more than 15-20 minutes provided you are comfortable working with a Linux-based system. If you want to learn the step-by-step process, start following the steps below.
The entire pipeline covers:
- Training a custom YOLOv11 model on ripe/unripe strawberry classes (Very brief!)
- Converting the model to OpenVINO IR format for faster inference on Raspberry Pi (Very brief!)
- Deploying it on LanderPi running ROS2 Humble inside Docker (Detailed!)
- Using a depth camera + inverse kinematics to physically pick ripe strawberries (Detailed!)
Prerequisites:
- Basic knowledge of the Robot Operating System (ROS) and computer vision
- Experience working with a Linux-based OS
- Must have followed the primary instructions as stated within Robotic Arm Control/LanderPi
- A virtual machine (VM) with Ubuntu 22.04 (Jammy) to visualize simulations using RViz (optional)
- A working knowledge of training vision models (YOLO in this case) using the necessary hyperparameters

Setup assumptions:
- HiWonder LanderPi with Mecanum chassis [2]
- Aurora 930 depth camera [2]
- ROS2 Humble running inside Docker [1]
- The workspace lives at /home/ubuntu/ros2_ws within the container [1]
- All commands must run inside the Docker container unless stated otherwise
📋 Table of Contents
- Hardware and Software Requirements
- Training the Strawberry Detection Model
- Converting the Model for Deployment
- Setting Up the ROS2 Package with Strawberry Pick Launch Files
- How the Picking Pipeline Works
- The Main Node
- The Launch File
- Running the Pipeline
- Tuning the Arm for Accurate Picking
- Known Issues and Limitations
1. Hardware and Software Requirements
| Component | Details |
|---|---|
| 🤖 Robot | HiWonder LanderPi with Mecanum chassis |
| 📷 Camera | Aurora 930 depth camera (RGB + depth) |
| 💻 Compute | Raspberry Pi 5 inside Docker container |
| 🐧 OS | Ubuntu 22.04, ROS2 Humble |
| 🧠 ML Framework | Ultralytics YOLOv11 + OpenVINO Runtime |
| 🐍 Language | Python 3.10 |
2. Training the Strawberry Detection Model
I trained a custom YOLOv11n (nano, suited to the Raspberry Pi) model with two classes, ripe and unripe, using the dataset from Roboflow. However, within the repo that you'll clone, I have already provided the strawberry.pt file to get you started. I trained my own model with custom hyperparameters. In the interest of space within this blog, I am not going to explain in detail how to train a YOLO-based vision model for detection. If you encounter any issues, please reach out to me or open an issue on the repo.
Your strawberry_data.yaml should look like the one below. When training your own YOLOv11n model, make sure this *.yaml file lives inside the main dataset folder alongside the test, train, and valid folders.
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 2
names: ['ripe', 'unripe']
roboflow:
  workspace: eric-z0ptd
  project: strawberry_picking_2
  version: 1
  license: CC BY 4.0
  url: https://universe.roboflow.com/eric-z0ptd/strawberry_picking_2/dataset/1
from ultralytics import YOLO
model = YOLO('yolo11n.pt') # Start from nano pretrained weights
model.train(
data='strawberry_data.yaml',
epochs=100,
imgsz=640,
batch=16,
name='strawberry_yolo11'
)
3. Converting the Model for Deployment on LanderPi
Once your model is trained, a strawberry.pt file (or whatever name you trained your model with) is saved to the path you specified. What happens next depends on where you trained your model. There are two options: (a) if you trained the model on the Pi, the .pt file is already on the Pi at that path, and (b) if you trained it on a desktop with more GPU VRAM (recommended, since the Pi will be slow), you will have to move the relevant files from the desktop to the Pi. In this case, I have already provided these files in my forked repo. But if you trained your own model, just run the command below to generate the .xml and .bin files from your strawberry.pt file. The LanderPi's yolov11_detect node uses the OpenVINO IR format (.xml/.bin) for faster inference on the Raspberry Pi CPU (see strawberry_pick_ik.launch.py). The full conversion pipeline is:
strawberry.pt ──► strawberry.xml + strawberry.bin
Step 1: Export .pt to .xml and .bin Files
pip install openvino-dev --break-system-packages  # install this package if not already present
# On the Pi (or your VM)
# Export your .pt to OpenVINO format using the Ultralytics CLI
yolo export model=/path/to/strawberry.pt format=openvino imgsz=640
This creates a folder strawberry_openvino_model/ with two files inside it:
- strawberry.xml — model architecture
- strawberry.bin — model weights
Step 2: Place the Model Files Within the Docker Container on the Pi
# Copy the .xml and .bin from your desktop to the models directory on the Pi.
# Note: this is scp (remote copy), not cp — MentorPi here is the Pi's hostname.
# Skip this step if you trained on the Pi itself.
scp path/to/your/strawberry.xml MentorPi:/home/ubuntu/ros2_ws/src/yolov11_detect/models/
scp path/to/your/strawberry.bin MentorPi:/home/ubuntu/ros2_ws/src/yolov11_detect/models/
# Check that both files have been copied successfully.
cd ros2_ws/src/yolov11_detect/models/
ls
4. Setting Up the ROS2 Package with Strawberry Pick Launch Files
Before getting into this section, let's understand why ROS2 needs a setup.py entry point. It is worth understanding, especially if you plan on doing custom work with the LanderPi beyond just this tutorial. In ROS2, when you create a Python-based package, the setup.py file is the bridge between your Python source code (in this case, the strawberry .py files) and the ROS2 build system (colcon). Without this entry, ROS2 has no idea your node exists, even if the .py file is physically sitting in the right folder. This is why running ros2 launch example strawberry_pick_ik.launch.py without the entry point will throw:
PackageNotFoundError: No package metadata was found for example
Step 1: Add the Entry Point to setup.py
The setup.py file lives at ~/ros2_ws/src/example/setup.py. Open it either using vim or gedit. For me gedit did not work for the first time and threw errors. So, I would recommend using vim.
cd ~/ros2_ws/src/example
vim setup.py
Press i to enter insert mode, make your edit, then press ESC followed by :wq to save and exit. If you make a mistake, press ESC then :q! to exit without saving. Once inside setup.py in vim, find the console_scripts section and add your entry point:
'console_scripts': [
# ... existing entries ...
'strawberry_pick_ik = example.rgbd_function.strawberry_pick_ik:main',
],
The other console_scripts entries are the stock LanderPi nodes, including color detection, hand tracking, navigation transport, etc. Do not remove or modify those lines. Just add your new line at the end of the list, making sure the previous line ends with a comma.
Step 2: Create __init__.py if It Does Not Exist
This step is important and easy to miss. Check whether __init__.py already exists. If, for whatever reason, the __init__.py file does not exist within the /rgbd_function folder, create one using the command below.
touch ~/ros2_ws/src/example/example/rgbd_function/__init__.py
So, why is this step important? When colcon builds the package, Python needs to treat example.rgbd_function as a proper importable package, not just a filesystem folder. The __init__.py file, even though it is completely empty, is the signal Python uses to make this distinction. Without it, when ROS2 tries to resolve your entry point example.rgbd_function.strawberry_pick_ik:main, Python will look for a package called rgbd_function inside example, finding only a folder with no __init__.py, and throw:
ModuleNotFoundError: No module named 'example.rgbd_function'
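To make this concrete, here is a small standalone sketch (plain Python, not ROS2 or setuptools internals) of how a console_scripts entry string is resolved into a module to import and a function to call. The helper name is mine, purely for illustration:

```python
# Illustrative sketch: splitting a console_scripts entry string such as
# 'strawberry_pick_ik = example.rgbd_function.strawberry_pick_ik:main'
# into the module path to import and the callable to invoke.
def parse_entry_point(entry: str) -> tuple[str, str]:
    """Return (module_path, function_name) for an entry-point string."""
    if "=" in entry:                      # drop the script name if present
        entry = entry.split("=", 1)[1].strip()
    module, func = entry.split(":", 1)    # module path before ':', callable after
    return module, func

module, func = parse_entry_point(
    "strawberry_pick_ik = example.rgbd_function.strawberry_pick_ik:main")
print(module)  # example.rgbd_function.strawberry_pick_ik
print(func)    # main
```

Every dotted component of that module path (example, then rgbd_function) must be an importable package, which is exactly why the missing __init__.py breaks the resolution.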
Step 3: Build the ROS2 Package
Now rebuild the example package so colcon picks up your new entry point and registers it in the install directory. Use the code below.
cd ~/ros2_ws
colcon build --event-handlers console_direct+ --cmake-args -DCMAKE_BUILD_TYPE=Release --symlink-install --packages-select example
Here is what each flag means:
| Flag | Meaning |
|---|---|
| colcon build | The ROS2 build command that compiles all packages in the workspace [4] |
| --event-handlers console_direct+ | Prints the full build output directly to the terminal in real time — useful for seeing errors immediately instead of buffering them [4] |
| --cmake-args -DCMAKE_BUILD_TYPE=Release | Tells CMake to build in Release mode, enabling compiler optimizations for faster and more efficient runtime performance compared to Debug mode [4] |
| --symlink-install | Instead of copying files into the install directory, creates symbolic links back to your source files [4]. Any edits to existing .py files like strawberry_pick_ik.py are instantly reflected without rebuilding [6] |
| --packages-select example | Builds only the example package instead of rebuilding the entire workspace, much faster during development |
The --symlink-install flag [4] is particularly useful during development. Since it creates symlinks from the install directory directly back to your source files, any changes you make to an existing .py file like strawberry_pick_ik.py [6] are instantly reflected without needing to rebuild. You only need to rebuild when adding new files or new entry points to setup.py. A successful build will show:
Finished <<< example [Xs]
Summary: 1 package finished [Xs]
The stderr warnings about setuptools deprecation are harmless and can be ignored.
Step 4: Verify the Build
After building, confirm your node was installed correctly:
ls ~/ros2_ws/install/example/lib/example/strawberry_pick_ik
And verify the main() function is present in your source file:
grep -n "def main" ~/ros2_ws/src/example/example/rgbd_function/strawberry_pick_ik.py
Both should return valid output. If grep returns nothing, your source file is missing the main() function and the node will fail to launch. Remove the old log files and try to rebuild the package.
Step 5: Open a Fresh Terminal and Launch
Open a fresh terminal inside the Docker container. As mentioned before, everything is sourced already; you can source again using source ~/.zshrc if needed. IT IS TIME TO LAUNCH AND SEE THE ROBOT IN ACTION!
ros2 launch example strawberry_pick_ik.launch.py
5. How the Picking Pipeline Works
The full autonomous pipeline runs as follows:
| Stage | What Happens | ROS2 Topic / Service |
|---|---|---|
| 🧠 YOLO Detection | YOLOv11 detects ripe and unripe strawberries in every frame | /yolo_node/object_detect |
| 📷 Image Sync | RGB, depth, and camera_info are time-synchronized | /ascamera/camera_publisher/rgb0/image |
| 🎯 PID Tracking | Camera servos adjust to center the ripe berry in frame using PID control | /servo_controller |
| ⏱️ Stability Check | Waits 5 seconds of stable PID tracking before triggering the grab | Internal timer |
| 📐 Depth + IK | Depth ROI averaged → pixel + depth → 3D camera coords → world coords via hand-eye matrix | /kinematics/set_pose_target |
| 🦾 Grab | Arm moves to computed 3D position and gripper closes around the berry | /servo_controller |
| ⬆️ Lift | Arm lifts berry slightly upward before moving to safe retract position | /servo_controller |
| 📦 Place | Arm moves to placement position on the left side and gripper opens to release | /servo_controller |
| 🏠 Return Home | Arm returns to initial scanning position and tracker resets for next berry | /servo_controller |
Key Tuning Parameters
| Parameter | Location | Default | What it controls |
|---|---|---|---|
| conf | launch file [5] | 0.45 | YOLO confidence threshold |
| position[2] -= X | strawberry_pick_ik.py [6] | 0.02 | How low the arm reaches to grab |
| Stability timer | strawberry_pick_ik.py [6] | 5 sec | Wait time before triggering grab |
| Depth ROI | strawberry_pick_ik.py [6] | 24×24 px | Area used for depth averaging |
| PID threshold | strawberry_pick_ik.py [6] | 0.01 | Sensitivity of camera centering adjustments |
6. The Main Node: strawberry_pick_ik.py
The node uses a custom YoloTracker class that replaces traditional HSV color detection with YOLO bounding box centers (see strawberry_pick_ik.py). It subscribes to /yolo_node/object_detect and uses PID controllers to center the camera on the best detected ripe strawberry. Within strawberry_pick_ik.py, look for the lines below to understand the code structure.
YoloTracker Class
class YoloTracker:
    """Uses YOLO bounding box center for PID tracking."""

    def __init__(self, target_class='ripe'):
        self.target_class = target_class          # Store the object class to track (default: 'ripe')
        self.pid_yaw = pid.PID(20.5, 1.0, 1.2)    # PID controller for left/right adjustment
        self.pid_pitch = pid.PID(20.5, 1.0, 1.2)  # PID controller for up/down adjustment
        self.yaw = 500         # Initial horizontal servo/motor position
        self.pitch = 150       # Initial vertical servo/motor position
        self.center = None     # Will hold (x, y) of detected target center
        self.detected = False  # Tracks whether target is currently visible

    def update_detection(self, objects):
        best = None  # Will hold the highest-confidence matching detection
        for obj in objects:  # Loop through all detected objects
            if obj.class_name == self.target_class:  # Only process objects matching target class
                cx = (obj.box[0] + obj.box[2]) / 2.0  # Compute bounding box center X
                cy = (obj.box[1] + obj.box[3]) / 2.0  # Compute bounding box center Y
                if best is None or obj.score > best[3]:  # Keep only the highest-confidence detection
                    best = (cx, cy,
                            max(abs(obj.box[2] - obj.box[0]),
                                abs(obj.box[3] - obj.box[1])) / 2.0,  # Approximate radius of bounding box
                            obj.score)  # Store (cx, cy, radius, confidence)
        if best is not None:
            self.center = (best[0], best[1])  # Save center coords of best detection
            self.detected = True              # Mark target as found
        else:
            self.center = None     # Clear center if nothing detected
            self.detected = False  # Mark target as lost
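The highest-confidence selection logic above can be exercised on its own. Below is a self-contained sketch using a stand-in Detection class (the real node receives ROS2 detection messages; the field names class_name, box, and score mirror the snippet above, but this dataclass is only illustrative):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    class_name: str   # e.g. 'ripe' or 'unripe'
    box: tuple        # (x1, y1, x2, y2) in pixels
    score: float      # detection confidence

def best_ripe_center(objects, target_class="ripe"):
    """Return (cx, cy) of the highest-confidence target detection, or None."""
    best = None
    for obj in objects:
        if obj.class_name != target_class:
            continue  # skip unripe berries entirely
        if best is None or obj.score > best.score:
            best = obj
    if best is None:
        return None
    return ((best.box[0] + best.box[2]) / 2.0,   # bounding box center X
            (best.box[1] + best.box[3]) / 2.0)   # bounding box center Y

detections = [
    Detection("unripe", (0, 0, 40, 40), 0.90),          # ignored despite high score
    Detection("ripe",   (100, 60, 140, 100), 0.55),
    Detection("ripe",   (200, 120, 260, 180), 0.80),    # winner: highest ripe score
]
print(best_ripe_center(detections))  # (230.0, 150.0)
```

Note that a high-confidence unripe berry never wins: class filtering happens before the confidence comparison, exactly as in the node.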
Stability Check and Depth-to-3D Conversion
# Stability check: both yaw and pitch must differ by less than 3 units from last frame
# Increase threshold (3) if arm grabs too rarely; decrease for stricter lock-on
if abs(self.last_pitch_yaw[0] - p_y[0]) < 3 \
        and abs(self.last_pitch_yaw[1] - p_y[1]) < 3:
    # Arm must be settled for 5 seconds before grabbing; do not reduce below 3s
    # (arm needs time to physically reach and stabilize at tracked position)
    if time.time() - self.stamp > 5 and not self.moving:
        # 24x24 pixel ROI centered on detection for depth sampling
        # Larger ROI = more stable average depth; reduce if targeting small/occluded berries
        roi = [
            max(0, int(center_y) - 12), min(h, int(center_y) + 12),  # Y bounds clamped to frame
            max(0, int(center_x) - 12), min(w, int(center_x) + 12),  # X bounds clamped to frame
        ]
        roi_distance = depth_image[roi[0]:roi[1], roi[2]:roi[3]]  # Crop depth image to ROI
        valid_depths = roi_distance[
            np.logical_and(roi_distance > 0, roi_distance < 10000)]  # Filter out zeros and sensor noise
        dist = round(float(np.mean(valid_depths) / 1000.0), 3)  # Average depth in meters
        dist += 0.015  # Compensate for strawberry physical radius
        dist += 0.015  # Compensate for systematic depth sensor error

        # Unproject 2D pixel + depth → 3D point in camera frame using intrinsics (fx, fy, cx, cy)
        K = depth_camera_info.k
        position = depth_pixel_to_camera(
            (center_x, center_y), dist, (K[0], K[4], K[2], K[5]))

        # Manual offset corrections for RGB-to-depth camera misalignment (meters)
        position[0] -= 0.01  # Correct left/right offset between RGB and depth lens
        position[1] -= 0.02  # Correct up/down offset between RGB and depth lens
        position[2] -= 0.02  # Primary grab depth tuning: negative = closer, positive = further

        # Apply hand-eye calibration: camera frame → end-effector frame
        pose_end = np.matmul(
            self.hand2cam_tf_matrix,  # Fixed camera-to-hand transform
            common.xyz_euler_to_mat(position, (0, 0, 0)))  # 3D point as 4x4 homogeneous matrix

        # Apply forward kinematics: end-effector frame → world frame
        world_pose = np.matmul(self.endpoint, pose_end)
        pose_t, _ = common.mat_to_xyz_euler(world_pose)  # Extract (x, y, z) world coordinates

        self.moving = True  # Lock out new grabs until pick() completes
        threading.Thread(target=self.pick, args=(pose_t,)).start()  # Run pick sequence in background thread
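The depth_pixel_to_camera call above performs standard pinhole unprojection. Here is a minimal sketch of that math, assuming the conventional (fx, fy, cx, cy) intrinsics layout from the camera_info K matrix; this is my reimplementation for illustration, not necessarily the SDK's exact code:

```python
def depth_pixel_to_camera_sketch(pixel, depth_m, intrinsics):
    """Unproject a 2D pixel + metric depth into camera-frame XYZ.

    pixel      -- (u, v) pixel coordinates of the detection center
    depth_m    -- averaged ROI depth at that pixel, in meters
    intrinsics -- (fx, fy, cx, cy) taken from camera_info K
    """
    u, v = pixel
    fx, fy, cx, cy = intrinsics
    x = (u - cx) * depth_m / fx   # meters right of the optical axis
    y = (v - cy) * depth_m / fy   # meters below the optical axis
    z = depth_m                   # meters along the optical axis
    return [x, y, z]

# A pixel exactly at the principal point maps onto the optical axis:
print(depth_pixel_to_camera_sketch((320, 240), 0.30, (600.0, 600.0, 320.0, 240.0)))
# [0.0, 0.0, 0.3]
```

The resulting camera-frame point is what then gets corrected by the manual offsets and pushed through the hand-eye and forward-kinematics matrices to reach world coordinates.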
7. The Launch File: strawberry_pick_ik.launch.py
The launch file starts all required nodes together in one command: strawberry_pick_ik.launch.py
# YOLO detection node — loads strawberry.xml OpenVINO model
yolov11_node = Node(
    package='yolov11_detect',
    executable='yolov11_node',
    output='screen',
    parameters=[
        {'classes': ['ripe', 'unripe']},
        {'model': 'strawberry', 'conf': conf},
        {'start': True},
    ]
)

# Strawberry IK picking node
strawberry_pick_ik_node = Node(
    package='example',
    executable='strawberry_pick_ik',
    output='screen',
    parameters=[{'start': start}],
)
All nodes launched together:
- 📷 Depth camera (RGB + depth + camera_info)
- 🎮 Servo controller manager
- 🦾 Kinematics IK solver
- 🧠 YOLOv11 detection node
- 🍓 Strawberry pick IK node
8. Running the Pipeline
Open a terminal inside the Docker container. Before running any ros2 nodes, run ~/.stop_ros.sh to stop any auto-start services, as they can interfere with the other nodes and may delay decision-making. You can also re-source the environment using source ~/.bashrc. However, to avoid any issues, I recommend opening a new Docker terminal so everything is sourced automatically and you can run ros2 nodes successfully!
ros2 launch example strawberry_pick_ik.launch.py
# Watch all strawberry-related topics. Run these to ensure your Pi is streaming topics from the sensors and robot telemetry. If you do not see any output, chances are that either the packages have not been built properly or the detection nodes are not running.
ros2 topic list | grep yolo
# Stream detection center coordinates
ros2 topic echo /yolo_node/object_detect
9. Tuning the Arm for Accurate Picking
This section requires patience; small adjustments make a big difference. It is a numbers game, so you will have to find the optimal values to actuate the arm correctly. I have split the whole section into six parts, and tuning these will lead to better results. When I first launched the ros2 node, I had to make a lot of adjustments, so be ready to do the same. If the arm grabs above the strawberry, increase the downward offset within strawberry_pick_ik.py. Check below for more detail.
Snippet 1: PID tuning
# Kp=20.5 (responsiveness), Ki=1.0 (drift correction), Kd=1.2 (overshoot damping)
# Increase Kp for faster tracking, decrease to reduce oscillation
self.pid_yaw = pid.PID(20.5, 1.0, 1.2)
self.pid_pitch = pid.PID(20.5, 1.0, 1.2)
self.yaw = 500 # Horizontal servo home position (0–1000)
self.pitch = 150 # Vertical servo home position (100–720)
Snippet 2: Tracking the strawberry and servo limits (Be aware and read the manual for arm servo limits)
# 0.02 = 2% of frame width/height; increase to reduce jitter, decrease for tighter centering
if abs(center_x_norm - 0.5) > 0.02:
...
self.yaw = min(max(self.yaw + self.pid_yaw.output, 0), 1000) # Yaw servo range: 0–1000
if abs(center_y_norm - 0.5) > 0.02:
...
self.pitch = min(max(self.pitch + self.pid_pitch.output, 100), 720) # Pitch servo range: 100–720
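To reason about the dead-band and servo clamping above in isolation, here is a self-contained sketch. It uses a simple proportional correction as a stand-in for the full PID controller (the function name and gain usage are mine, purely illustrative):

```python
def track_axis(servo_pos, center_norm, kp=20.5, dead_band=0.02,
               lo=0, hi=1000, setpoint=0.5):
    """One tracking step: nudge a servo toward centering the target.

    servo_pos   -- current servo position (e.g. yaw home = 500)
    center_norm -- detection center normalized to [0, 1] of frame width/height
    dead_band   -- 0.02 = 2% of frame; inside it, hold position (reduces jitter)
    lo, hi      -- hard servo limits (yaw: 0-1000; pitch would use 100-720)
    """
    error = center_norm - setpoint
    if abs(error) <= dead_band:           # target close enough to center: no move
        return servo_pos
    correction = kp * error               # P-term only; the node adds I and D terms
    return min(max(servo_pos + correction, lo), hi)  # clamp to servo range

print(track_axis(500, 0.5))   # 500  (inside dead-band, no move)
print(track_axis(995, 0.9))   # 1000 (correction would overshoot; clamped to limit)
```

The clamping is what protects the servos: no matter how large the PID output gets, the commanded position never leaves the mechanical range.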
Snippet 3: Stability check and grab trigger
# Both axes must settle within 3 units for 2 seconds before grabbing
# Increase threshold (3) if arm grabs too rarely; decrease for stricter lock-on
if abs(self.last_pitch_yaw[0] - p_y[0]) < 3 and abs(self.last_pitch_yaw[1] - p_y[1]) < 3:
if time.time() - self.stamp > 2: # 2s stability window — reduce for faster grabbing
Snippet 4: Depth estimation and distance compensation
roi = [
max(0, int(center_y) - 5), # ROI half-size: increase for smoother depth, decrease for precision
min(h, int(center_y) + 5),
max(0, int(center_x) - 5),
min(w, int(center_x) + 5),
]
dist += 0.015 # Strawberry radius compensation (meters), adjust to actual fruit size
dist += 0.015 # Systematic error compensation, tune based on real grab accuracy
if dist > 0.35: # Max grab distance in meters, increase if arm can reach further
continue
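The ROI depth averaging in Snippet 4 can also be tested standalone. This sketch (helper name is mine) clamps the ROI to the frame, drops invalid readings (0 = no return, >10000 mm treated as sensor noise), averages, and converts millimeters to meters, mirroring the node's logic:

```python
import numpy as np

def roi_mean_depth_m(depth_image, center_x, center_y, half=12):
    """Average valid depth in a (2*half)x(2*half) ROI; return meters or None."""
    h, w = depth_image.shape
    y0, y1 = max(0, int(center_y) - half), min(h, int(center_y) + half)  # clamp Y
    x0, x1 = max(0, int(center_x) - half), min(w, int(center_x) + half)  # clamp X
    roi = depth_image[y0:y1, x0:x1]
    valid = roi[np.logical_and(roi > 0, roi < 10000)]  # drop dropouts and noise
    if valid.size == 0:
        return None                  # no usable depth anywhere in the ROI
    return round(float(np.mean(valid)) / 1000.0, 3)    # mm -> m, 3 decimals

depth = np.full((480, 640), 300, dtype=np.uint16)  # fake frame: 300 mm everywhere
depth[100:110, 100:110] = 0                        # a patch of zero-depth dropouts
print(roi_mean_depth_m(depth, 105, 105))           # 0.3 (dropouts filtered out)
```

Because the zeros are filtered before averaging, a few dropout pixels inside the ROI do not drag the estimated distance toward the camera.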
Snippet 5: Camera to world coordinate offset
# Fine-tune these offsets (meters) if the arm consistently misses in one direction
position[0] -= 0.01 # Left/right correction
position[1] -= 0.02 # Up/down correction
position[2] += 0.03 # Forward/backward correction
Snippet 6: Arm speed and picking sequence timing
set_servo_position(self.joints_pub, 1.5, (...)) # 1.5s move duration, reduce for speed, increase for smoothness
time.sleep(2.5) # Wait for arm to reach target; should be >= the move duration above
set_servo_position(self.joints_pub, 1.0, ((10, 450),)) # Gripper close position: 450 (0=open, 500=fully closed)
time.sleep(1) # Gripper close wait time
position[2] += 0.08 # Lift height after grab (meters), increase to clear obstacles
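The timing rule in the comments above (each wait should be at least as long as the preceding move duration) can be checked in a dry run before touching the robot. The sketch below is illustrative only: the command names and helper are mine, not the LanderPi API, and it records commands instead of driving servos:

```python
def plan_pick_sequence(move_dur=1.5, settle=2.5, grip_dur=1.0, grip_wait=1.0):
    """Dry-run the pick sequence: return the command list instead of moving servos."""
    commands = [
        ("move_to_target", move_dur),   # arm travels to the computed 3D pose
        ("wait", settle),               # settle time must cover the move duration
        ("close_gripper", grip_dur),    # gripper closes around the berry
        ("wait", grip_wait),            # gripper close wait time
        ("lift", move_dur),             # lift the berry clear of the plant
        ("wait", settle),
    ]
    # Check the rule from the snippet above: every wait >= the move it follows
    for (cmd, dur), (_, wait) in zip(commands[::2], commands[1::2]):
        assert wait >= dur, f"{cmd}: wait {wait}s is shorter than move {dur}s"
    return commands
```

For example, plan_pick_sequence(move_dur=2.0, settle=1.0) fails the check immediately, which is exactly the configuration that makes the gripper close before the arm has arrived.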
Common Issues and Fixes
| 🔴 Problem | 🔍 Likely Cause | ✅ Fix |
|---|---|---|
| Arm moves up instead of toward berry | position[2] too small | Increase to -0.05 or more |
| Arm overshoots and misses | position[2] too large | Reduce by 0.01 steps |
| Detection lost during arm motion | Stability timer too short | Increase to 7 or 10 seconds |
| Arm grabs air to the side | position[0] or position[1] wrong | Adjust those offsets |
| No detection at all | Wrong class name | Check target_class='ripe' matches data.yaml |
| Arm grabs unripe berries | Confidence too low | Increase conf to 0.55 or 0.65 |
10. Known Issues and Limitations
These are honest limitations from real-world testing, not merely theoretical concerns.
- Hand-eye calibration is manual: the hand2cam_tf_matrix within strawberry_pick_ik.py is a fixed matrix. For best accuracy, perform a proper calibration using an AprilTag board (one comes with the LanderPi packaging) and replace this matrix with your measured transform.
- Depth noise at close range: the Aurora 930 can be noisy below ~30 cm. One thing I faced constantly during experimenting was that if I kept the strawberry a little too far from the RGBD sensor, the arm struggled to pick it and the gripper kept closing on nothing without ever picking the berry.
- The enlarged 24×24 ROI helps, but small or partially white berries are still challenging for the arm to pick.
- Single berry per pick cycle: the current node I have developed picks only one berry at a time. I even placed multiple berries, but the arm was confused about which one to pick. In the future, I will have to develop a script that saves the positions of all detected berries and then actuates the arm so picking is done one by one.
- Lighting sensitivity: YOLO performance drops in low light. I carried out the experiment in moderate lighting conditions, but the RGBD sensor may not be very responsive in dim light, and everything running on the Pi adds latency once YOLO makes a decision.