Autonomous Strawberry Picking Using a ROS2 Mobile Manipulator and YOLOv11-Based Depth-Aware Grasping
🍓 🤖 Background
Small robots (or mobile manipulators) are extremely helpful in compact, space-constrained settings such as greenhouses. Strawberries grown in greenhouses are often harvested manually, and that is exactly where these small robots could make a real difference. Imagine several of them working in coordination (like a swarm), moving autonomously through narrow greenhouse rows, identifying only the berries that are ripe, and picking them one by one without disturbing the plant or the fruit around it. No fatigue, no missed berries, no damage!
This work imitated, in a small way, a situation in which ripe strawberries could be detected and picked autonomously and the whole process could be automated. Using YOLOv11 for visual detection (strawberry_pick_ik.launch.py), a depth camera for 3D depth estimation, and inverse kinematics to physically actuate the robotic arm (strawberry_pick_ik.py), the HiWonder LanderPi demonstrated that even a compact, low-cost mobile manipulator running ROS2 Humble inside Docker on a Raspberry Pi 5 can perform a task as nuanced and delicate as selective fruit picking.
The broader implication is not just about strawberries. It is about demonstrating that precision agriculture, the kind that requires selective, gentle, and intelligent interaction with individual plants, no longer has to depend entirely on human hands.
🍓 Overview
This write-up is for anyone working with a HiWonder LanderPi robot who wants to implement autonomous strawberry detection and picking using computer vision and robotic arm control. However, the major focus of this tutorial is optimized arm control and the end-effector gripper rather than vision-based detection. If you already know ROS and computer vision, you can clone this repo LanderPi onto your Raspberry Pi, as I have already added all the configuration files along with the trained strawberry model architecture and the relevant .py files, strawberry_pick_ik.py and strawberry_pick_ik.launch.py, to jump-start detection and picking.
All you will have to do is move the relevant files into the respective Docker container folders and run a build so the strawberry detection and picking package is registered in the ROS2 setup.py file. This process won't take more than 15-20 minutes provided you are comfortable working with a Linux-based system. If you want to learn the step-by-step process, start following the steps below.
The entire pipeline covers:
- Training a custom YOLOv11 model on ripe/unripe strawberry classes (Very brief!)
- Converting the model to OpenVINO IR format for faster inference on Raspberry Pi (Very brief!)
- Deploying it on LanderPi running ROS2 Humble inside Docker (Detailed!)
- Using a depth camera + inverse kinematics to physically pick ripe strawberries (Detailed!)
Prerequisites:
- Basic knowledge of the Robot Operating System (ROS) and computer vision
- Experience working with a Linux-based OS
- Must have followed the primary instructions as stated within Robotic Arm Control/LanderPi
- A virtual machine (VM) with Ubuntu 22.04 (Jammy) to visualize simulations using RViz (optional)
- A working knowledge of training vision models (YOLO in this case) using the necessary hyperparameters

Setup assumptions:
- HiWonder LanderPi with Mecanum chassis [2]
- Aurora 930 depth camera [2]
- ROS2 Humble running inside Docker [1]
- The workspace lives at /home/ubuntu/ros2_ws within the container [1]
- All commands must run inside the Docker container unless stated otherwise
📋 Table of Contents
- Hardware and Software Requirements
- Training the Strawberry Detection Model
- Converting the Model for Deployment
- Setting Up the ROS2 Package with Strawberry Pick Launch Files
- How the Picking Pipeline Works
- The Main Node
- The Launch File
- Running the Pipeline
- Tuning the Arm for Accurate Picking
- Known Issues and Limitations
1. Hardware and Software Requirements
| Component | Details |
|---|---|
| 🤖 Robot | HiWonder LanderPi with Mecanum chassis |
| 📷 Camera | Aurora 930 depth camera (RGB + depth) |
| 💻 Compute | Raspberry Pi 5 inside Docker container |
| 🐧 OS | Ubuntu 22.04, ROS2 Humble |
| 🧠 ML Framework | Ultralytics YOLOv11 + OpenVINO Runtime |
| 🐍 Language | Python 3.10 |
2. Training the Strawberry Detection Model
I trained a custom YOLOv11n (nano, suited to the Raspberry Pi) model with two classes, ripe and unripe, using the dataset from Roboflow. However, within the repo that you'll clone, I have already provided the strawberry.pt file to get you started. I trained my own model with custom hyperparameters. In the interest of space within this blog, I am not going to explain in detail how to train a YOLO-based vision model for detection. If you encounter any issues, please reach out to me or open an issue on the repo.
Your strawberry_data.yaml should look like the one below. When training your own YOLOv11n model, make sure this *.yaml file lives inside the main dataset folder alongside the test, train, and valid folders.
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 2
names: ['ripe', 'unripe']
roboflow:
  workspace: eric-z0ptd
  project: strawberry_picking_2
  version: 1
  license: CC BY 4.0
  url: https://universe.roboflow.com/eric-z0ptd/strawberry_picking_2/dataset/1
from ultralytics import YOLO
model = YOLO('yolo11n.pt') # Start from nano pretrained weights
model.train(
data='strawberry_data.yaml',
epochs=100,
imgsz=640,
batch=16,
name='strawberry_yolo11'
)
3. Converting the Model for Deployment on LanderPi
Once your model is trained, a strawberry.pt file (or whatever name you trained your model with) is saved to the path you specified. What happens next depends on where you trained your model. There are two options: (a) if you trained the model on the Pi, the .pt file is already on the Pi at that path, and (b) if you trained it on a desktop with more GPU VRAM (recommended, since the Pi will be slow), you will have to move the relevant files from the desktop to the Pi. In this case, I have already provided these files in my forked repo. But if you trained your own model, just run the command below to generate the .xml and .bin files from your strawberry.pt file. The LanderPi's yolov11_detect node uses the OpenVINO IR format (.xml/.bin) for faster inference on the Raspberry Pi CPU (see strawberry_pick_ik.launch.py). The full conversion pipeline is:
strawberry.pt ──► strawberry.xml + strawberry.bin
Step 1: Export .pt to .xml and .bin Files
pip install openvino-dev --break-system-packages  # install this package if not already present
# On the Pi (or your VM)
# Export your .pt to OpenVINO format using the Ultralytics CLI
yolo export model=/path/to/strawberry.pt format=openvino imgsz=640
This creates a folder strawberry_openvino_model/ with two files inside it:
- strawberry.xml — model architecture
- strawberry.bin — model weights
Step 2: Place the Model Files Within the Docker Container on the Pi
# Copy the .xml and .bin from your desktop to the models directory on the Pi.
# Note: this is scp (remote copy), not cp — MentorPi here is the Pi's hostname.
# Skip this step if you trained on the Pi itself.
scp path/to/your/strawberry.xml MentorPi:/home/ubuntu/ros2_ws/src/yolov11_detect/models/
scp path/to/your/strawberry.bin MentorPi:/home/ubuntu/ros2_ws/src/yolov11_detect/models/
# Check that both files have been copied successfully.
cd ros2_ws/src/yolov11_detect/models/
ls
4. Setting Up the ROS2 Package with Strawberry Pick Launch Files
Before getting into this section, let's understand why ROS2 needs a setup.py entry point. It is worth understanding, especially if you plan on doing custom work with the LanderPi beyond just this tutorial. In ROS2, when you create a Python-based package, the setup.py file is the bridge between your Python source code (in this case, the strawberry .py files) and the ROS2 build system (colcon). Without this entry, ROS2 has no idea your node exists, even if the .py file is physically sitting in the right folder. This is why running ros2 launch example strawberry_pick_ik.launch.py without the entry point will throw:
PackageNotFoundError: No package metadata was found for example
Step 1: Add the Entry Point to setup.py
The setup.py file lives at ~/ros2_ws/src/example/setup.py. Open it either using vim or gedit. For me gedit did not work for the first time and threw errors. So, I would recommend using vim.
cd ~/ros2_ws/src/example
vim setup.py
Press i to enter insert mode, make your edit, then press ESC followed by :wq to save and exit. If you make a mistake, press ESC then :q! to exit without saving. Once inside setup.py in vim, find the console_scripts section and add your entry point:
'console_scripts': [
# ... existing entries ...
'strawberry_pick_ik = example.rgbd_function.strawberry_pick_ik:main',
],
The other console_scripts entries are the stock LanderPi nodes, including color detection, hand tracking, navigation transport, etc. Do not remove or modify those lines. Just add your new line at the end of the list, making sure the previous line ends with a comma.
Step 2: Create __init__.py if It Does Not Exist
This step is important and easy to miss. Check whether __init__.py already exists. If, for whatever reason, the __init__.py file does not exist within the /rgbd_function folder, create one using the command below.
touch ~/ros2_ws/src/example/example/rgbd_function/__init__.py
So, why is this step important? When colcon builds the package, Python needs to treat example.rgbd_function as a proper importable package, not just a filesystem folder. The __init__.py file, even though it is completely empty, is the signal Python uses to make this distinction. Without it, when ROS2 tries to resolve your entry point example.rgbd_function.strawberry_pick_ik:main, Python will look for a package called rgbd_function inside example, finding only a folder with no __init__.py, and throw:
ModuleNotFoundError: No module named 'example.rgbd_function'
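To make this concrete, here is a small standalone sketch (plain Python, not ROS2 or setuptools internals) of how a console_scripts entry string is resolved into a module to import and a function to call. The helper name is mine, purely for illustration:

```python
# Illustrative sketch: splitting a console_scripts entry string such as
# 'strawberry_pick_ik = example.rgbd_function.strawberry_pick_ik:main'
# into the module path to import and the callable to invoke.
def parse_entry_point(entry: str) -> tuple[str, str]:
    """Return (module_path, function_name) for an entry-point string."""
    if "=" in entry:                      # drop the script name if present
        entry = entry.split("=", 1)[1].strip()
    module, func = entry.split(":", 1)    # module path before ':', callable after
    return module, func

module, func = parse_entry_point(
    "strawberry_pick_ik = example.rgbd_function.strawberry_pick_ik:main")
print(module)  # example.rgbd_function.strawberry_pick_ik
print(func)    # main
```

Every dotted component of that module path (example, then rgbd_function) must be an importable package, which is exactly why the missing __init__.py breaks the resolution.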
Step 3: Build the ROS2 Package
Now rebuild the example package so colcon picks up your new entry point and registers it in the install directory. Use the code below.
cd ~/ros2_ws
colcon build --event-handlers console_direct+ --cmake-args -DCMAKE_BUILD_TYPE=Release --symlink-install --packages-select example
Here is what each flag means:
| Flag | Meaning |
|---|---|
| colcon build | The ROS2 build command that compiles all packages in the workspace [4] |
| --event-handlers console_direct+ | Prints the full build output directly to the terminal in real time — useful for seeing errors immediately instead of buffering them [4] |
| --cmake-args -DCMAKE_BUILD_TYPE=Release | Tells CMake to build in Release mode, enabling compiler optimizations for faster and more efficient runtime performance compared to Debug mode [4] |
| --symlink-install | Instead of copying files into the install directory, creates symbolic links back to your source files [4]. Any edits to existing .py files like strawberry_pick_ik.py are instantly reflected without rebuilding [6] |
| --packages-select example | Builds only the example package instead of rebuilding the entire workspace, much faster during development |
The --symlink-install flag [4] is particularly useful during development. Since it creates symlinks from the install directory directly back to your source files, any changes you make to an existing .py file like strawberry_pick_ik.py [6] are instantly reflected without needing to rebuild. You only need to rebuild when adding new files or new entry points to setup.py. A successful build will show:
Finished <<< example [Xs]
Summary: 1 package finished [Xs]
The stderr warnings about setuptools deprecation are harmless and can be ignored.
Step 4: Verify the Build
After building, confirm your node was installed correctly:
ls ~/ros2_ws/install/example/lib/example/strawberry_pick_ik
And verify the main() function is present in your source file:
grep -n "def main" ~/ros2_ws/src/example/example/rgbd_function/strawberry_pick_ik.py
Both should return valid output. If grep returns nothing, your source file is missing the main() function and the node will fail to launch. Remove the old log files and try to rebuild the package.
Step 5: Open a Fresh Terminal and Launch
Open a fresh terminal inside the Docker container. As mentioned before, everything is sourced already; you can source again using source ~/.zshrc if needed. IT IS TIME TO LAUNCH AND SEE THE ROBOT IN ACTION!
ros2 launch example strawberry_pick_ik.launch.py
5. How the Picking Pipeline Works
The full autonomous pipeline runs as follows:
| Stage | What Happens | ROS2 Topic / Service |
|---|---|---|
| 🧠 YOLO Detection | YOLOv11 detects ripe and unripe strawberries in every frame | /yolo_node/object_detect |
| 📷 Image Sync | RGB, depth, and camera_info are time-synchronized | /ascamera/camera_publisher/rgb0/image |
| 🎯 PID Tracking | Camera servos adjust to center the ripe berry in frame using PID control | /servo_controller |
| ⏱️ Stability Check | Waits 5 seconds of stable PID tracking before triggering the grab | Internal timer |
| 📐 Depth + IK | Depth ROI averaged → pixel + depth → 3D camera coords → world coords via hand-eye matrix | /kinematics/set_pose_target |
| 🦾 Grab | Arm moves to computed 3D position and gripper closes around the berry | /servo_controller |
| ⬆️ Lift | Arm lifts berry slightly upward before moving to safe retract position | /servo_controller |
| 📦 Place | Arm moves to placement position on the left side and gripper opens to release | /servo_controller |
| 🏠 Return Home | Arm returns to initial scanning position and tracker resets for next berry | /servo_controller |
Key Tuning Parameters
| Parameter | Location | Default | What it controls |
|---|---|---|---|
| conf | launch file [5] | 0.45 | YOLO confidence threshold |
| position[2] -= X | strawberry_pick_ik.py [6] | 0.02 | How low the arm reaches to grab |
| Stability timer | strawberry_pick_ik.py [6] | 5 sec | Wait time before triggering grab |
| Depth ROI | strawberry_pick_ik.py [6] | 24×24 px | Area used for depth averaging |
| PID threshold | strawberry_pick_ik.py [6] | 0.01 | Sensitivity of camera centering adjustments |
6. The Main Node: strawberry_pick_ik.py
The node uses a custom YoloTracker class that replaces traditional HSV color detection with YOLO bounding box centers (see strawberry_pick_ik.py). It subscribes to /yolo_node/object_detect and uses PID controllers to center the camera on the best detected ripe strawberry. Within strawberry_pick_ik.py, look for the lines below to understand the code structure.
YoloTracker Class
class YoloTracker:
    """Uses YOLO bounding box center for PID tracking."""

    def __init__(self, target_class='ripe'):
        self.target_class = target_class          # Store the object class to track (default: 'ripe')
        self.pid_yaw = pid.PID(20.5, 1.0, 1.2)    # PID controller for left/right adjustment
        self.pid_pitch = pid.PID(20.5, 1.0, 1.2)  # PID controller for up/down adjustment
        self.yaw = 500         # Initial horizontal servo/motor position
        self.pitch = 150       # Initial vertical servo/motor position
        self.center = None     # Will hold (x, y) of detected target center
        self.detected = False  # Tracks whether target is currently visible

    def update_detection(self, objects):
        best = None  # Will hold the highest-confidence matching detection
        for obj in objects:  # Loop through all detected objects
            if obj.class_name == self.target_class:  # Only process objects matching target class
                cx = (obj.box[0] + obj.box[2]) / 2.0  # Compute bounding box center X
                cy = (obj.box[1] + obj.box[3]) / 2.0  # Compute bounding box center Y
                if best is None or obj.score > best[3]:  # Keep only the highest-confidence detection
                    best = (cx, cy,
                            max(abs(obj.box[2] - obj.box[0]),
                                abs(obj.box[3] - obj.box[1])) / 2.0,  # Approximate radius of bounding box
                            obj.score)  # Store (cx, cy, radius, confidence)
        if best is not None:
            self.center = (best[0], best[1])  # Save center coords of best detection
            self.detected = True              # Mark target as found
        else:
            self.center = None     # Clear center if nothing detected
            self.detected = False  # Mark target as lost
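The highest-confidence selection logic above can be exercised on its own. Below is a self-contained sketch using a stand-in Detection class (the real node receives ROS2 detection messages; the field names class_name, box, and score mirror the snippet above, but this dataclass is only illustrative):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    class_name: str   # e.g. 'ripe' or 'unripe'
    box: tuple        # (x1, y1, x2, y2) in pixels
    score: float      # detection confidence

def best_ripe_center(objects, target_class="ripe"):
    """Return (cx, cy) of the highest-confidence target detection, or None."""
    best = None
    for obj in objects:
        if obj.class_name != target_class:
            continue  # skip unripe berries entirely
        if best is None or obj.score > best.score:
            best = obj
    if best is None:
        return None
    return ((best.box[0] + best.box[2]) / 2.0,   # bounding box center X
            (best.box[1] + best.box[3]) / 2.0)   # bounding box center Y

detections = [
    Detection("unripe", (0, 0, 40, 40), 0.90),          # ignored despite high score
    Detection("ripe",   (100, 60, 140, 100), 0.55),
    Detection("ripe",   (200, 120, 260, 180), 0.80),    # winner: highest ripe score
]
print(best_ripe_center(detections))  # (230.0, 150.0)
```

Note that a high-confidence unripe berry never wins: class filtering happens before the confidence comparison, exactly as in the node.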
Stability Check and Depth-to-3D Conversion
# Stability check: both yaw and pitch must differ by less than 3 units from last frame
# Increase threshold (3) if arm grabs too rarely; decrease for stricter lock-on
if abs(self.last_pitch_yaw[0] - p_y[0]) < 3 \
        and abs(self.last_pitch_yaw[1] - p_y[1]) < 3:
    # Arm must be settled for 5 seconds before grabbing; do not reduce below 3s
    # (arm needs time to physically reach and stabilize at tracked position)
    if time.time() - self.stamp > 5 and not self.moving:
        # 24x24 pixel ROI centered on detection for depth sampling
        # Larger ROI = more stable average depth; reduce if targeting small/occluded berries
        roi = [
            max(0, int(center_y) - 12), min(h, int(center_y) + 12),  # Y bounds clamped to frame
            max(0, int(center_x) - 12), min(w, int(center_x) + 12),  # X bounds clamped to frame
        ]
        roi_distance = depth_image[roi[0]:roi[1], roi[2]:roi[3]]  # Crop depth image to ROI
        valid_depths = roi_distance[
            np.logical_and(roi_distance > 0, roi_distance < 10000)]  # Filter out zeros and sensor noise
        dist = round(float(np.mean(valid_depths) / 1000.0), 3)  # Average depth in meters
        dist += 0.015  # Compensate for strawberry physical radius
        dist += 0.015  # Compensate for systematic depth sensor error

        # Unproject 2D pixel + depth → 3D point in camera frame using intrinsics (fx, fy, cx, cy)
        K = depth_camera_info.k
        position = depth_pixel_to_camera(
            (center_x, center_y), dist, (K[0], K[4], K[2], K[5]))

        # Manual offset corrections for RGB-to-depth camera misalignment (meters)
        position[0] -= 0.01  # Correct left/right offset between RGB and depth lens
        position[1] -= 0.02  # Correct up/down offset between RGB and depth lens
        position[2] -= 0.02  # Primary grab depth tuning: negative = closer, positive = further

        # Apply hand-eye calibration: camera frame → end-effector frame
        pose_end = np.matmul(
            self.hand2cam_tf_matrix,  # Fixed camera-to-hand transform
            common.xyz_euler_to_mat(position, (0, 0, 0)))  # 3D point as 4x4 homogeneous matrix

        # Apply forward kinematics: end-effector frame → world frame
        world_pose = np.matmul(self.endpoint, pose_end)
        pose_t, _ = common.mat_to_xyz_euler(world_pose)  # Extract (x, y, z) world coordinates

        self.moving = True  # Lock out new grabs until pick() completes
        threading.Thread(target=self.pick, args=(pose_t,)).start()  # Run pick sequence in background thread
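The depth_pixel_to_camera call above performs standard pinhole unprojection. Here is a minimal sketch of that math, assuming the conventional (fx, fy, cx, cy) intrinsics layout from the camera_info K matrix; this is my reimplementation for illustration, not necessarily the SDK's exact code:

```python
def depth_pixel_to_camera_sketch(pixel, depth_m, intrinsics):
    """Unproject a 2D pixel + metric depth into camera-frame XYZ.

    pixel      -- (u, v) pixel coordinates of the detection center
    depth_m    -- averaged ROI depth at that pixel, in meters
    intrinsics -- (fx, fy, cx, cy) taken from camera_info K
    """
    u, v = pixel
    fx, fy, cx, cy = intrinsics
    x = (u - cx) * depth_m / fx   # meters right of the optical axis
    y = (v - cy) * depth_m / fy   # meters below the optical axis
    z = depth_m                   # meters along the optical axis
    return [x, y, z]

# A pixel exactly at the principal point maps onto the optical axis:
print(depth_pixel_to_camera_sketch((320, 240), 0.30, (600.0, 600.0, 320.0, 240.0)))
# [0.0, 0.0, 0.3]
```

The resulting camera-frame point is what then gets corrected by the manual offsets and pushed through the hand-eye and forward-kinematics matrices to reach world coordinates.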
7. The Launch File: strawberry_pick_ik.launch.py
The launch file starts all required nodes together in one command: strawberry_pick_ik.launch.py
# YOLO detection node — loads strawberry.xml OpenVINO model
yolov11_node = Node(
    package='yolov11_detect',
    executable='yolov11_node',
    output='screen',
    parameters=[
        {'classes': ['ripe', 'unripe']},
        {'model': 'strawberry', 'conf': conf},
        {'start': True},
    ]
)

# Strawberry IK picking node
strawberry_pick_ik_node = Node(
    package='example',
    executable='strawberry_pick_ik',
    output='screen',
    parameters=[{'start': start}],
)
All nodes launched together:
- 📷 Depth camera (RGB + depth + camera_info)
- 🎮 Servo controller manager
- 🦾 Kinematics IK solver
- 🧠 YOLOv11 detection node
- 🍓 Strawberry pick IK node
8. Running the Pipeline
Open a terminal inside the Docker container. Before running any ros2 nodes, run ~/.stop_ros.sh to stop any auto-start services, as they can interfere with the other nodes and may delay decision-making. You can also re-source the environment using source ~/.bashrc. However, to avoid any issues, I recommend opening a new Docker terminal so everything is sourced automatically and you can run ros2 nodes successfully!
ros2 launch example strawberry_pick_ik.launch.py
# Watch all strawberry-related topics. Run these to ensure your Pi is streaming topics from the sensors and robot telemetry. If you do not see any output, chances are that either the packages have not been built properly or the detection nodes are not running.
ros2 topic list | grep yolo
# Stream detection center coordinates
ros2 topic echo /yolo_node/object_detect
9. Tuning the Arm for Accurate Picking
This section requires patience; small adjustments make a big difference. It is a numbers game, so you will have to find the optimal values to actuate the arm correctly. I have split the whole section into six parts, and tuning these will lead to better results. When I first launched the ros2 node, I had to make a lot of adjustments, so be ready to do the same. If the arm grabs above the strawberry, increase the downward offset within strawberry_pick_ik.py. Check below for more detail.
Snippet 1: PID tuning
# Kp=20.5 (responsiveness), Ki=1.0 (drift correction), Kd=1.2 (overshoot damping)
# Increase Kp for faster tracking, decrease to reduce oscillation
self.pid_yaw = pid.PID(20.5, 1.0, 1.2)
self.pid_pitch = pid.PID(20.5, 1.0, 1.2)
self.yaw = 500 # Horizontal servo home position (0–1000)
self.pitch = 150 # Vertical servo home position (100–720)
Snippet 2: Tracking the strawberry and servo limits (Be aware and read the manual for arm servo limits)
# 0.02 = 2% of frame width/height; increase to reduce jitter, decrease for tighter centering
if abs(center_x_norm - 0.5) > 0.02:
...
self.yaw = min(max(self.yaw + self.pid_yaw.output, 0), 1000) # Yaw servo range: 0–1000
if abs(center_y_norm - 0.5) > 0.02:
...
self.pitch = min(max(self.pitch + self.pid_pitch.output, 100), 720) # Pitch servo range: 100–720
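To reason about the dead-band and servo clamping above in isolation, here is a self-contained sketch. It uses a simple proportional correction as a stand-in for the full PID controller (the function name and gain usage are mine, purely illustrative):

```python
def track_axis(servo_pos, center_norm, kp=20.5, dead_band=0.02,
               lo=0, hi=1000, setpoint=0.5):
    """One tracking step: nudge a servo toward centering the target.

    servo_pos   -- current servo position (e.g. yaw home = 500)
    center_norm -- detection center normalized to [0, 1] of frame width/height
    dead_band   -- 0.02 = 2% of frame; inside it, hold position (reduces jitter)
    lo, hi      -- hard servo limits (yaw: 0-1000; pitch would use 100-720)
    """
    error = center_norm - setpoint
    if abs(error) <= dead_band:           # target close enough to center: no move
        return servo_pos
    correction = kp * error               # P-term only; the node adds I and D terms
    return min(max(servo_pos + correction, lo), hi)  # clamp to servo range

print(track_axis(500, 0.5))   # 500  (inside dead-band, no move)
print(track_axis(995, 0.9))   # 1000 (correction would overshoot; clamped to limit)
```

The clamping is what protects the servos: no matter how large the PID output gets, the commanded position never leaves the mechanical range.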
Snippet 3: Stability check and grab trigger
# Both axes must settle within 3 units for 2 seconds before grabbing
# Increase threshold (3) if arm grabs too rarely; decrease for stricter lock-on
if abs(self.last_pitch_yaw[0] - p_y[0]) < 3 and abs(self.last_pitch_yaw[1] - p_y[1]) < 3:
if time.time() - self.stamp > 2: # 2s stability window — reduce for faster grabbing
Snippet 4: Depth estimation and distance compensation
roi = [
max(0, int(center_y) - 5), # ROI half-size: increase for smoother depth, decrease for precision
min(h, int(center_y) + 5),
max(0, int(center_x) - 5),
min(w, int(center_x) + 5),
]
dist += 0.015 # Strawberry radius compensation (meters), adjust to actual fruit size
dist += 0.015 # Systematic error compensation, tune based on real grab accuracy
if dist > 0.35: # Max grab distance in meters, increase if arm can reach further
continue
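The ROI depth averaging in Snippet 4 can also be tested standalone. This sketch (helper name is mine) clamps the ROI to the frame, drops invalid readings (0 = no return, >10000 mm treated as sensor noise), averages, and converts millimeters to meters, mirroring the node's logic:

```python
import numpy as np

def roi_mean_depth_m(depth_image, center_x, center_y, half=12):
    """Average valid depth in a (2*half)x(2*half) ROI; return meters or None."""
    h, w = depth_image.shape
    y0, y1 = max(0, int(center_y) - half), min(h, int(center_y) + half)  # clamp Y
    x0, x1 = max(0, int(center_x) - half), min(w, int(center_x) + half)  # clamp X
    roi = depth_image[y0:y1, x0:x1]
    valid = roi[np.logical_and(roi > 0, roi < 10000)]  # drop dropouts and noise
    if valid.size == 0:
        return None                  # no usable depth anywhere in the ROI
    return round(float(np.mean(valid)) / 1000.0, 3)    # mm -> m, 3 decimals

depth = np.full((480, 640), 300, dtype=np.uint16)  # fake frame: 300 mm everywhere
depth[100:110, 100:110] = 0                        # a patch of zero-depth dropouts
print(roi_mean_depth_m(depth, 105, 105))           # 0.3 (dropouts filtered out)
```

Because the zeros are filtered before averaging, a few dropout pixels inside the ROI do not drag the estimated distance toward the camera.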
Snippet 5: Camera to world coordinate offset
# Fine-tune these offsets (meters) if the arm consistently misses in one direction
position[0] -= 0.01 # Left/right correction
position[1] -= 0.02 # Up/down correction
position[2] += 0.03 # Forward/backward correction
Snippet 6: Arm speed and picking sequence timing
set_servo_position(self.joints_pub, 1.5, (...)) # 1.5s move duration, reduce for speed, increase for smoothness
time.sleep(2.5) # Wait for arm to reach target; should be >= the move duration above
set_servo_position(self.joints_pub, 1.0, ((10, 450),)) # Gripper close position: 450 (0=open, 500=fully closed)
time.sleep(1) # Gripper close wait time
position[2] += 0.08 # Lift height after grab (meters), increase to clear obstacles
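The timing rule in the comments above (each wait should be at least as long as the preceding move duration) can be checked in a dry run before touching the robot. The sketch below is illustrative only: the command names and helper are mine, not the LanderPi API, and it records commands instead of driving servos:

```python
def plan_pick_sequence(move_dur=1.5, settle=2.5, grip_dur=1.0, grip_wait=1.0):
    """Dry-run the pick sequence: return the command list instead of moving servos."""
    commands = [
        ("move_to_target", move_dur),   # arm travels to the computed 3D pose
        ("wait", settle),               # settle time must cover the move duration
        ("close_gripper", grip_dur),    # gripper closes around the berry
        ("wait", grip_wait),            # gripper close wait time
        ("lift", move_dur),             # lift the berry clear of the plant
        ("wait", settle),
    ]
    # Check the rule from the snippet above: every wait >= the move it follows
    for (cmd, dur), (_, wait) in zip(commands[::2], commands[1::2]):
        assert wait >= dur, f"{cmd}: wait {wait}s is shorter than move {dur}s"
    return commands
```

For example, plan_pick_sequence(move_dur=2.0, settle=1.0) fails the check immediately, which is exactly the configuration that makes the gripper close before the arm has arrived.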
Common Issues and Fixes
| 🔴 Problem | 🔍 Likely Cause | ✅ Fix |
|---|---|---|
| Arm moves up instead of toward berry | position[2] too small | Increase to -0.05 or more |
| Arm overshoots and misses | position[2] too large | Reduce by 0.01 steps |
| Detection lost during arm motion | Stability timer too short | Increase to 7 or 10 seconds |
| Arm grabs air to the side | position[0] or position[1] wrong | Adjust those offsets |
| No detection at all | Wrong class name | Check target_class='ripe' matches data.yaml |
| Arm grabs unripe berries | Confidence too low | Increase conf to 0.55 or 0.65 |
10. Known Issues and Limitations
These are honest limitations from real-world testing, not merely theoretical concerns.
- Hand-eye calibration is manual: the hand2cam_tf_matrix within strawberry_pick_ik.py is a fixed matrix. For best accuracy, perform a proper calibration using an AprilTag board (one comes with the LanderPi packaging) and replace this matrix with your measured transform.
- Depth noise at close range: the Aurora 930 can be noisy below ~30 cm. One thing I faced constantly during experimenting was that if I kept the strawberry a little too far from the RGBD sensor, the arm struggled to pick it and the gripper kept closing on nothing without ever picking the berry.
- The enlarged 24×24 ROI helps, but small or partially white berries are still challenging for the arm to pick.
- Single berry per pick cycle: the current node I have developed picks only one berry at a time. I even placed multiple berries, but the arm was confused about which one to pick. In the future, I will have to develop a script that saves the positions of all detected berries and then actuates the arm so picking is done one by one.
- Lighting sensitivity: YOLO performance drops in low light. I carried out the experiment in moderate lighting conditions, but the RGBD sensor may not be very responsive in dim light, and everything running on the Pi adds latency once YOLO makes a decision.