Real-World Image Segmentation and Mapping with 3 Stereo Cameras in 3D Perception

Objective: We are trying to tune the image segmentation parameters to differentiate dynamic objects from static ones and to address the challenges introduced by camera movement during environmental mapping. Specifically, we are facing two issues:

(1) How to differentiate dynamic objects from static ones?
(2) How to effectively map the environment when the three Hawk stereo cameras start to move?


Experiment Setup

We followed the tutorial from the following link and launched the 3D perception system by running the provided Docker image.

Initial Setup

We installed the three cameras and tested the Docker environment with the stereo_camera_configuration setting, which we updated to front_left_right_configuration. We launched the system with ROS launch and observed the environment in RViz under the "Static Voxels" and "People Voxels" displays.
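
For reference, a minimal rclpy sketch like the one below can be used to sanity-check that the voxel-related publishers come up after switching to front_left_right_configuration; the 'voxel' / 'esdf' substrings are only guesses at the topic naming, not confirmed Isaac ROS names.

```python
# Minimal sketch: list voxel-related topics after launch so we can confirm
# the publishers exist before opening RViz. The name filters below are
# assumptions about the topic naming, not confirmed Isaac ROS names.
import rclpy
from rclpy.node import Node

def main():
    rclpy.init()
    node = Node('voxel_topic_probe')
    rclpy.spin_once(node, timeout_sec=2.0)  # let graph discovery settle
    for name, types in node.get_topic_names_and_types():
        if 'voxel' in name.lower() or 'esdf' in name.lower():
            print(f'{name}: {types}')
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```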


Observed Results

Figure 1 (attached): The panorama view of the conference room captured by my cellphone.

Figure 2 (attached): The initial voxel capture upon starting the ROS launch, showing the expected results for static voxels.

More figures can be found at the following link:

However, these results are only observed in a few ideal scenarios in our real-world three-camera configuration. In most cases, the system produces incorrect masking. For example, when a person enters the scene, static walls are mistakenly marked red, just like the person, as seen in Figure 3. This appears to happen because dynamic mapping incorrectly treats parts of the static environment as dynamic objects.

Figure 3 (attached): While the dynamic mapping algorithm masks the moving person, parts of the static walls are incorrectly masked in red as well.

Issue #1: Differentiating Dynamic and Static Objects
How can we improve the system to differentiate dynamic objects (e.g., people) from static objects (e.g., walls) without incorrect masking?
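
My current (possibly wrong) mental model is that dynamic detection flags observations that land in space the static reconstruction already considers free, so a global pose error can push wall points into "free" voxels and turn them red. Below is a toy NumPy sketch of that idea; the grid layout, names, and thresholds are made up for illustration and are explicitly not the actual nvblox implementation.

```python
import numpy as np

VOXEL_SIZE_M = 0.05        # assumed voxel size
MIN_VIOLATION_FRAMES = 3   # assumed persistence threshold

def classify_dynamic(points, free_mask, origin, violation_counts):
    """Flag points that land in voxels the static map already believes are free.

    points: (N, 3) world-frame hits from the current depth frame.
    free_mask: boolean grid, True where the static reconstruction has
        observed free space.
    violation_counts: dict voxel-index -> consecutive violation count,
        carried across frames to suppress single-frame flicker.
    """
    idx = np.floor((points - origin) / VOXEL_SIZE_M).astype(int)
    in_bounds = np.all((idx >= 0) & (idx < free_mask.shape), axis=1)
    dynamic = np.zeros(len(points), dtype=bool)

    valid = idx[in_bounds]
    violates = free_mask[valid[:, 0], valid[:, 1], valid[:, 2]]

    persistent = np.empty(len(valid), dtype=bool)
    for i, voxel in enumerate(map(tuple, valid)):
        count = violation_counts.get(voxel, 0) + 1 if violates[i] else 0
        violation_counts[voxel] = count
        persistent[i] = count >= MIN_VIOLATION_FRAMES

    # A global pose error shifts *every* point, so wall points can also fall
    # into "free" voxels -- which is exactly the red-wall symptom in Figure 3.
    dynamic[np.flatnonzero(in_bounds)] = violates & persistent
    return dynamic
```

If this model is roughly right, then lowering the sensitivity (for example, requiring more persistence before flagging a voxel) trades missed slow movers for fewer false positives on the walls, which is the tuning question above.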


Issue #2: Mapping the Environment with Moving Cameras

We also encountered mapping issues when moving the cameras. We suspect that errors and drift in the visual SLAM odometry estimates lead to incorrect mapping of revisited landmarks.

For instance, when we move the cameras forward and backward along the blue path in Figure 4, the system revisits the floor but incorrectly places it at a higher position, as shown in Figure 5. This drift could cause static objects to be perceived as dynamic obstacles (marked in red).


Figure 4 (attached): Top view of the camera trajectory during the test.

Figure 5 (attached): The revisited floor (marked in red) appears to be at a higher location than the original floor.

Issue #2 Summary:
How can we address the drift and errors in SLAM to ensure accurate mapping of static objects when moving the cameras?
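
One way we could quantify the symptom in Figure 5 would be to watch the vertical component of the odometry while carrying the rig forward and back along the blue path. A minimal rclpy sketch follows; the topic name /visual_slam/tracking/odometry is an assumption based on Isaac ROS Visual SLAM defaults, so please substitute whatever ros2 topic list shows on your setup.

```python
import rclpy
from rclpy.node import Node
from nav_msgs.msg import Odometry

ODOM_TOPIC = '/visual_slam/tracking/odometry'  # assumed topic name

class DriftMonitor(Node):
    def __init__(self):
        super().__init__('drift_monitor')
        self.z0 = None
        self.create_subscription(Odometry, ODOM_TOPIC, self.on_odom, 10)

    def on_odom(self, msg):
        z = msg.pose.pose.position.z
        if self.z0 is None:
            self.z0 = z
        # If the rig is carried at a roughly constant height, |z - z0| growing
        # over a forward-and-back pass is the vertical drift that would make
        # the revisited floor appear higher than the original one (Figure 5).
        self.get_logger().info(f'z drift since start: {z - self.z0:+.3f} m')

def main():
    rclpy.init()
    rclpy.spin(DriftMonitor())

if __name__ == '__main__':
    main()
```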

We would appreciate any insights or advice on how to tune the image segmentation parameters or address these issues.



Hi @david.wang.engineer

Thank you for your detailed post, and welcome to the Isaac ROS forum.

I’m looking internally at how to support you better. I will be back with more info soon.

Thank you in advance,
Raffaello

Hi @Raffaello

Thank you for your response.

I saw another web page that says:

"Dynamic reconstruction requires accurate pose estimation. Objects moving slower than the odometry drift can't be detected as dynamic."

Does the Perceptor program publish any drift-related topics? If not, how can I balance the tradeoff between odometry drift and dynamic object movement?

If such a drift-related topic exists, monitoring it might be a way to solve the mapping issue described above and achieve my second objective.
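
In the meantime, my rough way of thinking about the tradeoff is to walk the rig in a closed loop back to a marked start point, take the residual pose error as the accumulated drift, and convert it into a minimum detectable object speed. A back-of-the-envelope sketch with placeholder numbers (not measured values):

```python
def min_detectable_speed(loop_drift_m: float, loop_duration_s: float) -> float:
    """Drift accumulated over a closed loop, expressed as a rate in m/s.

    Per the rule of thumb quoted above, objects moving slower than roughly
    this rate are indistinguishable from the map error caused by the drift.
    """
    return loop_drift_m / loop_duration_s

# Example: 0.10 m of residual error over a 20 s forward-and-back loop gives
# ~0.005 m/s. Objects slower than roughly this would be hidden inside the
# drift, while the drift itself makes static surfaces appear to "move" at a
# comparable rate -- which seems to be the tension I am asking about.
print(min_detectable_speed(0.10, 20.0))
```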

Best regards,

David