Isaac ROS Office Hours - Object Detection with YOLOv8

On Feb 21 at 8 a.m. Pacific, I will be hosting office hours covering Isaac ROS object detection with YOLOv8.

Following this, we’ll take a deeper dive into how to use object detection with YOLOv8.

If you already have questions, please write a comment below! I really appreciate it.


Please join us to learn more about these new features and have your robotics perception questions answered.

Add it to your calendar
YouTube LiveStream

We look forward to seeing you there.

Raffaello

Hello,

I’m currently working with the Isaac ROS wrapper for YOLOv8 object detection and facing challenges in efficiently managing multiple instances of the same class detected in a scene. Given the real-time nature of the application and the need for high performance, I’m looking for advice on best practices or strategies.

Specifically, my questions are:

  1. What are the recommended approaches for tracking multiple objects of the same class across successive frames, considering the dynamics of the scene and potential occlusions?
  2. How can I efficiently store, update, and retrieve information about each detected object, especially when dealing with a large number of instances?
  3. Are there any specific tools or techniques within the Isaac SDK or ROS ecosystem that facilitate this kind of object management?
  4. Can you share insights or examples on integrating these object management strategies with the YOLOv8 detector in an Isaac ROS pipeline?
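To make question 2 concrete, the kind of bookkeeping I have in mind looks roughly like this (a pure-Python sketch with illustrative names, not an existing Isaac ROS API):

```python
# Minimal sketch of per-object state keyed by track ID (illustrative only).
class ObjectRegistry:
    def __init__(self, max_age=1.0):
        self.max_age = max_age  # seconds without an update before a track is dropped
        self.objects = {}       # track_id -> {"bbox": ..., "last_seen": stamp}

    def update(self, track_id, bbox, stamp):
        """Insert or refresh the state for one detected instance."""
        self.objects[track_id] = {"bbox": bbox, "last_seen": stamp}

    def prune(self, now):
        """Drop tracks not seen recently (e.g. lost to occlusion)."""
        stale = [tid for tid, obj in self.objects.items()
                 if now - obj["last_seen"] > self.max_age]
        for tid in stale:
            del self.objects[tid]
        return stale
```

Is something along these lines reasonable, or is there a more idiomatic structure within the Isaac ROS ecosystem?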

I appreciate any guidance, examples, or references to documentation that could help address these challenges.

Thank you!

If we also could look into the following topics:

The difference between TensorRT and Triton when using YOLOv8

Also, how can we benefit from, for example, INT8 optimization?

Also, whenever we “kill” the TensorRT node, some information is displayed about “job statistics” or similar that reports the inference times as median, 90th percentile, and max. Could we go over what each of these parameters shows, and how to benchmark the specific YOLO models we are running?

Edit:
Here is an example of what I was talking about:

Thanks again

Hi, thanks for your question!

Difference between TensorRT and Triton nodes:

  • The TensorRT node uses TensorRT to optimize models for inference on the target hardware. The Triton node uses the Triton Inference Server and supports multiple inference backends such as ONNX Runtime, TensorFlow, PyTorch, and TensorRT.
  • We haven’t measured a significant difference in benchmarking results between using TensorRT directly and using Triton with the TensorRT backend.
  • TensorRT may not support inference for some new or custom models, in which case you can use the Triton node.
  • More information here.

For INT8 optimization:
Use trtexec and specify --int8 to convert your model to a TensorRT plan file (example in step 7 here). Pass this output plan file (instead of the ONNX model) to the TensorRT node to benefit from INT8 optimization.
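For reference, the trtexec invocation for that step typically looks like the following (file names are placeholders; note that meaningful INT8 accuracy generally requires calibration data, which trtexec can take as a calibration cache via --calib):

```shell
# Convert an ONNX model to a TensorRT engine with INT8 precision enabled.
# yolov8n.onnx / yolov8n.plan are placeholder file names.
trtexec --onnx=yolov8n.onnx \
        --int8 \
        --saveEngine=yolov8n.plan
```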

Regarding inference times in the image you shared - these numbers are generated by TensorRT and not at the Isaac ROS level. TensorRT documentation would be the right place to learn about these particular metrics. As for benchmarking the YOLO models with Isaac ROS for inference, please refer to isaac_ros_benchmark for example scripts.

Hi! Regarding your first question - tracking is available as a mode for the detection model (Track - Ultralytics YOLOv8 Docs).
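As a rough sketch of what that mode looks like from Python (this assumes the `ultralytics` package, a `yolov8n.pt` checkpoint, and a local `video.mp4`; the helper name is illustrative):

```python
def boxes_by_track_id(track_ids, xyxy):
    """Group bounding boxes by the tracker-assigned ID, so per-object
    state can be kept across frames."""
    grouped = {}
    for tid, box in zip(track_ids, xyxy):
        grouped.setdefault(int(tid), []).append(box)
    return grouped

if __name__ == "__main__":
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    # persist=True keeps track IDs stable across successive frames
    for result in model.track(source="video.mp4", stream=True, persist=True):
        if result.boxes.id is not None:
            print(boxes_by_track_id(result.boxes.id.tolist(),
                                    result.boxes.xyxy.tolist()))
```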

Could you please explain your use case in more detail?

Thank you for the enlightening workshop. It provided valuable insights that are directly applicable to our project, where we are developing an autonomous boat with a focus on detecting and tracking buoys as one of its key navigational challenges. We are currently exploring implementing the Track functionality from Ultralytics YOLOv8.

Our current setup utilizes a camera for object detection, integrated with LiDAR data to overlay distances onto the camera’s coordinate system. This integration is crucial for accurately determining the distance to detected buoys. To refine our navigation system, we are exploring methodologies for effective buoy tracking. Our approach leverages a GPS system to establish a consistent coordinate framework, allowing us to infer that a buoy detected in the same location as previously observed is, in fact, the same object. This capability is vital for enhancing the boat’s autonomous navigational decisions.

One of the primary challenges we’ve encountered involves the synchronization of LiDAR data with the objects detected by our camera. The inherent latency in processing—where the object detection inference is completed and the bounding box is identified before the corresponding LiDAR data is collected—results in a spatial offset between the detected object and its actual position. This misalignment can critically affect the accuracy of our distance measurements and, by extension, the effectiveness of our navigation system.

To mitigate this issue, we are exploring the implementation of a synchronization mechanism based on the timestamps of image capture and LiDAR data collection. This approach requires precise timing to ensure that the LiDAR data corresponds accurately to the frame in which the object was detected. Implementing such a synchronization likely necessitates the development of a buffering system, which would hold the LiDAR data temporarily until it can be matched with the correct frame of video data.
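As a sketch of the direction we are considering: a small stamp-sorted buffer with a nearest-stamp lookup for the matching step, and ROS 2’s `message_filters` for the subscription side (helper names are hypothetical; the `__main__` part assumes `rclpy`, `message_filters`, and our detection and point-cloud topic names):

```python
from bisect import bisect_left

def match_nearest(stamp, buffer, slop=0.05):
    """Return the buffered (stamp, data) pair whose stamp is closest to
    `stamp`, or None if nothing lies within `slop` seconds.
    `buffer` must be sorted by stamp."""
    stamps = [s for s, _ in buffer]
    i = bisect_left(stamps, stamp)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(buffer)]
    best = min(candidates, key=lambda j: abs(stamps[j] - stamp), default=None)
    if best is None or abs(stamps[best] - stamp) > slop:
        return None
    return buffer[best]

if __name__ == "__main__":
    import rclpy
    from rclpy.node import Node
    from message_filters import Subscriber, ApproximateTimeSynchronizer
    from sensor_msgs.msg import PointCloud2
    from vision_msgs.msg import Detection2DArray

    class LidarDetectionSync(Node):
        def __init__(self):
            super().__init__("lidar_detection_sync")
            dets = Subscriber(self, Detection2DArray, "/detections_output")
            cloud = Subscriber(self, PointCloud2, "/points")
            # Pair messages whose stamps differ by less than 50 ms,
            # buffering up to 30 messages per topic.
            sync = ApproximateTimeSynchronizer([dets, cloud], 30, 0.05)
            sync.registerCallback(self.on_pair)

        def on_pair(self, detections, cloud):
            self.get_logger().info("matched detection/point-cloud pair")

    rclpy.init()
    rclpy.spin(LidarDetectionSync())
```

Does this look like a reasonable direction, or is there existing Isaac ROS tooling for this kind of stamp-based pairing?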