isaac_ros_yolov8: slow NMS

Hi,

I am using the isaac_ros_yolov8 package for real-time YOLOv8 inference on a video stream. I have noticed a significant drop in the ROS 2 image frame rate (22 FPS) when using this node, even though the trtexec tool reports a throughput of 45 FPS on the Jetson Orin NX.

Upon further investigation, I found that the problem is caused by the isaac_ros_yolov8 decoder node using the cv::dnn::NMSBoxes function, which is extremely slow (perhaps because OpenCV is not built with CUDA support in the Isaac ROS Humble Docker image).

https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_object_detection/blob/main/isaac_ros_yolov8/src/yolov8_decoder_node.cpp#L108
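For context, the kind of work a CPU-side NMS step performs can be sketched as below. This is a plain greedy NMS in pure Python, written only to illustrate the technique; it is not the OpenCV implementation or the decoder node's code, and the names `iou` and `greedy_nms` are hypothetical. Boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, iou_threshold):
    """Return indices of kept boxes, highest score first (illustrative sketch)."""
    # Drop low-confidence candidates first, then sort the rest by score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The pairwise overlap checks grow quickly with the number of candidates that survive the score threshold, so the cost of this step depends heavily on how many boxes reach it.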

I also discovered that Ultralytics can export the non-maximum suppression (NMS) operation as part of the ONNX model, and that TensorRT has plugin support for NMS.

I have two questions:

  1. Is it possible to export the YOLOv8 model with NMS and use it with the isaac_ros_yolov8 package? If yes, what would the decoder implementation look like?

  2. Is it possible to use a GPU-accelerated NMS implementation instead of the OpenCV NMSBoxes method?
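To make question 1 concrete: if NMS ran inside the model, the decoder might reduce to a confidence filter over the final detections. The sketch below is only an illustration under a stated assumption, namely that the model outputs a fixed number of rows laid out as [x1, y1, x2, y2, score, class_id], with unused rows padded with zeros. That layout and the name `decode_nms_output` are hypothetical and would need to be checked against the actual exported model.

```python
def decode_nms_output(rows, conf_threshold=0.25):
    """Turn post-NMS model output rows into detections (illustrative sketch).

    Assumes each row is [x1, y1, x2, y2, score, class_id]; this layout is an
    assumption, not something confirmed for isaac_ros_yolov8.
    """
    detections = []
    for x1, y1, x2, y2, score, class_id in rows:
        if score < conf_threshold:
            continue  # skip zero-padded rows and weak detections
        detections.append(
            {"box": (x1, y1, x2, y2), "score": score, "class_id": int(class_id)}
        )
    return detections
```

With this shape of output there would be no overlap computation left in the decoder, only a single pass over the rows.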

Thank you.

Hi @hshh

Thank you for your post. We are currently investigating your issues and forwarding this topic to the engineering team. We will provide more details soon.

Raffaello