I have a question on Isaac ROS from one of my customers.
As Isaac and ROS add layers to run inference, they cause latency. The ideal approach is to take the above DNN, for instance, compile it into a TensorRT model, and then use a pure C++/CUDA framework to invoke it directly. We would get much better performance than going through ROS nodes or Isaac, which is why we do not use either.
The direct messaging from Isaac to ROS 2 is good, but ideally we shouldn’t be running ROS at all, so that the entire layer is removed for latency reasons.
NVIDIA’s Bi3D DNN, for example, could be a better solution, which can be found here:
The isaac_ros_tensor_rt package is itself a pure C++ wrapper for working with TensorRT through a ROS interface. ROS provides a component architecture within a message-passing distributed system framework, which has strong advantages for reusability and debugging. You can always invoke TensorRT inline in your own code in one monolithic architecture, but you would lose modularity for a trivial latency improvement, since the model inference itself will dwarf any layer overhead.
With ROS 2 Humble intra-process communication, we’ve seen negligible overhead from message transport, which is essentially moving a pointer into a vector and back out again. Do you have measurements of the latency you’re seeing ROS 2 add? A good test application can be found here
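To make the "moving a pointer into a vector and back out again" point concrete, here is a minimal sketch in plain C++ with no ROS dependency. It is not the rclcpp implementation; `Message`, `PointerQueue`, `handoff_preserves_address`, and `mean_handoff_ns` are hypothetical names made up for illustration. It only shows that handing off an owning pointer leaves the payload in place, and gives a rough way to time that hand-off on your own hardware.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a large message such as an image or tensor.
struct Message {
    std::vector<std::uint8_t> payload;
};

// Hypothetical stand-in for an intra-process buffer (an assumption, not the
// actual ROS 2 data structure). Publishing moves the owning pointer in and
// taking moves it back out, so the payload bytes are never copied.
class PointerQueue {
public:
    void publish(std::unique_ptr<Message> msg) {
        slots_.push_back(std::move(msg));
    }

    // Precondition: the queue is not empty.
    std::unique_ptr<Message> take() {
        std::unique_ptr<Message> msg = std::move(slots_.back());
        slots_.pop_back();
        return msg;
    }

private:
    std::vector<std::unique_ptr<Message>> slots_;
};

// Returns true if the message taken out is the very same object, with the
// very same payload buffer, as the one that was published.
bool handoff_preserves_address(std::size_t payload_bytes) {
    auto msg = std::make_unique<Message>();
    msg->payload.resize(payload_bytes);
    const Message* object_before = msg.get();
    const std::uint8_t* data_before = msg->payload.data();

    PointerQueue queue;
    queue.publish(std::move(msg));
    std::unique_ptr<Message> out = queue.take();

    return out.get() == object_before && out->payload.data() == data_before;
}

// Average wall-clock nanoseconds for one publish/take round trip.
// The result depends on the machine and compiler flags, so treat it as a
// rough figure to compare against your model's inference time.
double mean_handoff_ns(std::size_t payload_bytes, int iterations) {
    PointerQueue queue;
    auto msg = std::make_unique<Message>();
    msg->payload.resize(payload_bytes);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        queue.publish(std::move(msg));
        msg = queue.take();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling `handoff_preserves_address(1 << 20)` should return true for a 1 MiB payload, and `mean_handoff_ns(1 << 20, 100000)` gives a per-message figure that does not grow with payload size, because only the pointer moves. A real ROS 2 graph adds executor and callback scheduling on top of this, so measuring the actual pipeline is still the deciding test.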
The isaac_ros_bi3d package is a Bi3D inference implementation that uses DLAs and GPUs efficiently on Jetson through TensorRT for high throughput and low latency. It was developed as an optimized version of the NVlabs/Bi3D Python code that you mentioned.
If they have developed their own stack and prefer to use that instead of ROS, that’s great. They can use Isaac ROS packages as a reference example to integrate NVIDIA APIs into their own framework.