Managed Nitros PubSub

Hello,

I am using ROS 2 Humble and am trying to feed a stream of sensor_msgs/Image into a NITROS-enabled node such as isaac_ros::image_proc::RectifyNode. To mitigate the CPU/GPU copy bottleneck and to feed in a steady stream of images, I would like to queue the NitrosImage messages in an intermediate node before passing them to RectifyNode.

It seems like the Managed NITROS Publisher would do the job here, but it is currently limited to isaac_ros_nitros_tensor_list_type. Are there plans to extend it to the rest of the NITROS types, such as NitrosImage, so that I can write a node that queues NitrosImage messages and publishes them with a NITROS publisher for an existing NITROS-enabled node to ingest?

If so, is there a timeline for this feature addition?

Thanks,

CUDA with NITROS is actually enabled for both TensorLists AND Image already. Please see here on how to use it with image processing examples including the YOLOv8 decoder itself. We’re working on other types based on CUDA buffers but these two were the most important for image processing and DNN inference pipelines.

There is a note in the docs which claims CUDA with NITROS only works with TensorLists but that is incorrect and we’ll fix it shortly.

Hi Hemal,

Thanks for clarifying. I was able to use and modify the GpuImageBuilderNode. I implemented a very basic queue in the GpuImageBuilderNode that first queues up 300 frames and then starts publishing one frame every 33 ms (about 30 Hz). I did this so that NitrosImages would be published at a steady rate into the RectifyNode and eventually into the ESS node.

We had found that the ros benchmarking examples contained queueing of the NitrosImage types before publishing them into the Node under test (ESS in my case), so I thought doing the same would help resolve the bottleneck issues, but the ESS node still seems to choke after some time.

Here is my diff on the GpuImageBuilderNode. Can you share any insight on what could be causing a throughput bottleneck still, even though the NitrosImages are queued and fed at a steady rate?

diff --git a/isaac_ros_managed_nitros_examples/custom_nitros_image/include/custom_nitros_image/gpu_image_builder_node.hpp b/isaac_ros_managed_nitros_examples/custom_nitros_image/include/custom_nitros_image/gpu_image_builder_node.hpp
index 6a93138..eb06568 100644
--- a/isaac_ros_managed_nitros_examples/custom_nitros_image/include/custom_nitros_image/gpu_image_builder_node.hpp
+++ b/isaac_ros_managed_nitros_examples/custom_nitros_image/include/custom_nitros_image/gpu_image_builder_node.hpp
@@ -27,6 +27,8 @@
 #include "sensor_msgs/msg/image.hpp"
 #include "isaac_ros_nitros_image_type/nitros_image.hpp"
 
+#include <queue>
+
 namespace custom_nitros_image
 {
 
@@ -46,6 +48,12 @@ private:
   // Publisher for output NitrosImage messages
   std::shared_ptr<nvidia::isaac_ros::nitros::ManagedNitrosPublisher<
       nvidia::isaac_ros::nitros::NitrosImage>> nitros_pub_;
+
+  // Queue for Nitros Images
+  std::queue<nvidia::isaac_ros::nitros::NitrosImage> myQueue;
+  void publishImage();
+  rclcpp::TimerBase::SharedPtr timer;
+  bool blockPublishing;
 };
 
 }  // namespace custom_nitros_image
diff --git a/isaac_ros_managed_nitros_examples/custom_nitros_image/src/gpu_image_builder_node.cpp b/isaac_ros_managed_nitros_examples/custom_nitros_image/src/gpu_image_builder_node.cpp
index 16a6106..05630c9 100644
--- a/isaac_ros_managed_nitros_examples/custom_nitros_image/src/gpu_image_builder_node.cpp
+++ b/isaac_ros_managed_nitros_examples/custom_nitros_image/src/gpu_image_builder_node.cpp
@@ -22,6 +22,7 @@
 #include "isaac_ros_nitros_image_type/nitros_image_builder.hpp"
 #include "sensor_msgs/image_encodings.hpp"
 
+
 namespace custom_nitros_image
 {
 
@@ -33,10 +34,30 @@ GpuImageBuilderNode::GpuImageBuilderNode(const rclcpp::NodeOptions options)
   nitros_pub_{std::make_shared<nvidia::isaac_ros::nitros::ManagedNitrosPublisher<
         nvidia::isaac_ros::nitros::NitrosImage>>(
       this, "gpu_image",
-      nvidia::isaac_ros::nitros::nitros_image_rgb8_t::supported_type_name)} {}
+      nvidia::isaac_ros::nitros::nitros_image_rgb8_t::supported_type_name)},
+      blockPublishing(true) {
+
+        // Create a timer to publish the Image message every 33 milliseconds
+        timer = this->create_wall_timer(std::chrono::milliseconds(33),
+                                         std::bind(&GpuImageBuilderNode::publishImage, this));
+
+      }
 
 GpuImageBuilderNode::~GpuImageBuilderNode() = default;
 
+void GpuImageBuilderNode::publishImage(){
+  if(blockPublishing)
+    return;
+
+  if(myQueue.size() == 0){
+    RCLCPP_INFO(this->get_logger(), "Queue is empty, cannot publish!");
+  }else{
+    RCLCPP_INFO(this->get_logger(), "Publishing image.. Current queue size: %zu", myQueue.size());
+    nitros_pub_->publish(myQueue.front());
+    myQueue.pop();
+  }
+}
+
 void GpuImageBuilderNode::InputCallback(const sensor_msgs::msg::Image::SharedPtr msg)
 {
   // Get size of image
@@ -64,8 +85,16 @@ void GpuImageBuilderNode::InputCallback(const sensor_msgs::msg::Image::SharedPtr
     .WithGpuData(buffer)
     .Build();
 
-  nitros_pub_->publish(nitros_image);
-  RCLCPP_INFO(this->get_logger(), "Sent CUDA buffer with memory at: %p", buffer);
+  myQueue.push(nitros_image);
+  RCLCPP_INFO(this->get_logger(), "Queued NitrosImage with CUDA buffer memory at: %p", buffer);
+
+  if(myQueue.size() == 300){
+    RCLCPP_INFO(this->get_logger(), "Queue is now full! Unblock publishing!");
+    blockPublishing = false;
+  }
+
+  // nitros_pub_->publish(nitros_image);
+  // RCLCPP_INFO(this->get_logger(), "Sent CUDA buffer with memory at: %p", buffer);
 }
 
 }  // namespace custom_nitros_image

Your changes to GpuImageBuilderNode seem fine. Could you provide some logs on “ESS node still seems to choke after some time”?

What is your normal image source? If it is running at 30 Hz and your system is able to handle that flow, there shouldn’t be a need to “buffer and blast”, which would increase latency significantly. We do this in ros2_benchmark to publish at different rates and probe the maximum sustained frequency the graph can reasonably handle.
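
To make the point about sustained frequency concrete, here is a minimal toy model in plain C++ (no ROS or NITROS calls). It is only a sketch under stated assumptions: a single consumer with a fixed per-frame processing time, and a publisher that emits one frame per fixed period. The function name `backlog_after` and its parameters are hypothetical and are not part of ros2_benchmark or Isaac ROS. It illustrates that pre-queueing frames does not help when the downstream node needs longer per frame than the publish period; the backlog simply builds up downstream instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical helper, not an Isaac ROS or ros2_benchmark API).
// A publisher emits frame i at time i * publish_period_ms. A single consumer
// processes frames in order and needs service_time_ms per frame.
// Returns how many frames are still unfinished (waiting or in progress)
// at the moment the last frame is published.
std::size_t backlog_after(
  std::size_t n_frames, double publish_period_ms, double service_time_ms)
{
  assert(publish_period_ms > 0.0 && service_time_ms >= 0.0);
  if (n_frames == 0) {
    return 0;
  }

  std::vector<double> finish_times;
  finish_times.reserve(n_frames);

  double consumer_free_at = 0.0;  // time at which the consumer becomes idle
  for (std::size_t i = 0; i < n_frames; ++i) {
    const double arrival = static_cast<double>(i) * publish_period_ms;
    // Processing starts when the frame has arrived and the consumer is idle.
    const double start = std::max(arrival, consumer_free_at);
    consumer_free_at = start + service_time_ms;
    finish_times.push_back(consumer_free_at);
  }

  const double last_arrival =
    static_cast<double>(n_frames - 1) * publish_period_ms;
  const auto unfinished = std::count_if(
    finish_times.begin(), finish_times.end(),
    [last_arrival](double t) {return t > last_arrival;});
  return static_cast<std::size_t>(unfinished);
}
```

In this model, `backlog_after(300, 33.0, 25.0)` leaves only the frame that has just been published, while `backlog_after(300, 33.0, 40.0)` leaves 54 frames outstanding, and that number keeps growing with the frame count. If the real pipeline behaves similarly, measuring the per-frame time of the ESS node would show whether 30 Hz is sustainable at all.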