Performance Issue of ESS Disparity Graph on RTX4090

Hi there,

I’m trying to take a side-by-side 30fps 720p stereo video feed, run through ESS Disparity, and then feed the output into NVblox to get a 3D reconstruction in realtime.

I’ve written a Python node using OpenCV that splits the side by side video into left/right images for input to ESS.

I’ve created a ROS2 launch file to launch the following disparity graph / pipeline:

gscam2 -> stereo_splitter -> Isaac ROS Rectification -> Isaac ROS Resize -> Isaac ROS ESS StereoDNN Disparity

We’ve ran this pipeline on a PC with an RTX4090, and are noticing intermittent throttling. Sometimes it works for a few minutes, and then permanently throttles. Other times, it temporarily throttles and then recovers.

I collected data using ros2 topic hz -w 30 /<topic_name> | grep average for each of the relevant topics, and graphed them below. Notice at ~50second mark, the FPS performance of all NITROS nodes dips, and eventually recovers.

Interestingly, the exact same launch file, settings, and input works perfectly on our laptops with an A2000 GPU. Notice in the graph below, there is no drop in FPS performance in any nodes, and all nodes are consistently > 20fps.

We’ve found that whether the left/right images are fed into Rectification, Resize, or Disparity node, directly, it exhibits throttling on the RTX 4090. For example, running only rectification nodes in Nitros as shown, also experiences an FPS drop:

gscam2 -> stereo_splitter -> Isaac ROS Rectification

We suspect that the data flow between the non-NITROS nodes (stereo_splitter) and the first NITROS-enabled node which accepts the left/right images is a performance bottleneck on the RTX4090.

We are using GPU driver 525.85 on the RTX4090 PC, manually installed through the .run file (as per the Isaac sim Recommendation), and believe the rest of the system is configured correctly on Ubuntu 22.04:

  • The PowerMizer settings in “Nvidia settings” are set to “Prefer Maximum Performance”.
  • We’ve also tried maxing the CPU clock frequency to performance mode with cpupower-gui
  • We’ve confirmed it’s not network latency issues.

We initially tried GPU driver version 535.129, but found the throttling performance was worse / started throttling earlier.

When we notice throttling, we also notice the GPU usage drops and CPU usage increases. When the FPS issue eventually recovers, the CPU/GPU usage goes back to normal.

We are confused why this works on our laptop, but not an RTX4090 PC. Could you shed some light on this issue? Do you see any issues with the way existing nodes are interacting with each other? Are there some things you can suggest to verify the RTX4090 is setup correctly, or some tools to help debug and solve this?

PC Specs:

  • RAM: 64GB
  • Processor: i9-13900KF
  • GPU: Nvidia RTX 4090
  • OS: Ubuntu 22.04.3
  • Isaac ROS: 2.0
  • ESS model: 3.0
 >> nvidia-smi
Mon Nov 13 17:40:23 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  Off |
| 31%   31C    P8    25W / 450W |    204MiB / 24564MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2048      G   /usr/lib/xorg/Xorg                 76MiB |
|    0   N/A  N/A      2182      G   /usr/bin/gnome-shell              126MiB |
+-----------------------------------------------------------------------------+

Laptop specs:

  • Lenovo Thinkpad P17 Gen2i
  • RAM: 32GB
  • Processor: i7-11800H
  • GPU: Nvidia RTX A2000 Mobile
  • OS: Ubuntu 22.04.3
  • Isaac ROS: 2.0
  • ESS model: 3.0
 >> nvidia-smi
Mon Nov 13 17:43:22 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A2000 Laptop GPU    Off | 00000000:01:00.0  On |                  N/A |
| N/A   51C    P0              14W /  60W |    557MiB /  4096MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3268      G   /usr/lib/xorg/Xorg                          185MiB |
|    0   N/A  N/A      3674      G   /usr/bin/gnome-shell                         75MiB |
|    0   N/A  N/A      4335      G   ...,WinRetrieveSuggestionsOnlyOnDemand       26MiB |
|    0   N/A  N/A      7422      G   .../jvm/java-11-openjdk-amd64/bin/java       62MiB |
|    0   N/A  N/A     42236      G   ...irefox/3358/usr/lib/firefox/firefox      194MiB |
+---------------------------------------------------------------------------------------+

Our Launch file:

"""Launch a Node with parameters and remappings."""

from launch import LaunchDescription
from launch_ros.actions import Node, ComposableNodeContainer
from launch_ros.descriptions import ComposableNode

NETWORK_WIDTH = 960
NETWORK_HEIGHT = 576

def generate_launch_description():

    gstreamer_node = Node(
        package='gscam2',
        executable='gscam_main',
        output='screen',
        name='gscam_publisher',
        parameters=[
            {
                "camera_info_url": "",  # Camera calibration information
                "gscam_config": "rtpbin name=rtpbin udpsrc caps=\"application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264\" port=5005 ! rtpbin.recv_rtp_sink_0 rtpbin. ! rtph264depay ! decodebin ! videoconvert ! video/x-raw,format=RGB",
                "preroll": False,
                "use_gst_timestamps": False,
            },
        ],
    )
    
    splitter_node = Node(
        package='custom_package',
        executable='stereo_splitter',
        parameters=[
            {'left_camera_file': '/workspaces/isaac_ros-dev/src/custom_package/config/dual_sensor_left.yaml',
             'right_camera_file': '/workspaces/isaac_ros-dev/src/custom_package/config/dual_sensor_right.yaml'}
        ]
    )

    disparity_visualizer_node = Node(
        package='custom_package',
        executable='disparity_visualizer',
    )
   
    left_rectify_node = ComposableNode(
        name='left_rectify_node',
        package='isaac_ros_image_proc',
        plugin='nvidia::isaac_ros::image_proc::RectifyNode',
        parameters=[{
            'output_width': 1280,
            'output_height': 720,
        }],
        remappings=[
            ('image_raw', '/left/image_raw'),
            ('camera_info', '/left/camera_info'),
            ('image_rect', '/left/image_rect'),
            ('camera_info_rect', '/left/camera_info_rect')
        ]
    )

    right_rectify_node = ComposableNode(
        name='right_rectify_node',
        package='isaac_ros_image_proc',
        plugin='nvidia::isaac_ros::image_proc::RectifyNode',
        parameters=[{
            'output_width': 1280,
            'output_height': 720,
        }],
        remappings=[
            ('image_raw', '/right/image_raw'),
            ('camera_info', '/right/camera_info'),
            ('image_rect', '/right/image_rect'),
            ('camera_info_rect', '/right/camera_info_rect')
        ]
    )

    left_image_resize_node = ComposableNode(
        name='left_image_resize_node',
        package='isaac_ros_image_proc',
        plugin='nvidia::isaac_ros::image_proc::ResizeNode',
        parameters=[{
                'output_width': NETWORK_WIDTH,
                'output_height': NETWORK_HEIGHT,
                'keep_aspect_ratio': False # Do not add black borders to image
        }],
        remappings=[
            ('camera_info', '/left/camera_info_rect'),
            ('image', '/left/image_rect'),
            ('resize/camera_info', '/left/camera_info_rect_resize'),
            ('resize/image', '/left/image_rect_resize')]
    )

    right_image_resize_node = ComposableNode(
        name='right_image_resize_node',
        package='isaac_ros_image_proc',
        plugin='nvidia::isaac_ros::image_proc::ResizeNode',
        parameters=[{
                'output_width': NETWORK_WIDTH,
                'output_height': NETWORK_HEIGHT,
                'keep_aspect_ratio': False # Do not add black borders to image
        }],
        remappings=[
            ('camera_info', '/right/camera_info_rect'),
            ('image', '/right/image_rect'),
            ('resize/camera_info', '/right/camera_info_rect_resize'),
            ('resize/image', '/right/image_rect_resize')]
    )

    disparity_node = ComposableNode(
        name='disparity',
        package='isaac_ros_ess',
        plugin='nvidia::isaac_ros::dnn_stereo_depth::ESSDisparityNode',
        parameters=[{
            'engine_file_path': "/workspaces/isaac_ros-dev/src/isaac_ros_dnn_stereo_depth/resources/ess.engine",
            "threshold": 0.0,
            "image_type": "RGB_U8",
            }],
        remappings=[
            ('left/camera_info', '/left/camera_info_rect_resize'),
            ('left/image_rect', '/left/image_rect_resize'),
            ('right/camera_info', '/right/camera_info_rect_resize'),
            ('right/image_rect', '/right/image_rect_resize')
        ]
    )

    nitros_container = ComposableNodeContainer(
        name='nitros_container',
        package='rclcpp_components',
        namespace='',
        executable='component_container_mt',
        composable_node_descriptions=[
            left_rectify_node, right_rectify_node,
            left_image_resize_node, right_image_resize_node,
            disparity_node
        ],
        output='screen'
    )

    return LaunchDescription([
                            gstreamer_node,
                            splitter_node,
                            nitros_container,
                            disparity_visualizer_node,
                            ])

stereo_splitter node:

import numpy as np
import cv2 as cv

import rclpy
import cv_bridge
from rclpy.node import Node
from sensor_msgs.msg import CameraInfo, Image

class StereoSplitter(Node):

    def __init__(self):
        super().__init__('stereo_splitter')
        self.encoding = 'rgb8'
        self._bridge = cv_bridge.CvBridge()
        self.read_calibration()
        self.setup_subscribers()
        self.setup_publishers()
    
    def yaml_to_CameraInfo(self, yaml_file: str):
        """
        Parse a yaml file containing camera calibration data (as produced by
        ROS or RTabMap) into a sensor_msgs/CameraInfo msg.

        Parameters
        ----------
        yaml_file : str
            Path to yaml file containing camera calibration data

        Returns
        -------
        camera_info_msg : sensor_msgs.msg.CameraInfo
            A sensor_msgs.msg.CameraInfo message containing the camera calibration data
        """
        camera_info_msg = CameraInfo()
        cv_file = cv.FileStorage()
        cv_file.open(yaml_file, cv.FileStorage_READ)
        camera_info_msg.width = int(cv_file.getNode('image_width').real())
        camera_info_msg.height = int(cv_file.getNode('image_height').real())
        camera_info_msg.k = cv_file.getNode('camera_matrix').mat().flatten().tolist()
        camera_info_msg.d = cv_file.getNode('distortion_coefficients').mat().flatten().tolist()
        camera_info_msg.r = cv_file.getNode('rectification_matrix').mat().flatten().tolist()
        camera_info_msg.p = cv_file.getNode('projection_matrix').mat().flatten().tolist()
        camera_info_msg.distortion_model = cv_file.getNode('distortion_model').string()
        cv_file.release()

        return camera_info_msg

    def read_calibration(self):
        self.declare_parameter('left_camera_file', '/workspaces/isaac_ros-dev/src/custom_package/config/dual_sensor_left.yaml')
        self.declare_parameter('right_camera_file', '/workspaces/isaac_ros-dev/src/custom_package/config/dual_sensor_right.yaml')
        left_cam_info_path = self.get_parameter('left_camera_file').get_parameter_value().string_value
        right_cam_info_path = self.get_parameter('right_camera_file').get_parameter_value().string_value
        self.left_camera_info = self.yaml_to_CameraInfo(left_cam_info_path)
        self.right_camera_info = self.yaml_to_CameraInfo(right_cam_info_path)

    def setup_subscribers(self):
        print('Setting up Subscribers...')
        self.subscription = self.create_subscription(
            Image,
            '/image_raw',
            self.stereo_img_recvd_callback,
            10)
        self.subscription #prevent unused variable warning

    def setup_publishers(self):
        print('Setting up Publishers...')
        self._img_left_pub = self.create_publisher(
            Image, '/left/image_raw', 10)
        self._img_right_pub = self.create_publisher(
            Image, '/right/image_raw', 10)
        self._camera_left_pub = self.create_publisher(
            CameraInfo, '/left/camera_info', 10)
        self._camera_right_pub = self.create_publisher(
            CameraInfo, '/right/camera_info', 10)

    def stereo_img_recvd_callback(self, image_msg):
        print('Received Stereo image!')
        split_images = self.split_stereo_image(self._bridge.imgmsg_to_cv2(image_msg))
        
        #Convert cv2 image to ros2 image message
        left_msg = self._bridge.cv2_to_imgmsg(np.array(split_images[0]), self.encoding)
        right_msg = self._bridge.cv2_to_imgmsg(np.array(split_images[1]), self.encoding)

        #Synchronize timestamps across all published topics
        current_timestamp = self.get_clock().now().to_msg()
        left_msg.header.stamp = current_timestamp
        right_msg.header.stamp = current_timestamp
        self.left_camera_info.header.stamp = current_timestamp
        self.right_camera_info.header.stamp = current_timestamp
        left_msg.header.frame_id = "stereocamera_left_optical_frame"
        right_msg.header.frame_id = "stereocamera_right_optical_frame"
        self.left_camera_info.header.frame_id = "stereocamera_left_optical_frame"
        self.right_camera_info.header.frame_id = "stereocamera_right_optical_frame"

        #Publish left/right images and left/right camera infos
        self._img_left_pub.publish(left_msg)
        self._camera_left_pub.publish(self.left_camera_info)
        self._img_right_pub.publish(right_msg)
        self._camera_right_pub.publish(self.right_camera_info)

    #Receives a cv2 image and returns two split cv2 images
    def split_stereo_image(self, stereoImage):
        print("Splitting Stereo image!")

        # Get height/width of original image
        height, width = stereoImage.shape[:2]
        horizontalCenter = int(width/2)

        # Generate sub images
        left_image = stereoImage[0:height, 0:horizontalCenter]
        right_image = stereoImage[0:height, horizontalCenter:width]  

        return left_image, right_image


def main(args=None):
    rclpy.init(args=args)
    stereo_splitter = StereoSplitter()
    rclpy.spin(stereo_splitter)

    #Cleanup
    stereo_splitter.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

Hi,

I have also observed this issue on the laptop with an A2000 GPU. After running the above disparity pipeline for a while, it has reached a similar throttling state where both the rectification and disparity outputs are freezing/glitching.

As mentioned earlier, it seems the data flow between the non-NITROS nodes (stereo_splitter) and the first NITROS-enabled node which accepts the left/right images may be a performance bottleneck…

How could we fix the bottle necks in our pipeline?

Upon reviewing your code and documentation, it appears that the Python node may become a bottleneck. Let’s take a step-by-step approach to safely arrive at this conclusion while ensuring optimal performance.

I suggest recording the output from your stereo_splitter and saving it in a rosbag file. When is done, you can run your rosbag and your new pipeline will be like this one:
rosbag -> Isaac ROS Rectification -> Isaac ROS Resize -> Isaac ROS ESS StereoDNN Disparity

From this point, you can check now the performance of the Isaac ROS ESS StereoDNN Disparity

About your last post, we know there is a bottleneck between not-NITROS and nitro packages, but to reproduce the same output of our benchmark, I suggest following this document:
https://nvidia-isaac-ros.github.io/concepts/benchmarking/index.html#benchmarking

Keep me posted,
Raffaello

Hi @Raffaello,

I have tried recording the output from stereo_splitter and saving it in a rosbag file, as you suggested.

However, when playing back the rosbag, the performance of the disparity pipeline is similarly poor. After running the above pipeline for a while (~1.5min) , it reaches a similar throttling state where both the rectification and disparity outputs are freezing/glitching.

Have you tried our benchmark?

You can run our Isaac ROS benchmark. This repository provides benchmark scripts for our Isaac ROS packages.

You can test on your device the ESS Stereo DNN disparity node following this documentation
https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_benchmark/isaac_ros_benchmark/index.html#quickstart

Instead of testing Apriltag, like the documentation shows, you can do all steps from points 1 to 4

5: Install Isaac ROS ess

sudo apt-get install -y ros-humble-isaac-ros-ess ros-humble-isaac-ros-test

6: Run these tests

launch_test src/isaac_ros_benchmark/scripts/isaac_ros_ess_graph.py
launch_test src/isaac_ros_benchmark/scripts/isaac_ros_ess_node.py

7: Read output

Check the output from the terminal.

Additionally, the final results and benchmark metadata (e.g., system information, benchmark configurations) are also exported as a JSON file.