Profiling CPU/GPU Usage in ROS 2 Camera Node on Jetson

I’m posting on behalf of a ROS 2 development effort targeting Jetson AGX Orin (MILBOARD-AGX), using GStreamer for real-time camera processing.

We’re building a camera_publisher node with multiple appsink branches (raw, dewarped, compressed), and we’re running into significant CPU overhead as well as pipeline configuration issues that prevent full GPU acceleration.
Our current pipeline:
pipeline << "nvv4l2camerasrc device=/dev/video0 ! "
<< "nvvideoconvert ! "
<< "video/x-raw(memory:NVMM),format=RGBA,width=1920,height=1080 ! "
<< "nvdewarper config-file=" << device_rectified_path_ << " ! "
<< "nvvideoconvert ! video/x-raw(memory:NVMM),format=NV12 ! "
<< "nvvidconv ! video/x-raw,format=BGRx ! "
<< "videoconvert ! video/x-raw,format=BGR ! "
<< "tee name=t ";
// RAW branch
pipeline << "t. ! queue ! videoconvert ! video/x-raw,format=BGR ! appsink name=raw_sink emit-signals=false sync=false ";
// Dewarped branch
pipeline << "t. ! queue ! videoconvert ! video/x-raw,format=BGR ! appsink name=dewarped_sink emit-signals=false sync=false ";
// Compressed branch
pipeline << "t. ! queue ! videoconvert ! x264enc tune=zerolatency bitrate=500 speed-preset=ultrafast ! "
<< "video/x-h264,profile=baseline ! appsink name=compressed_sink emit-signals=false sync=false ";
Frames are retrieved using gst_app_sink_try_pull_sample() and published via cv_bridge.

High CPU Usage

We observe CPU usage exceeding 170%, even with NVMM and hardware encoders.

How do we profile performance on Jetson to identify what’s consuming so much CPU in our pipeline? Any recommended tools (TegraStats, nvprof, Nsight)?


Hello,

Thanks for visiting the NVIDIA Developer forums.

Your topic will be best served in the Jetson category, so I have moved this post there for better visibility.

Cheers,
Tom

Hi,
Please run sudo tegrastats. It shows the status of each hardware engine, such as CPU, GPU, and NVENC.
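
If you want to log these numbers alongside your node instead of reading them by eye, a small parser can help. The following is a minimal sketch, assuming each tegrastats line contains a block such as `CPU [50%@2201,off,...]` and a field such as `GR3D_FREQ 7%`; the exact layout varies between JetPack releases, so check it against the output on your board. The names `TegraSample`, `parse_tegrastats_line`, and `total_cpu_percent` are hypothetical helpers, not part of any NVIDIA API.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Summary extracted from one line of tegrastats output (hypothetical helper type).
struct TegraSample {
    std::vector<int> cpu_load_percent;    // one entry per online core
    std::optional<int> gpu_load_percent;  // GR3D_FREQ utilization, if present
};

// Parses the per-core CPU loads and the GPU load from a single tegrastats line.
// Assumed format: "... CPU [50%@2201,off,40%@2201] ... GR3D_FREQ 7% ..."
TegraSample parse_tegrastats_line(const std::string& line) {
    TegraSample sample;
    std::smatch m;

    // Grab everything between the brackets of the CPU block.
    static const std::regex cpu_block(R"(CPU \[([^\]]*)\])");
    if (std::regex_search(line, m, cpu_block)) {
        const std::string cores = m[1].str();
        // Each online core is reported as "<load>%@<freq>"; offline cores as "off".
        static const std::regex core_load(R"((\d+)%@)");
        for (std::sregex_iterator it(cores.begin(), cores.end(), core_load), end;
             it != end; ++it) {
            sample.cpu_load_percent.push_back(std::stoi((*it)[1].str()));
        }
    }

    // GPU utilization is reported under GR3D_FREQ.
    static const std::regex gpu_load(R"(GR3D_FREQ (\d+)%)");
    if (std::regex_search(line, m, gpu_load)) {
        sample.gpu_load_percent = std::stoi(m[1].str());
    }
    return sample;
}

// Sums per-core loads, which is comparable to the per-process figure
// reported by top (where 100% means one fully loaded core).
int total_cpu_percent(const TegraSample& sample) {
    int total = 0;
    for (int load : sample.cpu_load_percent) {
        total += load;
    }
    return total;
}
```

Feeding it the lines read from a `tegrastats` pipe while the node runs lets you compare the summed CPU load and the GPU load before and after each pipeline change. Note that tegrastats reports whole-system load, so it will not attribute CPU time to individual GStreamer elements.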

Significant CPU usage is expected in this pipeline because NVMM buffers are converted and copied to CPU buffers. The RAW and Dewarped branches need that conversion in order to send BGR CPU buffers to appsink. The Compressed branch currently uses x264enc, which is a software encoder running on the CPU. For that branch, you can use the hardware encoder instead and send NVMM buffers to nvv4l2h264enc directly, like:

... ! nvvideoconvert ! video/x-raw(memory:NVMM),format=NV12 ! nvv4l2h264enc ! h264parse ! appsink 
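
Applied to the pipeline from the question, this means the tee has to sit before the copy to CPU memory, so the encoder branch still receives NVMM buffers. Below is a rough, untested sketch of how the string could be rebuilt that way. The function name `build_pipeline` and its parameters are hypothetical, the element order upstream of the tee is taken from the original post, and only one BGR branch is shown for brevity.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds a pipeline description where the tee is placed while buffers are
// still in NVMM memory (hypothetical helper, not from the original node).
std::string build_pipeline(const std::string& dewarp_config, int width, int height) {
    assert(width > 0 && height > 0);
    std::ostringstream p;

    // Capture and dewarp entirely in NVMM memory, as in the original post.
    p << "nvv4l2camerasrc device=/dev/video0 ! "
      << "nvvideoconvert ! "
      << "video/x-raw(memory:NVMM),format=RGBA,width=" << width
      << ",height=" << height << " ! "
      << "nvdewarper config-file=" << dewarp_config << " ! "
      << "nvvideoconvert ! video/x-raw(memory:NVMM),format=NV12 ! "
      << "tee name=t ";

    // Compressed branch: NVMM buffers go straight to the hardware encoder.
    p << "t. ! queue ! nvv4l2h264enc ! h264parse ! "
      << "appsink name=compressed_sink emit-signals=false sync=false ";

    // Dewarped branch: the copy to CPU memory and the BGR conversion
    // happen only here, where a CPU buffer is actually required.
    p << "t. ! queue ! nvvidconv ! video/x-raw,format=BGRx ! "
      << "videoconvert ! video/x-raw,format=BGR ! "
      << "appsink name=dewarped_sink emit-signals=false sync=false ";

    return p.str();
}
```

The BGR branch still costs CPU time for the videoconvert step, so this layout only removes the software encode and the duplicated conversions, not the conversion itself.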

Is there a way to use NVBufSurface to minimize CPU usage?
