What is the equivalent of setting the --useCudaGraph flag of trtexec in deepstream 6.1?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GeForce RTX 3090
• DeepStream Version: 6.1
• TensorRT Version: 8.2
• NVIDIA GPU Driver Version (valid for GPU only): 510
• Issue Type( questions, new requirements, bugs): question

I did some network benchmarking with trtexec and noticed that setting the --useCudaGraph flag increases engine performance significantly. The results are shown below:

Flags              Throughput (qps)  Enqueue Time (ms)  H2D Latency (ms)  GPU Compute Time (ms)
None                         352.52               2.24              1.62                  2.65
--noDataTransfers            407.44               2.16              0.00                  2.45
--useCudaGraph               615.66               0.07              1.59                  1.21
Both flags                   848.53               0.09              0.00                  1.17

Each experiment ran 1000 queries and averaged the results every 10 queries (--warmUp=0 --duration=0 --iterations=1000 --avgRuns=10). Throughput is measured in queries per second; all other columns are mean latencies in milliseconds.
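For reference, the rows above correspond to trtexec invocations along these lines (the engine path `model.engine` is a placeholder for your own serialized engine):

```shell
# Baseline row ("None"): no extra flags beyond the measurement settings
trtexec --loadEngine=model.engine \
        --warmUp=0 --duration=0 --iterations=1000 --avgRuns=10

# "Both flags" row: replay the inference via a captured CUDA graph and
# skip host<->device copies so only GPU compute is timed
trtexec --loadEngine=model.engine --useCudaGraph --noDataTransfers \
        --warmUp=0 --duration=0 --iterations=1000 --avgRuns=10
```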

I figure I could improve pipeline performance by enabling CUDA Graphs in DeepStream 6.1 somehow, but I see no mention of it in the nvinfer plugin documentation. Does DeepStream support CUDA Graphs?

In trtexec, the --useCudaGraph flag enables CUDA Graphs; see the CUDA Graphs section of the TensorRT documentation:
Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

So you may try adding CUDA Graph capture to the nvinfer source code to check whether it improves your inference performance.
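As a starting point, the general pattern is to capture one TensorRT enqueue into a CUDA graph via stream capture, then replay the instantiated graph on every subsequent frame. This is a minimal sketch of that pattern only, not nvinfer's actual code; `enqueueV2` is the real TensorRT 8.x API, but where and how this would hook into nvinfer's internals is an assumption:

```cuda
#include <cuda_runtime.h>
#include <NvInfer.h>

// Hypothetical sketch: capture one inference enqueue into a CUDA graph.
// This mirrors what trtexec does when --useCudaGraph is set; nvinfer's
// source would need an analogous change around its enqueue call.
bool buildInferenceGraph(nvinfer1::IExecutionContext* ctx, void** bindings,
                         cudaStream_t stream, cudaGraphExec_t* graphExec) {
    // Warm-up enqueue outside capture: TensorRT may lazily allocate
    // resources, which is not allowed while a stream is being captured.
    if (!ctx->enqueueV2(bindings, stream, nullptr)) return false;
    cudaStreamSynchronize(stream);

    // Capture all work submitted to `stream` between begin/end.
    cudaGraph_t graph;
    if (cudaStreamBeginCapture(stream, cudaStreamCaptureModeThreadLocal)
            != cudaSuccess) return false;
    bool ok = ctx->enqueueV2(bindings, stream, nullptr);
    if (cudaStreamEndCapture(stream, &graph) != cudaSuccess || !ok)
        return false;

    // Instantiate once; replaying the executable graph later removes the
    // per-frame launch overhead (the Enqueue Time column in the table).
    cudaError_t err =
        cudaGraphInstantiate(graphExec, graph, nullptr, nullptr, 0);
    cudaGraphDestroy(graph);
    return err == cudaSuccess;
}

// Per frame, with the same bindings/addresses used during capture:
//   cudaGraphLaunch(graphExec, stream);
```

Note that the captured graph bakes in the binding addresses and batch shape used during capture, so the replay path only works while inputs are written into the same device buffers with the same dimensions.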
