Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GeForce RTX 3090
• DeepStream Version: 6.1
• TensorRT Version: 8.2
• NVIDIA GPU Driver Version (valid for GPU only): 510
• Issue Type (questions, new requirements, bugs): question
I did some network benchmarking using trtexec and noticed that setting the `--useCudaGraph` flag increases engine performance significantly. The results are shown below:
| Flags | Throughput (qps) | Enqueue Time (ms) | H2D Latency (ms) | GPU Compute Time (ms) |
|---|---|---|---|---|
| No flag | 352.52 | 2.24 | 1.62 | 2.65 |
| `--noDataTransfers` | 407.44 | 2.16 | 0.00 | 2.45 |
| `--useCudaGraph` | 615.66 | 0.07 | 1.59 | 1.21 |
| Both flags | 848.53 | 0.09 | 0.00 | 1.17 |
All experiments ran 1000 queries and averaged the measurements every 10 queries (`--warmUp=0 --duration=0 --iterations=1000 --avgRuns=10`). Throughput is measured in queries per second; the other columns are mean latencies in milliseconds.
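For reference, the invocations looked roughly like this (the engine filename `model.engine` is a placeholder for my actual engine; only the two flags under test vary between runs):

```shell
# Baseline: 1000 iterations, averages over every 10 runs, no warm-up
trtexec --loadEngine=model.engine \
        --warmUp=0 --duration=0 --iterations=1000 --avgRuns=10

# Same run with CUDA Graph capture of the enqueue calls
trtexec --loadEngine=model.engine \
        --warmUp=0 --duration=0 --iterations=1000 --avgRuns=10 \
        --useCudaGraph

# Both flags: CUDA Graph plus no host<->device data transfers
trtexec --loadEngine=model.engine \
        --warmUp=0 --duration=0 --iterations=1000 --avgRuns=10 \
        --useCudaGraph --noDataTransfers
```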
I figure I could improve pipeline performance by enabling CUDA Graphs in DeepStream 6.1 somehow, but I see no mention of it in the nvinfer plugin documentation. Does DeepStream support CUDA Graphs?
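For context, my understanding of what `--useCudaGraph` does is roughly the following: capture one inference enqueue into a CUDA graph, then replay the instantiated graph each iteration instead of re-enqueueing. This is only a sketch fragment, not a complete program; `ctx` (an `nvinfer1::IExecutionContext*`), `bindings` (pre-allocated device buffers), and `stream` are assumed to already exist.

```cpp
// One ordinary enqueue first, so TensorRT's lazy initialization
// happens outside the capture region.
ctx->enqueueV2(bindings, stream, nullptr);
cudaStreamSynchronize(stream);

// Capture a single inference into a CUDA graph.
cudaGraph_t graph;
cudaStreamBeginCapture(stream, cudaStreamCaptureModeThreadLocal);
ctx->enqueueV2(bindings, stream, nullptr);
cudaStreamEndCapture(stream, &graph);

// Instantiate once, then replay cheaply. Replaying the graph is what
// drops the enqueue time in the table above (2.24 ms -> 0.07 ms).
cudaGraphExec_t graphExec;
cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);
for (int i = 0; i < 1000; ++i) {
    cudaGraphLaunch(graphExec, stream);
}
cudaStreamSynchronize(stream);
```

If this is what trtexec does internally, is there a way to get the same capture/replay behavior out of the nvinfer plugin, or would it require a custom plugin?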