Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.2
• TensorRT Version 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only) 525.125.06
• Issue Type (questions, new requirements, bugs) questions
I have a question about the Nsight Systems output I captured while inspecting DeepStream. In the rows for the GstNvInfer components, I noticed that DeepStream seems to implement some form of concurrency: a second batch starts processing before the first batch is done. For example, batch_num 2480, 2481, and 2482 are being processed concurrently.
However, after zooming out I noticed some gaps in the third line of GstNvInfer: UID=1, which I assume means that less concurrency is applied.
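To make "number of concurrent lines" concrete, here is a minimal sketch of how the overlap between batches could be counted. It assumes the per-batch ranges from the timeline are available as (start, end) pairs; the function names and the timings below are made up for illustration and do not come from my capture.

```python
from typing import List, Tuple

Range = Tuple[float, float]  # (start, end) of one batch, in any consistent time unit


def concurrency_profile(ranges: List[Range]) -> List[Tuple[float, int]]:
    """Return (timestamp, active_batches) after every start/end event."""
    events = []
    for start, end in ranges:
        events.append((start, 1))   # a batch becomes active
        events.append((end, -1))    # a batch finishes
    # Sort by time; at equal timestamps, ends (-1) are handled before starts (+1)
    # so back-to-back batches are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    active = 0
    profile = []
    for timestamp, delta in events:
        active += delta
        profile.append((timestamp, active))
    return profile


def max_concurrency(ranges: List[Range]) -> int:
    """Largest number of batches in flight at the same time."""
    return max((n for _, n in concurrency_profile(ranges)), default=0)


# Hypothetical timings (ms): three overlapping batches, then only two.
batches = [(0.0, 10.0), (4.0, 14.0), (8.0, 18.0), (20.0, 30.0), (24.0, 34.0)]
print(max_concurrency(batches))       # whole trace
print(max_concurrency(batches[3:]))   # later part of the trace only
```

With these made-up numbers the first part of the trace would need three timeline rows and the later part only two, which is the kind of pattern I am asking about.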
My questions:
- Why are there sometimes 3 concurrent processing lines for the GstNvInfer: UID=1 component and other times only 2? Does this mean that other parts of the pipeline need more intense processing, so the GstNvInfer: UID=1 component gets less GPU time?
- Does DeepStream run as one process with multiple threads, or as multiple processes with multiple threads per process?
- How would you recommend looking for bottlenecks in the Nsight output? In one tutorial, I found that if the CUDA HW row is a full line without any gaps, the GPU is fully utilized and no further optimizations can be made, which is my case.
Thank you in advance!