We have a system that analyzes 10 RTSP streams in parallel, and we have a performance issue with the GStreamer NvV4L2 decoder on JetPack (JP) 4.6.2.
In Nsight Systems 2021.5.4 we noticed that CUDA execution contexts are interrupted during decoding. So our first issue is that CUDA kernels possibly do **not** run in parallel with decoding tasks (with encoding they do).
However, this is hard to confirm because the NvV4L2 decoding tasks are missing from the Nsight Systems trace. The same happens with the following gst-launch example, so this is our second issue.
CUDA_INJECTION64_PATH="/opt/nvidia/nsight_systems/libToolsInjection64.so" LD_PRELOAD="/opt/nvidia/nsight_systems/libToolsInjectionProxy64.so" QUADD_INJECTION_PROXY="cuDNN, cuBLAS, NvMedia" gst-launch-1.0 -v rtspsrc location="any_kind_of_rtsp" ! rtph264depay ! queue ! nvv4l2decoder enable-max-performance=1 ! nvvidconv ! nvv4l2h265enc maxperf-enable=1 bitrate=8000000 ! fakesink
I hope you can comment on this. We will try to reproduce our first issue in example code as well.