I’m doing some performance analysis, and it appears that using the HW-accelerated video decoder (via GStreamer) while using the GPU (via CUDA/TensorFlow) causes a significant drop in GPU throughput. Specifically, I see no problems decoding my 4K video source with minimal CPU usage, and my GPU workload is never blocked waiting for frames to be decoded. However, if I eliminate the video decoding and synthesize ‘fake frames’ instead, I see a ~40% increase in GPU throughput. I realize there are a lot of moving parts (memory-controller bandwidth, DMA, context switching, etc.). However, I’d like to know whether the GPU hardware itself is actually used by video decoding, or whether I am seeing the result of contention on some other shared resource (DMA, etc.)?
Could you share steps to reproduce the issue on r28.1/TX2 so that we can do further analysis?
I discovered my mistake. For posterity: the problem I was seeing was that I was accidentally doing software-based YUV-to-RGBA color conversion in the real-decode case but not in the synthetic-frame case, so the CPU-side conversion (and the extra memory traffic it generates) was what throttled GPU throughput, not the HW decoder itself.
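In case it helps others hitting the same thing, here is a minimal sketch of the difference, assuming an H.264 source in an MP4 container (the file name `input.mp4` is just a placeholder). In the first pipeline, `videoconvert` does the YUV-to-RGBA conversion on the CPU; in the second, `nvvidconv` offloads it to the Jetson's hardware converter:

```shell
# Software path (my mistake): omxh264dec outputs YUV, and videoconvert
# performs the YUV -> RGBA conversion on the CPU for every frame.
gst-launch-1.0 filesrc location=input.mp4 ! qtdemux ! h264parse ! \
  omxh264dec ! videoconvert ! 'video/x-raw,format=RGBA' ! fakesink

# Hardware path: nvvidconv does the same conversion in dedicated HW,
# keeping the conversion off the CPU.
gst-launch-1.0 filesrc location=input.mp4 ! qtdemux ! h264parse ! \
  omxh264dec ! nvvidconv ! 'video/x-raw(memory:NVMM),format=RGBA' ! fakesink
```

With `videoconvert` swapped out for `nvvidconv`, the throughput gap between the real-decode and fake-frame cases went away for me. (Element names are from the r28.1 Jetson GStreamer stack; exact caps may need adjusting for your source.)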