Hello Everyone,
We are seeing performance degradation in H.264 video decoding after upgrading the NVIDIA drivers from v431.70 to v537.42. We are running on a Quadro RTX 5000 under Windows 10 21H2.
Of note:
A. The time taken by cuvidParseVideoData starts off well (sub-ms), but then degrades linearly, up to tens of ms per call. This excludes any time spent in callbacks. With the v431.70 drivers, by contrast, cuvidParseVideoData happily stays sub-ms.
B. Copying the decoded frame buffers (1080p NV12) to a CUdeviceptr with cuvidMapVideoFrame and cuMemcpy2D takes 10-20 ms on v537.42, but only ~1 ms on v431.70.
Does anyone know of anything that’s changed in the drivers that might affect this?
The decoding process roughly follows:
- main:
  - cuMemAlloc target_buffers
- Decode thread per video stream (x7):
  - cuvidParseVideoData (linear growth)
    - pfnDecodePicture callback
      - cuvidDecodePicture (constant time)
    - pfnDisplayPicture callback
      - push CUVIDPARSERDISPINFO
    - pfnDecodePicture callback
      - pop CUVIDPARSERDISPINFO
      - copy to CUdeviceptr (order of magnitude slower)
        - cuvidMapVideoFrame CUVIDPARSERDISPINFO
        - cuMemcpy2D Y → target_buffer
        - cuMemcpy2D UV → target_buffer
        - cuvidUnmapVideoFrame
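To make the two-plane copy above concrete, here is a rough sketch of the geometry behind the Y and UV copies for 8-bit NV12. It deliberately avoids the CUDA headers and only computes offsets, pitches and sizes. `PlaneCopy`, `Nv12Copies` and `nv12_copies` are hypothetical names, the destination is assumed to be tightly packed, and the UV plane in the mapped frame is assumed to start at `src_pitch * src_luma_rows`, where `src_luma_rows` is the decoder's surface height (which may be larger than the display height).

```cpp
#include <cstddef>

// Hypothetical description of one plane copy: the byte offsets, pitches,
// row width and row count that a 2D device copy would be given.
struct PlaneCopy {
    std::size_t src_offset;   // byte offset into the mapped frame
    std::size_t src_pitch;    // pitch reported when mapping the frame
    std::size_t dst_offset;   // byte offset into the target buffer
    std::size_t dst_pitch;    // tightly packed destination (assumption)
    std::size_t width_bytes;  // bytes copied per row
    std::size_t rows;         // number of rows copied
};

struct Nv12Copies {
    PlaneCopy y;
    PlaneCopy uv;
};

// Assumes 8-bit NV12 with even width and height: a full-resolution Y plane
// followed by an interleaved UV plane with half as many rows.
inline Nv12Copies nv12_copies(std::size_t width, std::size_t height,
                              std::size_t src_pitch, std::size_t src_luma_rows) {
    Nv12Copies c{};
    c.y  = {0, src_pitch, 0, width, width, height};
    c.uv = {src_pitch * src_luma_rows, src_pitch,
            width * height, width, width, height / 2};
    return c;
}
```

For a 1080p frame this gives a 1920 x 1080 byte Y copy and a 1920 x 540 byte UV copy, about 3 MB per frame in total, which is the amount of data each of the 7 decode threads moves per displayed frame.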
Thanks in advance,
~Edgar