P4 and T4 Decoding on Windows Server 2016 - Low utilization and frame rate

OS: Windows Server 2016
GPU: Tesla T4 and Tesla P4
Drivers: 411.98 and 412.36

We use ffmpeg built with the cuvid decoder enabled (the problem also reproduces with other GPU decoders).
We run this command:

ffmpeg -c:v h264_cuvid -i <video file> -f null -

and monitor decoder utilization with this command:

nvidia-smi.exe -q -l 1 | FINDSTR Decoder
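To turn the once-per-second readings into a single number per run, the output can be saved to a file and averaged. The script below is a minimal sketch, assuming the `nvidia-smi -q` utilization section prints lines shaped like `Decoder : 87 %`; the file name `smi_log.txt` and the helper names are hypothetical, and lines reporting `N/A` are simply skipped.

```python
import re
import sys
from statistics import mean

# Assumed line shape in `nvidia-smi -q` output: "        Decoder      : 87 %"
DECODER_RE = re.compile(r"^[ \t]*Decoder[ \t]*:[ \t]*(\d+)[ \t]*%", re.MULTILINE)


def decoder_samples(text: str) -> list:
    """Return every decoder-utilization reading (in percent) found in the log."""
    return [int(value) for value in DECODER_RE.findall(text)]


def summarize(text: str):
    """Return sample count, mean and peak decoder utilization, or None if absent."""
    samples = decoder_samples(text)
    if not samples:
        return None
    return {"samples": len(samples), "mean": mean(samples), "max": max(samples)}


def read_log(path: str) -> str:
    """Read a saved log; PowerShell 5.1 redirection writes UTF-16LE with a BOM."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(b"\xff\xfe"):
        return raw.decode("utf-16")
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python summarize_decoder.py smi_log.txt")
    else:
        stats = summarize(read_log(sys.argv[1]))
        print(stats if stats else "no decoder readings found")
```

Usage: run `nvidia-smi.exe -q -l 1 > smi_log.txt` while the decode is in progress, stop it with Ctrl+C, then run the script on the file. Averaging only the samples taken during the decode (trimming idle seconds at the start and end) gives a fairer comparison between drivers.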

Testing on a public video from:
The video is very short, so we created a looped version with -stream_loop 4:

ffmpeg.exe -c:v h264_cuvid -stream_loop 4 -i video.mp4 video_loopX4.mp4

Comparing 2 driver versions:
Tesla P4:
Driver 411.98: ~358 fps, 87% decoder utilization
Driver 412.36: ~355 fps, 88% decoder utilization
Tesla T4:
Driver 411.98: ~434 fps, 34% decoder utilization
Driver 412.36: ~194 fps, 29% decoder utilization

Running the same test on Linux, we achieve 100% (P4) / 50% (T4) decoder utilization and a much higher decode frame rate.

Tesla T4 - Linux (Ubuntu 18.04.1 LTS)
Driver 415.27: ~620 fps, 50% decoder utilization
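The size of the gaps can be put in numbers from the approximate fps figures above. This is a small sketch of the arithmetic only; the values are copied from the measurements in this post, and the Linux run used a different driver (415.27), so the Windows/Linux ratio is indicative rather than a controlled comparison.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Approximate fps figures reported in this thread.
P4_WINDOWS = {"411.98": 358, "412.36": 355}
T4_WINDOWS = {"411.98": 434, "412.36": 194}
T4_LINUX_415_27 = 620

if __name__ == "__main__":
    # Effect of the driver update on each GPU under Windows Server 2016.
    print(f"P4 411.98 -> 412.36: {pct_change(P4_WINDOWS['411.98'], P4_WINDOWS['412.36']):+.1f}%")
    print(f"T4 411.98 -> 412.36: {pct_change(T4_WINDOWS['411.98'], T4_WINDOWS['412.36']):+.1f}%")
    # T4 Windows throughput as a fraction of the Linux result.
    for driver, fps in T4_WINDOWS.items():
        print(f"T4 Windows {driver}: {fps / T4_LINUX_415_27:.0%} of Linux fps")
```

With these inputs the P4 barely changes (about -0.8%), while the T4 loses about 55% of its throughput after the update and reaches roughly 70% (411.98) and 31% (412.36) of the Linux frame rate.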

Actual fps and decoder utilization vary across input videos, but on Windows Server neither GPU ever reaches the decoding performance we see on Linux.
Update: the Tesla T4 decodes at a lower frame rate after the driver update to 412.36.

Hi Tamir,
Thanks for providing detailed information.
We are looking into this issue and will get back to you if we need any more details.


Any updates on this issue?


Can you test with the latest driver available on nvidia.com (https://www.nvidia.com/Download/index.aspx?lang=en-us) and confirm whether the issue is fixed?
For reference, this is tracked internally as 200538703.