NVDEC fails on file: requires (29 + threads) surfaces; thus fails on >3 threads

Please, Nvidia and its developers, help us fix this bug in ffmpeg. A 4-core CPU already gets 5 threads by default, and decoding fails. We have a file that requires at least 30 surfaces with 1 thread; with 5 threads it needs 34 surfaces and fails. Other files fail with 10 threads, which will cause big problems on newer CPUs.
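
To make the arithmetic concrete, here is a small Python sketch. It assumes the decoder asks for a fixed number of surfaces for the stream plus one per thread, which is what the numbers in this report suggest (30 at 1 thread, 34 at 5 threads). The function names are made up for illustration and are not part of ffmpeg or the Video SDK.

```python
# Assumption: surfaces requested = per-stream base + one per decoding thread,
# inferred from the numbers in this report, not from ffmpeg documentation.
SURFACE_LIMIT = 32  # the soft limit named in the ffmpeg warning


def surfaces_needed(base_surfaces: int, threads: int) -> int:
    """Surfaces requested for a given thread count (hypothetical model)."""
    return base_surfaces + threads


def max_threads(base_surfaces: int, limit: int = SURFACE_LIMIT) -> int:
    """Largest thread count that stays within the limit, or 0 if none does."""
    return max(limit - base_surfaces, 0)


if __name__ == "__main__":
    base = 29  # this sample: 30 surfaces at 1 thread
    for threads in (1, 3, 5):
        total = surfaces_needed(base, threads)
        status = "ok" if total <= SURFACE_LIMIT else "over limit"
        print(f"{threads} threads: {total} surfaces ({status})")
    print("max threads:", max_threads(base))
```

Under that assumption, this file can use at most 3 threads before it crosses 32 surfaces.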

The proposed patch is "[FFmpeg-devel] libavcodec/nvdec: Do not exceed 32 surfaces when initializing hw_frames_ctx" (Patchwork),

but that has some problems:

So far, the 32 surface limit has only ever been enforced as a soft
limit, and a warning printed when exceeded. Since it being a limit is
not strictly documented and it might be risen in the future.

Old cuviddec (ffmpeg.exe -c:v h264_cuvid -i example.mp4 -f null -) is not affected.


Sample file: https://trac.ffmpeg.org/raw-attachment/ticket/8948/example.mp4


ffmpeg.exe -hwaccel cuda -i example.mp4 -f null -

leads to

[h264 @ 000001c962e62040] decoder->cvdl->cuvidCreateDecoder(&decoder->decoder, params) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[h264 @ 000001c962e62040] Using more than 32 (34) decode surfaces might cause nvdec to fail.
[h264 @ 000001c962e62040] Try lowering the amount of threads. Using 5 right now.
[h264 @ 000001c962e62040] Failed setup for format cuda: hwaccel initialisation returned error.

Hi @val.zapod.vz,

Thank you for bringing this up.

Can you clarify whether this is a limitation in how ffmpeg implements CUDA usage or do you think it is a problem with CUDA or our Video SDK?

Timo Rothenpieler is an Nvidia developer, and even he does not know. He says it is a soft limit that “might be risen in the future”, and in fact, if you read the Patchwork thread, it says that

CUVID based wrappers expose it as an option (via -surfaces) as
does the NVENC encoder wrapper implementation.

The fact is that the thread count needs to be reselected at runtime.
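
Until ffmpeg does this itself, a wrapper script can approximate it from outside. The Python sketch below reads the two warning lines quoted earlier in this thread, works out how many threads to drop, and builds a retry command with an explicit -threads input option. The function names are hypothetical, the patterns only match the warning wording shown here, and it assumes each thread dropped frees exactly one surface.

```python
import re

# Patterns match the two warning lines quoted in this thread; other ffmpeg
# versions may word them differently.
SURFACES_RE = re.compile(r"Using more than (\d+) \((\d+)\) decode surfaces")
THREADS_RE = re.compile(r"Using (\d+) right now")


def suggest_threads(log_text: str):
    """Return a reduced thread count, or None if the warning is absent."""
    surfaces = SURFACES_RE.search(log_text)
    threads = THREADS_RE.search(log_text)
    if not surfaces or not threads:
        return None
    limit, requested = int(surfaces.group(1)), int(surfaces.group(2))
    current = int(threads.group(1))
    # Assumption: each thread dropped frees exactly one surface.
    # Never go below 1 thread; that may still fail if the stream alone
    # needs more surfaces than the limit.
    return max(current - (requested - limit), 1)


def retry_command(input_file: str, threads: int) -> list:
    """Build the same test command with an explicit -threads input option."""
    return ["ffmpeg.exe", "-threads", str(threads), "-hwaccel", "cuda",
            "-i", input_file, "-f", "null", "-"]
```

For the log above (34 surfaces requested, limit 32, 5 threads) this suggests 3 threads. It is only a retry after a failure, not the in-decoder fix being asked for here.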

THE FACT is that 32-thread CPUs can fail on a lot of simple files, which is horrible.

This file is 4K AVC at 120 fps and decodes perfectly on the NVDEC chip, yet with 10 threads it fails. That means any new CPU will be useless.

Any updates? Your latest commit just broke more stuff: #10409 (Cuvid decoder with bob deint errror after #402d98c commit) – FFmpeg