Linux: NVENC randomly segfaults inside an nvcuvid thread with drivers after 520.56.06

Hi,

On Linux (Ubuntu 22.04, Gentoo), with NVIDIA drivers after 520.56.06, my custom NVENC solution randomly segfaults, probably on short-lived sessions such as test cases in CI. I checked 525.125.06, 535.54.03, 535.86.05 with CUDA 12.2; all of them are broken in my case. In gdb the backtrace always points to the same place. I can also partially reproduce the problem without a segfault, but with a valgrind warning that looks the same. Attached is a patch for Video_Codec_SDK_12.1.14 with an example: in a thread loop it creates NVENC contexts and frees them after a random 100…300 ms delay. To reproduce:
valgrind --trace-children=yes --leak-check=full --log-file=valgrind.txt AppEncode/AppEncCudaBug/AppEncCudaBug

AppEncCudaBug.patch (10.1 KB)

valgrind:

==964328== Invalid read of size 4
==964328==    at 0x6A16D81: ??? (in /usr/lib64/libnvcuvid.so.535.86.05)
==964328==    by 0x6A16EB9: ??? (in /usr/lib64/libnvcuvid.so.535.86.05)
==964328==    by 0x6A85835: ??? (in /usr/lib64/libnvcuvid.so.535.86.05)
==964328==    by 0x6A85F9C: ??? (in /usr/lib64/libnvcuvid.so.535.86.05)
==964328==    by 0x78282DB: start_thread (pthread_create.c:444)
==964328==    by 0x78AB69F: clone (clone.S:100)
==964328==  Address 0xc28024c is 134,156 bytes inside a block of size 230,056 free'd
==964328==    at 0x484310E: free (vg_replace_malloc.c:974)
==964328==    by 0x660BD4D: ??? (in /usr/lib64/libnvidia-encode.so.535.86.05)
==964328==    by 0x6605759: ??? (in /usr/lib64/libnvidia-encode.so.535.86.05)
==964328==    by 0x661E7CD: ??? (in /usr/lib64/libnvidia-encode.so.535.86.05)
==964328==    by 0x1321B1: NvEncoder::DestroyHWEncoder() (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x12F651: NvEncoder::~NvEncoder() (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x13F3EF: NvEncoderCuda::~NvEncoderCuda() (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x13F40B: NvEncoderCuda::~NvEncoderCuda() (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x115755: main::{lambda()#1}::operator()() const (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x115DF5: void std::__invoke_impl<void, main::{lambda()#1}&>(std::__invoke_other, main::{lambda()#1}&) (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x115CE9: std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<main::{lambda()#1}&> >::value, void>::type std::__invoke_r<void, main::{lambda()#1}&>(main::{lambda()#1}&) (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x115BCD: std::_Function_handler<void (), main::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==  Block was alloc'd at
==964328==    at 0x4840797: malloc (vg_replace_malloc.c:431)
==964328==    by 0x6605958: ??? (in /usr/lib64/libnvidia-encode.so.535.86.05)
==964328==    by 0x661C8CC: ??? (in /usr/lib64/libnvidia-encode.so.535.86.05)
==964328==    by 0x13168C: NvEncoder::CreateEncoder(_NV_ENC_INITIALIZE_PARAMS const*) (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x1156D6: main::{lambda()#1}::operator()() const (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x115DF5: void std::__invoke_impl<void, main::{lambda()#1}&>(std::__invoke_other, main::{lambda()#1}&) (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x115CE9: std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<main::{lambda()#1}&> >::value, void>::type std::__invoke_r<void, main::{lambda()#1}&>(main::{lambda()#1}&) (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x115BCD: std::_Function_handler<void (), main::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x1242EB: std::function<void ()>::operator()() const (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x1152D2: ThreadPool::ThreadLoop() (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x12E6C7: void std::__invoke_impl<void, void (ThreadPool::*)(), ThreadPool*>(std::__invoke_memfun_deref, void (ThreadPool::*&&)(), ThreadPool*&&) (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)
==964328==    by 0x12E626: std::__invoke_result<void (ThreadPool::*)(), ThreadPool*>::type std::__invoke<void (ThreadPool::*)(), ThreadPool*>(void (ThreadPool::*&&)(), ThreadPool*&&) (in /home/hizel/src/Video_Codec_SDK_12.1.14.bug/Samples/build/AppEncode/AppEncCudaBug/AppEncCudaBug)

gdb:

#0  0x00007f78dfa16d81 in ?? () from /usr/lib64/libnvcuvid.so.1
#1  0x00007f78dfa16eba in ?? () from /usr/lib64/libnvcuvid.so.1
#2  0x00007f78dfa85806 in ?? () from /usr/lib64/libnvcuvid.so.1
#3  0x00007f78dfa85f6d in ?? () from /usr/lib64/libnvcuvid.so.1
#4  0x00007f79d568e2dc in start_thread (arg=<optimized out>) at pthread_create.c:444
#5  0x00007f79d571178c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

      0x7f78dfa00000     0x7f78e02f7000   0x8f7000        0x0 /usr/lib64/libnvcuvid.so.535.54.03
      0x7f78e02f7000     0x7f78e04f6000   0x1ff000   0x8f7000 /usr/lib64/libnvcuvid.so.535.54.03
      0x7f78e04f6000     0x7f78e053f000    0x49000   0x8f6000 /usr/lib64/libnvcuvid.so.535.54.03
      0x7f78e053f000     0x7f78e0540000     0x1000   0x93f000 /usr/lib64/libnvcuvid.so.535.54.03