A100 Xid error when decoding 4K HDR 10-bit HLG video

I am seeing Xid errors on an A100 when decoding 4K HDR 10-bit video. The problem only occurs on the A100; we tested an A16 and it does not show this problem.

  1. Start 15 ffmpeg decoding tasks at the same time:
    for ((i=0;i<15;i++)); do echo $i; (nohup ffmpeg -re -stream_loop -1 -hwaccel cuda -c:v hevc_cuvid -hwaccel_output_format cuda -i 032_migu_4k-1min.ts -r 25 -vf 'scale_cuda=1920:1080' -f null - > /dev/null 2>&1 &); sleep 3; done

  2. Let the tasks run for 5 minutes to 1 hour. Then:
    2.1 nvidia-smi shows that CUDA and decoder utilization have both dropped to 0
    2.2 /var/log/messages contains Xid errors:

Mar 23 17:12:32 localhost kernel: NVRM: Xid (PCI:0000:18:00): 31, pid=57565, name=ffmpeg, Ch 00000021, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_2 faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Mar 23 17:36:52 localhost kernel: NVRM: Xid (PCI:0000:18:00): 31, pid=230328, name=ffmpeg, Ch 00000074, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_GCC faulted @ 0x100_01a40000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Mar 23 17:42:02 localhost kernel: NVRM: Xid (PCI:0000:18:00): 31, pid=47467, name=ffmpeg, Ch 00000040, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_GCC faulted @ 0x100_01a40000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Mar 23 17:56:13 localhost kernel: NVRM: Xid (PCI:0000:18:00): 62, pid=‘’, name=, 10619(6c8c) 0100260f 1400d040
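
The two checks in step 2 can be scripted. The sketch below is only an illustration and not part of the original reproduction: `gpu_stalled` and `xid_summary` are hypothetical helper names, and `gpu_stalled` assumes the `nvidia-smi dmon -s u` header has the form `# gpu sm mem enc dec`, which may differ between driver versions.

```shell
#!/usr/bin/env bash

# Succeeds (exit 0) when every GPU row in `nvidia-smi dmon -s u -c 1`
# output (read from stdin) reports 0 for both "sm" and "dec".
# Assumption: the header line looks like "# gpu sm mem enc dec".
gpu_stalled() {
  awk '
    # Header: map column names to data-row field numbers. The leading "#"
    # is its own field in the header, so data rows are shifted by one.
    /^# *gpu/ { for (i = 2; i <= NF; i++) col[$i] = i - 1; next }
    /^#/      { next }   # skip the units line ("# Idx % % % %")
    NF        { seen = 1
                if ($col["sm"] != 0 || $col["dec"] != 0) busy = 1 }
    END       { exit (seen && !busy) ? 0 : 1 }
  '
}

# Reads syslog text on stdin and prints "<xid> <count>" for each
# distinct Xid code found in NVRM lines.
xid_summary() {
  grep -F 'NVRM: Xid' \
    | sed -E 's/.*NVRM: Xid \([^)]*\): ([0-9]+),.*/\1/' \
    | sort -n | uniq -c \
    | awk '{ print $2, $1 }'
}
```

For example, `nvidia-smi dmon -s u -c 1 | gpu_stalled && xid_summary < /var/log/messages` prints the Xid counts once utilization has dropped to 0. For the four log lines above, `xid_summary` prints `31 3` and `62 1`. An idle GPU with no ffmpeg tasks running also counts as stalled, so the check is only meaningful while the decode tasks are active.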

  3. Only rebooting the OS restores the GPU.

  4. Environment
    OS: CentOS 7.6 x86-64
    CPU: Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz
    GPU: A100-40G
    NVIDIA Driver Version: 515.86.01
    CUDA Version: 11.7

Thanks for your reply.
032_migu_4k-1min.ts (80.8 MB)