I am seeing Xid errors on an A100 when decoding 4K HDR 10-bit video. The problem only occurs on the A100; we have tested an A16 and it does not have this problem.
1. Start 15 ffmpeg decoding tasks at the same time:
for ((i=0;i<15;i++));do echo $i;(nohup ffmpeg -re -stream_loop -1 -hwaccel cuda -c:v hevc_cuvid -hwaccel_output_format cuda -i 032_migu_4k-1min.ts -r 25 -vf 'scale_cuda=1920:1080' -f null - > /dev/null 2>&1 &);sleep 3;done
2. Run for 5 minutes to 1 hour.
2.1 Check GPU usage with nvidia-smi: CUDA and decoder utilization both drop to 0.
2.2 /var/log/messages shows Xid errors:
Mar 23 17:12:32 localhost kernel: NVRM: Xid (PCI:0000:18:00): 31, pid=57565, name=ffmpeg, Ch 00000021, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_2 faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Mar 23 17:36:52 localhost kernel: NVRM: Xid (PCI:0000:18:00): 31, pid=230328, name=ffmpeg, Ch 00000074, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_GCC faulted @ 0x100_01a40000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Mar 23 17:42:02 localhost kernel: NVRM: Xid (PCI:0000:18:00): 31, pid=47467, name=ffmpeg, Ch 00000040, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_GCC faulted @ 0x100_01a40000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Mar 23 17:56:13 localhost kernel: NVRM: Xid (PCI:0000:18:00): 62, pid=‘’, name=, 10619(6c8c) 0100260f 1400d040
3. Only rebooting the OS restores the GPU.
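To compare runs, the Xid lines from step 2.2 can be tallied per GPU and per code. This is only a minimal sketch: `xid_summary` is a hypothetical helper name, and it assumes the kernel log uses the same `NVRM: Xid (PCI:...): <code>,` layout as the lines quoted above. For reference, NVIDIA's Xid documentation describes Xid 31 as a GPU memory page fault and Xid 62 as an internal micro-controller halt.

```bash
#!/usr/bin/env bash
# Sketch: count NVRM Xid events per PCI device and Xid code.
# xid_summary is a hypothetical helper, not part of any NVIDIA tool.

xid_summary() {
  # Default to the CentOS syslog path; pass another file as $1 if needed.
  local log="${1:-/var/log/messages}"
  # Pull "PCI:<bus id>" and the numeric Xid code out of each NVRM line,
  # then count identical (device, code) pairs.
  sed -n 's/.*NVRM: Xid (\(PCI:[^)]*\)): \([0-9][0-9]*\),.*/\1 xid=\2/p' "$log" \
    | sort | uniq -c | awk '{print $2, $3, "count=" $1}'
}
```

Running `xid_summary /var/log/messages` after a failed run prints one line per device and Xid code, which makes it easier to see whether the Xid 31 faults always precede the Xid 62.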
4. Environment:
OS: CentOS 7.6 x86-64
CPU: Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz
GPU: A100-40G
NVIDIA Driver Version: 515.86.01
CUDA Version: 11.7
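To make step 1 easier to rerun on another GPU or driver version, the one-liner can be wrapped in a small function. This is a sketch under assumptions, not the exact command used for the report: it keeps the same ffmpeg options, but writes each job's output to `decode_<i>.log` and records PIDs in `decoder.pids` (both hypothetical file names) instead of discarding output to /dev/null, and adds a dry-run mode that only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: start N copies of the decode job from step 1, one log file per job.
# decode_<i>.log and decoder.pids are hypothetical names chosen for this example.

launch_decoders() {
  local input="$1" jobs="${2:-15}" dry_run="${3:-0}"
  local i
  local -a cmd
  for ((i = 0; i < jobs; i++)); do
    # Same ffmpeg options as the original one-liner.
    cmd=(ffmpeg -re -stream_loop -1 -hwaccel cuda -c:v hevc_cuvid
         -hwaccel_output_format cuda -i "$input" -r 25
         -vf scale_cuda=1920:1080 -f null -)
    if [[ "$dry_run" == 1 ]]; then
      # Print the command instead of running it.
      printf '%s\n' "${cmd[*]}"
      continue
    fi
    # Keep each job's output so ffmpeg errors can be matched to Xid timestamps.
    nohup "${cmd[@]}" > "decode_${i}.log" 2>&1 &
    echo "$!" >> decoder.pids
    sleep 3
  done
}
```

For example, `launch_decoders 032_migu_4k-1min.ts 15` starts the 15 tasks, and `xargs kill < decoder.pids` stops them afterwards.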
Thanks for your reply.
032_migu_4k-1min.ts (80.8 MB)