V100s crash after 1 hour of load

Attaching bug report:

Could this be a power control issue?
Also, when I run the diagnostic command with the data center tool set to level 3, it crashes instantly.

To anyone who runs into the same issue (D3 power management):
this seems to solve it. After applying the workaround I ran the load again for about 1 hour 20 minutes without issue.

Found here:
https://bbs.archlinux.org/viewtopic.php?id=297276

Update: in the end, the only workaround that worked for me was to add this line to /etc/modprobe.d/nvidia.conf

options nvidia NVreg_EnableGpuFirmware=0

in order to disable GSP firmware, as seen here.
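If you want to add that line without creating a duplicate when you rerun it, here is a minimal sketch. The helper name add_modprobe_option is hypothetical (not part of any NVIDIA tool); it just appends the line only if the exact same line is not already in the file.

```shell
#!/bin/sh
# Hypothetical helper: append an option line to a modprobe config file
# only if that exact line is not already there (safe to run twice).
add_modprobe_option() {
    conf_file=$1
    option_line=$2
    # -q quiet, -x whole-line match, -F fixed string; a missing file counts as "not present"
    if ! grep -qxF -- "$option_line" "$conf_file" 2>/dev/null; then
        printf '%s\n' "$option_line" >> "$conf_file"
    fi
}
```

Source it in a root shell, then call `add_modprobe_option /etc/modprobe.d/nvidia.conf 'options nvidia NVreg_EnableGpuFirmware=0'`.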

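You'll then need to rebuild the initramfs:

sudo mkinitcpio -P

or, for RHEL users:

sudo dracut -f

and regenerate the GRUB config:

sudo grub-mkconfig -o /boot/grub/grub.cfg

(on RHEL the equivalent is sudo grub2-mkconfig -o /boot/grub2/grub.cfg)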
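The two initramfs commands can be wrapped in one sketch that picks whichever tool is installed, assuming the system uses either mkinitcpio (Arch) or dracut (RHEL). The function name regen_initramfs is hypothetical, it has to run as root, and the GRUB step is left as a separate command.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the initramfs with whichever tool is installed.
# mkinitcpio -P rebuilds all presets (Arch); dracut -f overwrites the current image (RHEL).
regen_initramfs() {
    if command -v mkinitcpio >/dev/null 2>&1; then
        mkinitcpio -P
    elif command -v dracut >/dev/null 2>&1; then
        dracut -f
    else
        echo "regen_initramfs: neither mkinitcpio nor dracut found" >&2
        return 1
    fi
}
```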

In my case it still shows

Video Memory Self Refresh: Not Supported

but it didn't crash.
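To check after a reboot that GSP is really off, a small sketch. It assumes that `nvidia-smi -q` prints a `GSP Firmware Version` line per GPU that reads `N/A` when GSP is disabled; as far as I know that's how recent drivers report it, but compare against your own output. The helper name gsp_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `nvidia-smi -q` output on stdin and print the
# GSP firmware value for each GPU (assumed to be N/A when GSP is disabled).
gsp_status() {
    # Split on ':' plus any following spaces; print the value after the label
    awk -F': *' '/GSP Firmware Version/ { print $2 }'
}
```

Usage: `nvidia-smi -q | gsp_status` should print one `N/A` per GPU if the option took effect.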