GPU is lost error - Tesla T4


I got the following error on a production server running Red Hat Enterprise Linux 7.7 with a Tesla T4:
Unable to determine the device handle for GPU 0000:17:00.0: GPU is lost. Reboot the system to recover this GPU

After a reboot the GPU is accessible again, but I would like to find the root cause of the failure. Could you please help me with that?
I am attaching the bug report here.
nvidia-bug-report (1).log (1.5 MB)
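
For anyone triaging a log like this, here is a minimal sketch of one way to pull driver Xid events out of the bug report or a kernel log. It assumes the driver logged lines of the form `NVRM: Xid (PCI:...): <code>, ...`. The function names and the sample line in the comment are illustrative only and are not taken from the attached log.

```python
import re

# Assumed shape of a driver Xid line in the kernel log. Illustrative example only:
#   NVRM: Xid (PCI:0000:17:00): 79, pid=1234, GPU has fallen off the bus.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<dev>[^)]*)\): (?P<code>\d+)(?:, (?P<detail>.*))?"
)


def find_xid_events(lines):
    """Return a (device, xid_code, detail) tuple for every Xid line found."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            detail = (match.group("detail") or "").strip()
            events.append((match.group("dev"), int(match.group("code")), detail))
    return events


def scan_file(path):
    """Scan a log file (hypothetical helper); undecodable bytes are replaced."""
    with open(path, errors="replace") as handle:
        return find_xid_events(handle)
```

Calling `scan_file` with the path to the saved log returns the matching events, or an empty list if no Xid lines were recorded. The numeric code can then be looked up in NVIDIA's Xid documentation.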

Thank you!