User job crashes RHEL 7

A system running RHEL 7 sometimes crashes when a user runs a job. Red Hat support told us to contact you. I have tried several driver updates, but they have not fixed the issue. The RHEL kernel should never crash because of a user job. My theory is that something in the user program is triggering a bug in the driver. All the jobs that exercise the GPU run fine. Red Hat support wants us to use the nouveau driver because that is the one they support. You support the NVIDIA driver, so we are contacting you. We can send the sosreport or vmcores if you need them.

Linux bhg0044 3.10.0-1160.6.1.el7.x86_64
GeForce RTX 2080 Ti
Driver Version: 460.67 CUDA Version: 11.2

Thanks,
Carl

What kind of “job”? Please run nvidia-bug-report.sh as root after a crash has happened and attach the resulting nvidia-bug-report.log.gz file to your post.

The machine crashed again. Here is the nvidia-bug-report.log.gz.
nvidia-bug-report.log.gz (4.2 MB)

Please create /etc/X11/xorg.conf with the following contents:

Section "Device"
  Identifier "ASPEED"
  Driver "modesetting"
  BusID "PCI:4:0:0"
EndSection

Then configure nvidia-persistenced to start on boot, make sure it is running continuously, and check whether that resolves the issue.
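For anyone following along, here is a rough sketch of how both steps could be sanity-checked from a shell. It rests on assumptions that are not stated in this thread: that the driver install provides a systemd unit named nvidia-persistenced.service (packaged installs usually do, the .run installer may not), that lspci prints addresses in its default bus:device.function form without a domain prefix, and that the daemon is left at its default of enabling persistence mode. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Helper names are hypothetical, not part of any NVIDIA tool.

# Convert an lspci address (hex, e.g. "04:00.0") to the decimal form
# that xorg.conf expects in BusID (e.g. "PCI:4:0:0").
# Assumes no domain prefix such as "0000:" in front of the address.
lspci_to_xorg_busid() {
    bus=${1%%:*}        # text before the first ':'
    rest=${1#*:}        # text after the first ':'
    dev=${rest%%.*}     # text before the '.'
    func=${rest#*.}     # text after the '.'
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "$func"
}

# Read "index, persistence_mode" lines on stdin and print the index of
# every GPU whose persistence mode is not "Enabled".
gpus_without_persistence() {
    awk -F', ' '$2 != "Enabled" { print $1 }'
}

# To enable the daemon at boot and start it now (as root, assuming the
# systemd unit exists):
#   systemctl enable nvidia-persistenced
#   systemctl start nvidia-persistenced

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
        | gpus_without_persistence
fi
```

To check the BusID, find the ASPEED device with `lspci | grep -i aspeed` and pass the address at the start of that line to `lspci_to_xorg_busid`; the result should match the BusID line in xorg.conf. If the last command prints nothing, every GPU reports persistence mode as enabled, which is what you would expect while nvidia-persistenced is running.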