A system running RHEL 7 sometimes crashes when a user runs a job. Red Hat support told us to contact you. I have tried several driver updates, but none of them has fixed the issue. The RHEL kernel should never crash because of a user job. My theory is that something in the user program is triggering a bug in the driver. All the jobs that exercise the GPU run fine. Red Hat support wants us to use the nouveau driver because that is the one they support. You support the NVIDIA driver, so we are contacting you. We can send the sosreport or vmcores if you need them.
Linux bhg0044 3.10.0-1160.6.1.el7.x86_64
GeForce RTX 2080 Ti
Driver Version: 460.67 CUDA Version: 11.2
Thanks,
Carl