A CentOS 6.7 system with several K40c GPUs that had been working reliably for quite some time is now failing. Any attempt to run an NVIDIA/CUDA application crashes the system and forces a reboot; this happens even when just running nvidia-smi or nvidia-bug-report.sh. The last and only thing written to the messages log is:
kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
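In case it helps anyone check whether other NVRM messages appear before the crash, here is a minimal sketch for pulling them out of the log. The log path /var/log/messages and the helper name extract_nvrm_messages are assumptions for illustration, not part of any NVIDIA tool.

```python
import re

# Matches syslog lines from the kernel that carry the NVIDIA module's "NVRM:" prefix.
NVRM_PATTERN = re.compile(r"kernel:.*\bNVRM:\s*(.*)")


def extract_nvrm_messages(log_text):
    """Return the text following 'NVRM:' for each kernel line that has it.

    Hypothetical helper: it only filters text and does not touch the driver.
    """
    messages = []
    for line in log_text.splitlines():
        match = NVRM_PATTERN.search(line)
        if match:
            messages.append(match.group(1).strip())
    return messages


if __name__ == "__main__":
    # Assumed log location; adjust if syslog is configured differently.
    with open("/var/log/messages", errors="replace") as handle:
        for message in extract_nvrm_messages(handle.read()):
            print(message)
```

Because this only reads the log file, it should be safe to run on the affected machine without loading anything from the driver.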
I have tried several standard kernels with several recent driver versions, and the issue remains.
Searching for this error turns up a variety of explanations, one of them being faulty hardware.
Does anyone have suggestions for finding the cause of this issue, and perhaps a way to resolve it?