Is there a way to reset a GPU without rebooting the host?
My kernels were running correctly until I tried to launch one with an unusually large number of threads. After that, all hell broke loose: every new attempt to run the same kernels produced a string of strange, inexplicable errors. We rebooted the host, and then everything worked correctly again.
It would be nice to be able to reset the GPU from the host code, or at least from the command line as a regular (non-root) user.
I am using a GeForce GTX 480 on x86_64 Red Hat Enterprise Linux Client release 5.4 (Tikanga), with NVIDIA driver version 256.40 and the CUDA 3.1 toolkit (cudatoolkit_3.1_linux_64_rhel5.4.run).