Is there a way to reset a GPU without rebooting Linux?

Is there a way to reset a GPU without rebooting the host?

My kernels were running correctly until I attempted to run an instance with an unusually large number of threads. After that, all hell broke loose and any new attempts to run the same kernels resulted in many strange and inexplicable errors. We rebooted the host, and then everything worked correctly again.

It would be nice if one could reset the GPU from the host code, or at least from the command prompt as a (non-super)user.
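On the host-code side, the closest option I know of in the CUDA runtime is to tear down the calling process's own context. That is `cudaDeviceReset()` (CUDA 4.0 and later) or its older, now-deprecated counterpart `cudaThreadExit()` (the call available in CUDA 3.x). This does not reset the GPU hardware. It only releases the runtime state this process holds, so it will not help if the device itself is wedged. The sketch below calls it through `ctypes`. It assumes the runtime library can be found under the name `cudart`, and the helper name `reset_cuda_context` is hypothetical.

```python
import ctypes
import ctypes.util
from typing import Optional


def reset_cuda_context() -> Optional[int]:
    """Release this process's CUDA runtime state on the current device.

    Returns the cudaError_t code (0 == cudaSuccess), or None if the CUDA
    runtime library (assumed to be findable as "cudart") is not available.
    This does NOT reset the GPU hardware; it only tears down our context.
    """
    name = ctypes.util.find_library("cudart")
    if name is None:
        return None
    try:
        cudart = ctypes.CDLL(name)
    except OSError:
        return None
    # cudaDeviceReset exists from CUDA 4.0; cudaThreadExit is the older
    # (deprecated) call with similar per-thread cleanup semantics.
    for symbol in ("cudaDeviceReset", "cudaThreadExit"):
        fn = getattr(cudart, symbol, None)  # missing symbol -> None
        if fn is not None:
            fn.restype = ctypes.c_int
            fn.argtypes = []
            return fn()
    return None


if __name__ == "__main__":
    print("reset result:", reset_cuda_context())
```

In a real CUDA C program you would simply call `cudaDeviceReset()` (or `cudaThreadExit()` on 3.x) directly, for example after a kernel launch has reported an error and before retrying.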

I am using a GTX 480 card on x86_64 Red Hat Enterprise Linux Client release 5.4 (Tikanga) with NVIDIA driver version 256.40.

The CUDA toolkit I downloaded was cudatoolkit_3.1_linux_64_rhel5.4.run

If you have root access, you can try unloading and reloading the nvidia kernel driver.
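A minimal sketch of that reload follows. It assumes three things: you are root; nothing is holding the module open (the X server and all CUDA programs must be stopped first, or `rmmod` will refuse); and the driver is the single `nvidia` module, as with the 256.x series. Newer drivers add dependent modules such as `nvidia_uvm`, which must be unloaded before `nvidia`. The helper names `reload_commands` and `reload_nvidia_driver` are hypothetical. `dry_run` defaults to `True`, so by default the helper only returns the commands it would run.

```python
import os
import subprocess
from typing import List, Sequence


def reload_commands(modules: Sequence[str] = ("nvidia",)) -> List[List[str]]:
    """Build rmmod/modprobe commands that unload, then reload, the modules.

    Modules are unloaded in the order given and reloaded in reverse order,
    so list dependent modules (e.g. nvidia_uvm on newer drivers) first.
    """
    unload = [["rmmod", m] for m in modules]
    load = [["modprobe", m] for m in reversed(modules)]
    return unload + load


def reload_nvidia_driver(modules: Sequence[str] = ("nvidia",),
                         dry_run: bool = True) -> List[List[str]]:
    """Reload the NVIDIA kernel module(s); requires root unless dry_run."""
    cmds = reload_commands(modules)
    if dry_run:
        return cmds  # only report what would be executed
    if os.geteuid() != 0:
        raise PermissionError("reloading kernel modules requires root")
    for cmd in cmds:
        # check=True stops at the first failure, e.g. "module is in use"
        subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    for cmd in reload_nvidia_driver():
        print(" ".join(cmd))
```

Run it with `dry_run=False` as root once the GPU is idle. It will not help a non-root user, because unloading kernel modules always requires root privileges.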

And if it’s reproducible with a simple kernel, you could report a bug to NV…

I would, but it is not a simple kernel. It is a sequence of 4 kernels with a substantial amount of support code.