GPU freezes up after errors

hi all,

i have been porting my code to CUDA, and in the process i have to do a lot of debugging.

whenever my program fails to run properly a few times, i find that the GPU freezes up. i can still compile, but upon running the executable i get a different but similar error message each time it freezes:

“could not allocate device memory”
“no CUDA-capable device is detected”

more recently,

“unspecified launch failure”, and after that the program just seems to hang every time i try to execute it. using nvidia-smi, i can see that the GPU is at 100% utilization, but no memory is occupied. normally the program takes up at least 30-40% of device memory.
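
for reference, here is roughly the kind of checking that surfaces those messages (a minimal sketch, not my actual code; myKernel and the launch configuration are placeholders):

#include <cstdio>
#include <cuda_runtime.h>

// hypothetical kernel, standing in for whatever actually fails
__global__ void myKernel(float *d_data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    d_data[i] *= 2.0f;
}

int main() {
    float *d_data = NULL;
    cudaError_t err = cudaMalloc((void **)&d_data, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        // allocation failures are reported here
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    myKernel<<<4, 256>>>(d_data);

    // kernel launches are asynchronous, so check both the launch itself
    // and its completion; "unspecified launch failure" shows up here
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaThreadSynchronize();  // cudaDeviceSynchronize() on newer toolkits
    if (err != cudaSuccess)
        fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return 0;
}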

my solution so far has been to reboot the workstation, but since other users may be using it, that is not convenient.

so, is there a way (a command, maybe) to restart just the GPU? or better yet, to “re-initialize” it?
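
the closest thing i have found in the runtime API is cudaThreadExit() (cudaDeviceReset() in newer toolkits), which tears down the calling process's context, but i am not sure it can recover a device that was wedged by an earlier crashed process. a minimal sketch of what i have in mind:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // destroy this process's CUDA context and release its device resources.
    // note: this only cleans up state owned by the calling process; whether
    // it helps with a device wedged by another process is exactly my question.
    cudaError_t err = cudaThreadExit();  // cudaDeviceReset() on newer toolkits
    printf("reset: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}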

i am using a Tesla C1060 on a KDE Linux workstation, with the latest versions of the driver and compiler.

Thanks!
