I have been porting my code to CUDA, and in the process I have to debug it.
Whenever my program fails to run properly a few times in a row, the GPU freezes up. I can still compile, but when I run the executable I get one of several similar error messages each time it freezes:
“could not allocate device memory”
“no CUDA-capable device is detected”
“unspecified launch failure”
After that, the program just seems to hang every time I try to execute it. Using nvidia-smi, I can see that the GPU is at 100% utilization, but no memory is occupied. Normally the program takes up at least 30-40% of the memory.
My solution has been to restart the workstation, but since other users may be using it, that is not convenient.
So, is there a way (a command, maybe) to restart just the GPU? Or better yet, to “re-initialize” the GPU?
I am using a Tesla C1060 on a KDE Linux workstation, with the latest version of the driver and compiler.
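
To make the symptom above easier to reproduce, here is a rough Python sketch that flags a GPU showing the "100% busy, no memory used" state in nvidia-smi. It is only an illustration under assumptions: the "100%" is taken to mean GPU utilization, the `--query-gpu` / `--format=csv` options are assumed to be available (older drivers may not support them, and some boards report "[Not Supported]"), and the function names and thresholds are made up for this example.

```python
import subprocess

# Assumption: this nvidia-smi build supports the CSV query interface.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def find_stuck_gpus(csv_text, util_threshold=99, mem_threshold_mib=1):
    """Return indices of GPUs that look fully busy but hold no memory.

    csv_text is expected to contain one "index, utilization, memory" row
    per GPU. Both thresholds are illustrative guesses, not driver values.
    """
    stuck = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip rows that do not have the expected shape
        try:
            index, util, mem = (int(field) for field in fields)
        except ValueError:
            continue  # skip non-numeric values such as "[Not Supported]"
        if util >= util_threshold and mem < mem_threshold_mib:
            stuck.append(index)
    return stuck


def query_gpus():
    """Run nvidia-smi and return its raw CSV output as text."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(find_stuck_gpus(query_gpus()))
```

This only detects the state described above; it does not reset or re-initialize anything.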