GPU freezes up after errors

hi all,

i have been porting my code to CUDA, and in the process i have to do a lot of debugging.

whenever my program fails to run properly a few times, i find that the GPU freezes up. i can still compile, but upon running the executable i get a different but similar error message each time it freezes:

“could not allocate device memory”
“no CUDA-capable device is detected”

more recently,

“unspecified launch failure”, and after that the program just seems to hang every time i try to execute it. using nvidia-smi, i can see that the GPU is at 100% utilization, but no memory is occupied. normally the program takes up at least 30-40% of device memory.
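
for reference, here is roughly the kind of checking that surfaces those messages (a minimal sketch, not my actual code; myKernel and the launch configuration are placeholders):

#include <cstdio>
#include <cuda_runtime.h>

// hypothetical kernel, standing in for whatever actually fails
__global__ void myKernel(float *d_data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    d_data[i] *= 2.0f;
}

int main() {
    float *d_data = NULL;
    cudaError_t err = cudaMalloc((void **)&d_data, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        // allocation failures are reported here
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    myKernel<<<4, 256>>>(d_data);

    // kernel launches are asynchronous, so check both the launch itself
    // and its completion; "unspecified launch failure" shows up here
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaThreadSynchronize();  // cudaDeviceSynchronize() on newer toolkits
    if (err != cudaSuccess)
        fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return 0;
}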

my solution so far has been to reboot the workstation, but since other users may be using it, that is not convenient.

so, is there a way (a command, maybe) to restart just the GPU? or better yet, to “re-initialize” it?
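
the closest thing i have found in the runtime API is cudaThreadExit() (cudaDeviceReset() in newer toolkits), which tears down the calling process's context, but i am not sure it can recover a device that was wedged by an earlier crashed process. a minimal sketch of what i have in mind:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // destroy this process's CUDA context and release its device resources.
    // note: this only cleans up state owned by the calling process; whether
    // it helps with a device wedged by another process is exactly my question.
    cudaError_t err = cudaThreadExit();  // cudaDeviceReset() on newer toolkits
    printf("reset: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}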

i am using a Tesla C1060 on a KDE Linux workstation, with the latest versions of the driver and compiler.

Thanks!
