CUDA out of memory: need to reboot the server

Hi,
lately I keep getting out-of-memory errors when allocating memory on the device.
This usually happens after interrupting a previous run of my executable. I’m on Linux,
using CUDA 2.3 with driver 190.42.
When this happens, “lsmod | grep nvidia” shows that something is still using the
driver, but X is down and nothing else on the system is running CUDA code.
Is this a driver issue?

How do you interrupt it? Ctrl-C or kill -9?

Well, usually I do a Ctrl-C, but sometimes I get a segmentation fault and the program exits with a kill.

When exactly are you facing this problem: on Ctrl-C, on kill, or when the segmentation fault occurs?

Your answer is not clear.

I’m not sure when (a segmentation fault ends in a kill anyway). I don’t allocate 4 GB on each run, so my guess is that each time the executable doesn’t exit cleanly it “leaks” some memory, and the effects are seen later on.

It may be a driver glitch. Failing that, it may be that your kernel has written out of bounds and broken something, which is leading to the error message.

You might be able to get away with simply restarting the driver, rather than the whole server.
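
You could also catch Ctrl-C in the application and tear the CUDA context down yourself before exiting. This is just a rough sketch of the idea, assuming the CUDA 2.x runtime API (where cudaThreadExit() destroys the calling thread’s context), not anyone’s actual code:

    /* Sketch: catch Ctrl-C, then free device memory and destroy the
     * CUDA context explicitly before the process exits. */
    #include <signal.h>
    #include <cuda_runtime.h>

    static volatile sig_atomic_t interrupted = 0;

    static void on_sigint(int sig)
    {
        (void)sig;
        interrupted = 1;            /* only set a flag; clean up in main() */
    }

    int main(void)
    {
        signal(SIGINT, on_sigint);

        float *d_buf = NULL;
        cudaMalloc((void **)&d_buf, 256 * 1024 * 1024);   /* example allocation */

        while (!interrupted) {
            /* ... cufft / cublas work here ... */
            break;                  /* placeholder so the sketch terminates */
        }

        cudaFree(d_buf);            /* release device memory */
        cudaThreadExit();           /* destroy the context explicitly */
        return 0;
    }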

rmmod doesn’t work; it’s as if the driver were still in use (“lsmod | grep nvidia” doesn’t show a usage count of 0).

In my application I only use cufft and cublas, so it’s not one of my own kernels at fault.

I dug into it, and it seems it’s my application’s fault. I still have to check, but sometimes after my application terminates a “thread” somehow remains alive; killing it releases the memory correctly.
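
If the stray thread turns out to be one of my own worker threads, the fix should just be to make sure it releases the GPU and is joined before main() returns. Roughly like this (only a sketch with pthreads, not my actual code):

    /* Sketch: a worker thread that owns device memory is joined before
     * the process exits, so nothing stays alive holding the GPU. */
    #include <pthread.h>
    #include <cuda_runtime.h>

    static void *worker(void *arg)
    {
        (void)arg;
        float *d_tmp = NULL;
        cudaMalloc((void **)&d_tmp, 64 * 1024 * 1024);
        /* ... cufft / cublas work ... */
        cudaFree(d_tmp);
        cudaThreadExit();           /* tear down this thread's CUDA context */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
        /* ... rest of the application ... */
        pthread_join(tid, NULL);    /* don't exit while the worker still holds GPU memory */
        return 0;
    }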