11 GB of GPU RAM used, and no process listed by nvidia-smi

Franck_Dernoncourt · August 18, 2016, 7:40pm

In my GPU #0, 11341MiB of GPU RAM is used, and no process is listed by nvidia-smi. How is that possible, and how can I get my memory back?

Thu Aug 18 14:27:58 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 29%   61C    P2    71W / 250W |  11341MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:03:00.0     Off |                  N/A |
| 22%   42C    P0    71W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:82:00.0     Off |                  N/A |
| 22%   35C    P0    69W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 0000:83:00.0     Off |                  N/A |
|  0%   33C    P0    60W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I had launched a Theano Python script with the lib.cnmem=0.9 flag, which explains why it used 11341MiB of GPU memory (the CNMeM library is a “simple library to help the Deep Learning frameworks manage CUDA memory”). However, I killed the script and was expecting the GPU memory to be released. pkill -9 python did not help.

I use a GeForce GTX Titan Maxwell with Ubuntu 14.04.4 LTS x64.

Robert_Crovella · August 18, 2016, 7:52pm

It’s probably the result of a corrupted context on the GPU, perhaps associated with your killed script.

You can try using nvidia-smi to reset the GPUs. If that doesn’t work, reboot the server.

Franck_Dernoncourt · August 18, 2016, 8:33pm

Thanks, following your comment I tried

sudo nvidia-smi --gpu-reset -i 0

but it didn’t work:

Unable to reset this GPU because it’s being used by some other process (e.g. CUDA application, graphics application like X server, monitoring application like other instance of nvidia-smi). Please first kill all processes using this GPU and all compute applications running in the system (even when they are running on other GPUs) and then try to reset the GPU again.
Terminating early due to previous errors.

Any other ideas?

I’d rather avoid rebooting the server, as other processes are running on it.

Thanks for your help,
Franck

Robert_Crovella · August 18, 2016, 8:53pm

1. Log out of the username that issued the interrupted work to that GPU.
2. As root, find all running processes associated with the username that issued the interrupted work on that GPU:

ps -ef|grep username

3. As root, kill all of those.
4. As root, retry the nvidia-smi GPU reset.

If that doesn’t work, I’m out of ideas.

monoid · August 19, 2016, 11:16am

Apart from nvidia-smi, on Linux you can check which processes might be using the GPU using the command

sudo fuser -v /dev/nvidia*

(this will list processes that have NVIDIA GPU device nodes open).
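As a rough sketch of how that listing could be condensed, the helper below reads lsof output and prints one "PID USER COMMAND" line per process. The function name gpu_holders is made up for illustration, and the sketch assumes lsof's default column order (COMMAND, PID, USER, ...); lsof is used here instead of fuser only because its output is easier to parse.

```shell
#!/bin/sh
# Summarise `lsof` output (read from stdin) as one "PID USER COMMAND" line
# per process. Assumes lsof's default column order: COMMAND PID USER ...
# The name gpu_holders is hypothetical, not part of any NVIDIA tool.
gpu_holders() {
    # NR > 1 skips the header; seen[] keeps the first row for each PID.
    awk 'NR > 1 && !seen[$2]++ { print $2, $3, $1 }' | sort -n
}
```

It could be used as sudo lsof /dev/nvidia* | gpu_holders, and the resulting PIDs compared by eye against the output of nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader. A PID that holds a device node but is missing from the nvidia-smi list is a candidate for the process still pinning the memory.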

Franck_Dernoncourt · August 19, 2016, 2:48pm

@monoid Thanks, unfortunately it didn’t list any unwanted process.

Franck_Dernoncourt · August 20, 2016, 4:35am

@txbob Thanks, I’ll keep it as a last resort, as there are a few other processes being run by the same user. If I do end up using it, I’ll let you know how it goes.

nicklhy · February 13, 2017, 1:46am

Any updates here? I ran into the same problem recently.
Is it possible to reset the GPU device without a system reboot?

Franck_Dernoncourt · October 14, 2017, 12:06am

@nicklhy Sorry, I don’t have any more information on my side. Did txbob’s suggestion work for you? I could not try it as I had to keep some processes alive, then one day the server rebooted as a result of a power outage. I haven’t had the issue since then.

farhana · December 7, 2017, 3:13am

I was facing the same problem while working with Python. In my case, I simply killed the Python process from the system monitor and it worked.

msalihkaragoz44 · July 31, 2018, 9:51am

In my case, I killed all processes belonging to the user:

pkill -u [username]

lefnire · January 4, 2019, 7:36pm

I use nvtop (GitHub: Syllo/nvtop, an htop-like monitoring tool for AMD and NVIDIA GPUs) for monitoring the GPU anyway (useful program). It lists processes like htop, but only those using the GPU. You can kill them directly from its console. This helped me because nvidia-smi -r gives me: GPU Reset couldn't run because GPU 00000000:01:00.0 is the primary GPU.

aviramb · July 1, 2019, 11:47pm

If you’re OK with killing all python processes (replace the # in /dev/nvidia# with the GPU number):

for i in $(sudo lsof /dev/nvidia0 | grep python  | awk '{print $2}' | sort -u); do kill -9 $i; done
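A slightly more cautious variant of the loop above, offered only as a sketch: grep python matches "python" anywhere on the line (including user names and paths), so the helper below compares the COMMAND column exactly and only prints the PIDs instead of killing them. The function name gpu_pids_for_command is hypothetical, and the sketch assumes lsof's default column layout.

```shell
#!/bin/sh
# Print unique PIDs from `lsof` output (stdin) whose COMMAND column equals $1.
# Hypothetical helper for illustration; it never kills anything itself.
# Note: lsof truncates command names to 9 characters by default.
gpu_pids_for_command() {
    awk -v cmd="$1" 'NR > 1 && $1 == cmd { print $2 }' | sort -un
}
```

Running sudo lsof /dev/nvidia0 | gpu_pids_for_command python lets you review the list first. If it looks right, the PIDs can be passed to kill (plain kill sends SIGTERM, which gives the process a chance to clean up), keeping kill -9 as a fallback for processes that ignore it.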

370095872 · July 18, 2019, 2:03pm

Please refer to this Stack Overflow question: “Can I stop all processes using CUDA in Linux without rebooting?”

monzymerza · March 23, 2020, 5:57am

Killing the Python process worked for me.
I am using a Jupyter notebook. In subsequent runs, I shut down the notebook kernel by going to Kernel->Shutdown in the notebook, which releases the memory. I used watch nvidia-smi to track GPU memory usage.
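For tracking memory in a script rather than by eye, nvidia-smi can also print plain CSV. The sketch below assumes the input comes from nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits, which prints lines such as "0, 11341, 12287". The function name busy_gpus and the 1024 MiB default threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# Read "index, used_MiB, total_MiB" lines from stdin and report GPUs whose
# used memory is at or above a threshold in MiB ($1, default 1024).
# busy_gpus is a hypothetical helper name, not an nvidia-smi feature.
busy_gpus() {
    awk -F', *' -v min="${1:-1024}" '
        $3 > 0 && $2 + 0 >= min + 0 {
            # %d truncates the percentage towards zero.
            printf "GPU %s: %s / %s MiB (%d%%)\n", $1, $2, $3, $2 * 100 / $3
        }'
}
```

Piping the query output into busy_gpus after shutting down a kernel gives a quick check that the memory really was released; the available field names can be listed with nvidia-smi --help-query-gpu on the installed driver.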

TeaLev · December 15, 2020, 11:12am

To kill all processes that are using an NVIDIA card:

fuser -k /dev/nvidia[GPUID]

dnb1654rrts · April 7, 2023, 3:09am

This worked for me

daohu527 · September 22, 2023, 7:48am

The following command can show the processes using the GPU, even when nvidia-smi can’t:

sudo fuser -v /dev/nvidia*

Then kill the process

kill -9 process_id

This worked for me!
