I had launched a Theano Python script with the lib.cnmem=0.9 flag, which explains why it used 11341 MiB of GPU memory (the CNMeM library is a “simple library to help the Deep Learning frameworks manage CUDA memory”). However, I killed the script and expected the GPU memory to be released. pkill -9 python did not help.
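In case it helps others diagnose the same thing, here is a rough sketch of how to check what is still holding the GPU (assuming the standard /dev/nvidia* device nodes and a reasonably recent nvidia-smi):

```
# List any process that still has an NVIDIA device node open
sudo fuser -v /dev/nvidia*

# Ask the driver directly which compute processes still hold GPU memory
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```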
I use a GeForce GTX Titan Maxwell with Ubuntu 14.04.4 LTS x64.
Trying to reset the GPU with nvidia-smi gives:

Unable to reset this GPU because it’s being used by some other process (e.g. CUDA application, graphics application like X server, monitoring application like other instance of nvidia-smi). Please first kill all processes using this GPU and all compute applications running in the system (even when they are running on other GPUs) and then try to reset the GPU again.
Terminating early due to previous errors.
Any other ideas?
I’d rather avoid resetting the server, as other processes are running on it.
@txbob Thanks, I’ll keep it as a last resort, as a few of the other processes are being run by the same user. If I do end up using it, I’ll let you know how it goes.
@nicklhy Sorry, I don’t have any more information on my side. Did txbob’s suggestion work for you? I could not try it, as I had to keep some processes alive, and then one day the server rebooted after a power outage. I haven’t had the issue since.
I use nvtop (GitHub - Syllo/nvtop, an htop-like monitoring tool for AMD and NVIDIA GPUs) anyway; it’s a useful program. It lists processes like htop, but only those using the GPU, and you can kill them directly from its console. This helped me because nvidia-smi -r gives me “GPU Reset couldn’t run because GPU 00000000:01:00.0 is the primary GPU”.
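If nvtop isn’t packaged for your distribution, it can be installed from the repositories on newer Ubuntu releases or built from source; roughly like this (check the project README for the current steps):

```
# On newer Ubuntu releases nvtop is in the repositories
sudo apt install nvtop

# Otherwise, build it from source (CMake-based project)
git clone https://github.com/Syllo/nvtop.git
mkdir -p nvtop/build && cd nvtop/build
cmake ..
make
sudo make install
```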
Killing the Python process worked for me.
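In case it’s useful, one way to find and kill whatever compute processes are still holding the GPU is to use the PIDs that nvidia-smi reports. A minimal sketch (this kills them unconditionally, so use with care):

```
# Kill every compute process the driver still lists as holding GPU memory
nvidia-smi --query-compute-apps=pid --format=csv,noheader | xargs -r kill -9
```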
I am using a Jupyter notebook. In subsequent runs, I shut down the notebook kernel by going to Kernel -> Shutdown in the notebook; this releases the memory. I used watch nvidia-smi to track GPU memory usage.
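For reference, the monitoring can be done either with watch or with nvidia-smi’s own loop flag; a minimal sketch:

```
# Refresh the full nvidia-smi output every second
watch -n 1 nvidia-smi

# Or have nvidia-smi itself print just the memory counters once per second
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```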