Cuda samples fail to allocate memory after running a few pieces of code

greenberet123 · August 25, 2016, 9:30pm

Hi,

I'm running Ubuntu 16.04 with CUDA 7.5 and NVIDIA driver 361 on my two Tesla K40c GPUs.

I was able to run the CUDA samples (specifically vectorAdd).

I then ran a few CUDA programs using cutorch, and now when I try to run vectorAdd, it says:

$ sudo ./vectorAdd
[Vector addition of 50000 elements]
Failed to allocate device vector A (error code out of memory)!

If I restart the machine, things work again, and then stop working after a few runs of my code. This was also happening earlier with Ubuntu 14.

The debug log is here: http://sprunge.us/hhaM

Thanks in advance!

Robert_Crovella · August 25, 2016, 9:47pm

A process of yours (presumably in your cutorch workflow) is terminating abnormally and not freeing its memory.

Normal process termination should release any allocations.

You could try using the reset facility in nvidia-smi to try to reset the GPUs in question. If that is possible, it should fix the issue without a reboot. You could also try to identify any processes associated with the GPU in question using nvidia-smi and kill those processes manually.

Otherwise you’ll need to identify your process termination issues and rectify them, or else reboot the system.
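
The nvidia-smi process check suggested above could be scripted roughly as follows. This is only a sketch, not something posted in the thread: the helper names `parse_compute_apps` and `list_compute_apps` are made up for illustration, and it assumes the installed nvidia-smi supports `--query-compute-apps` with the `pid`, `process_name` and `used_memory` fields.

```python
import subprocess

# Assumed query; field names as listed by `nvidia-smi --help-query-compute-apps`.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV rows into (pid, process_name, used_memory) tuples.

    used_memory is kept as a string because nvidia-smi may report a
    placeholder instead of a number on some configurations.
    """
    apps = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from both ends so a comma inside the process name survives.
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        apps.append((int(pid), name.strip(), mem.strip()))
    return apps


def list_compute_apps():
    """Run nvidia-smi and return the compute processes it reports."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `list_compute_apps()` on the affected machine returns one tuple per compute process that nvidia-smi reports; any leftover PID can then be killed manually. Once nothing is using the device, a reset can be attempted with `sudo nvidia-smi --gpu-reset -i 0` (GPU index 0 is just an example).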

mfatica · August 25, 2016, 10:38pm

There was a bug in certain drivers where the memory was not released if the process was terminated.
Try the latest 361 driver; I don’t remember which version fixed it.

njuffa · August 26, 2016, 3:36am

I assume you meant “There was a bug in certain drivers where the memory was not released if the process was terminated abnormally”?

greenberet123 · August 28, 2016, 7:10pm

Thanks for the reply guys. Still no luck.

I successfully reset both GPUs in my machine using nvidia-smi.
According to nvidia-smi, there are no processes running that are using the GPUs.

I am using NVIDIA-SMI 361.42 … which I installed just a few days ago.

I cannot reboot the machine since many others are logged in.

I tried ‘rmmod’ followed by ‘modprobe’ of the nvidia driver. Even that didn’t fix it.

Is there something else I can do to refresh everything and emulate the effect of rebooting? Thanks!

greenberet123 · September 6, 2016, 10:47pm

bump
