losing 1.5GB on Tesla K20 cudaDeviceReset() ineffective?

wlangdon · August 26, 2015, 1:45pm

A CUDA program died horribly whilst using a K20. Since then, on the same K20,
cudaMemGetInfo() returns a number much lower than expected (about 1611661312 bytes smaller),
so an otherwise correctly behaving application fails, saying “Not enough memory to perform alignment”.

Something similar has come up recently, but I think txbob suggests cudaDeviceReset() will not cure the problem.

Has anyone experienced similar problems?
Is there a solution?

Any help or suggestions would be most welcome.
Bill

Robert_Crovella · August 26, 2015, 1:54pm

You could try rebooting.

If you have root privilege, you could try unloading and reloading the nvidia driver:

sudo rmmod nvidia

(after that, any CUDA operation will reload the driver.)

If you have root privilege, you could try doing a device reset command from nvidia-smi (please use nvidia-smi --help to learn about the available commands, or refer to the man page for it.)

The last two methods probably will not work if the GPU is currently supporting a display, or has X attached to it.
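
As a rough sketch of the nvidia-smi route above: before resetting anything, it can help to list which processes currently hold memory on the GPU. The snippet assumes a driver whose nvidia-smi supports `--query-compute-apps`; the helper name `sum_used_mib` and the device index 0 are illustrative and not from this thread.

```shell
#!/bin/sh
# Sum the third CSV column (memory used, in MiB) of nvidia-smi's
# per-process listing. Hypothetical helper, not part of nvidia-smi.
sum_used_mib() {
    awk -F', *' '{ total += $3 } END { print total + 0 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per compute process: pid, process name, memory used (MiB).
    apps=$(nvidia-smi --query-compute-apps=pid,process_name,used_memory \
                      --format=csv,noheader,nounits)
    printf '%s\n' "$apps"
    # Total MiB held by the listed processes.
    printf '%s\n' "$apps" | sum_used_mib
    # If nothing is listed and memory is still missing, a reset of GPU 0
    # (assumed index; needs root and an otherwise idle GPU) would be:
    #   sudo nvidia-smi --gpu-reset -i 0
else
    echo "nvidia-smi not found on PATH"
fi
```

If the listing shows a process owned by someone else, the memory is in use rather than lost, and a reset is not the right tool.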

wlangdon · August 26, 2015, 2:25pm

Dear txbob,
Thank you for the pointer to nvidia-smi.
Using this as a diagnostic showed my reading of the problem was entirely wrong!
nvidia-smi shows 1532MiB of GPU memory is being used by another user!!

Many thanks
Bill
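
For what it is worth, the two figures quoted in this thread are consistent with that explanation. A quick check in POSIX shell arithmetic, using only those numbers (the variable names are illustrative):

```shell
missing_bytes=1611661312   # shortfall seen via cudaMemGetInfo()
other_user_mib=1532        # usage nvidia-smi attributes to the other user

# Convert the shortfall from bytes to MiB (1 MiB = 1024 * 1024 bytes).
missing_mib=$((missing_bytes / 1024 / 1024))
echo "$missing_mib"                        # 1537
echo "$((missing_mib - other_user_mib))"   # 5
```

So the "lost" memory is 1537 MiB, within 5 MiB of what the other user's process holds. The thread does not say what accounts for the small remainder.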
