cudaMalloc failing

SteveInAustinToo · August 8, 2011, 3:12pm

We are using an HP SL390 board with 8 GPUs:

Device 0: “Tesla M2050”
CUDA Driver Version / Runtime Version 3.20 / 3.20
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 2687 MBytes (2817982464 bytes)

(Driver version 260.19.12)

After a large amount of testing without a reboot, we could no longer allocate memory on the device. deviceQuery still showed 2.6 GB available, but cudaMalloc always failed on device 0. The other devices were fine.
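For anyone trying to reproduce this, here is a rough sketch of a per-device check, not the test we actually ran. It assumes a runtime that provides cudaMemGetInfo, and the 64 MiB test size is an arbitrary value picked for illustration. Note that deviceQuery prints the total global memory from the device properties, which does not shrink when memory is leaked, so the free figure from cudaMemGetInfo is likely the more useful number to compare against a failing cudaMalloc.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Assumption: 64 MiB is an arbitrary test size, not a value from the post.
    const size_t testBytes = static_cast<size_t>(64) << 20;
    int failures = 0;

    for (int dev = 0; dev < deviceCount; ++dev) {
        err = cudaSetDevice(dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: cudaSetDevice: %s\n",
                         dev, cudaGetErrorString(err));
            ++failures;
            continue;
        }

        // Free vs. total memory as seen by the runtime on this device.
        size_t freeBytes = 0, totalBytes = 0;
        err = cudaMemGetInfo(&freeBytes, &totalBytes);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: cudaMemGetInfo: %s\n",
                         dev, cudaGetErrorString(err));
            cudaGetLastError();  // clear the error before the next device
            ++failures;
            continue;
        }
        std::printf("device %d: %zu MiB free of %zu MiB\n",
                    dev, freeBytes >> 20, totalBytes >> 20);

        // Try a small allocation and report the exact error string on failure.
        void* p = NULL;
        err = cudaMalloc(&p, testBytes);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: cudaMalloc(%zu bytes): %s\n",
                         dev, testBytes, cudaGetErrorString(err));
            cudaGetLastError();  // clear the error before the next device
            ++failures;
        } else {
            std::printf("device %d: test allocation OK\n", dev);
            cudaFree(p);
        }
    }
    return failures ? 1 : 0;
}
```

Built with something like `nvcc check_alloc.cu -o check_alloc` (the file name is made up), the per-device output would show whether device 0 reports little free memory or fails with a different error than out-of-memory.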

I’ve seen similar problems posted here for some time, mostly on the Windows side, but never an adequate resolution. A failure that can only be cleared by a reboot is a real problem in our environment. It looks like a memory leak, perhaps in the driver. Has anyone else run into this? Is there a known fix?

Thanks,

Steve
8/8/2011
