Maximum Number of cudaMalloc() Calls

Does anyone know if there is a maximum number of global memory objects that can be allocated via a series of cudaMalloc() calls? From everything I can determine experimentally, my Quadro 3700 (512MB) gives “out of memory” errors after 72,407 calls, regardless of the allocation size. cuMemGetInfo() shows that the global memory has in fact been consumed after the cudaMalloc() failure. My C1060 (4GB) board behaves the same way and is able to allocate 129,814 objects, but after the first cudaMalloc() failure, cuMemGetInfo() shows I still have around 12MB of global memory free.

I thought that perhaps cudaMalloc() has a very high alignment value, but to consume all memory in the observed number of allocations, the Quadro board would have to allocate on at least a 4KB boundary and the C1060 on roughly a 32KB boundary. Those alignments seem very large, if that is what is happening.

Alternatively, perhaps there is a CUDA global memory manager that only has a finite number of slots for allocated memory?

I’ve searched and read many threads here trying to find an answer to this, to no avail.

By the way, I realize this type of uncoalesced memory structure will result in very poor kernel performance, but I’m trying to port a very complex existing program to CUDA and need to get it working with the existing data structures first, then work on performance.

Thank you,
Ken Chaffin

From what I can see, it looks like (at least for CUDA 2.3 on a compute capability 1.1 device) the allocation page size is either 4KB or 64KB, beyond an initial 16MB of pre-allocated memory per context. I posted a little test program you can use to see how it works here.

Thanks for the reply and info. This seems consistent with what I’m seeing, although the different page sizes for different cudaMalloc() calls were confusing me.

Looks like I’ll need to write my own memory manager which divvies up large chunks into smaller bites.