What is the limit of cudaMalloc?

bNFwYlCUXi · December 18, 2011, 11:15pm

The reference manual doesn’t say how much memory cudaMalloc can allocate for a given size of global memory.

“Allocates size bytes of linear memory on the device and returns in devPtr a pointer to the allocated memory.
The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. cudaMalloc() returns
cudaErrorMemoryAllocation in case of failure.”

The following program shows that a single allocation of about 32 MB already fails on my graphics card (see specs at the bottom); the largest request that succeeds is 16 MB. Since the card has over 200 MB, why can't I allocate more?

~/linux/test/cuda/lib/cudaMalloc$ cat main.cu
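#include <stdlib.h>
#include <stdio.h>

int main(void) {
    int *devPtr = NULL;
    size_t size = 1;  /* number of ints to request */

    do {
        /* Check the returned status: devPtr may still hold a stale value
           from the previous iteration, so testing it for NULL is unreliable. */
        if (cudaMalloc(&devPtr, size * sizeof(int)) != cudaSuccess) {
            printf("couldn't allocate %d int's.\n", int(size));
            return 1;
        } else {
            printf("Allocated %d int's.\n", int(size));
        }
        cudaFree(devPtr);
        size *= 2;  /* double the request on every pass */
    } while (1);
}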

~/linux/test/cuda/lib/cudaMalloc$ ./main.exe
Allocated 1 int's.
Allocated 2 int's.
Allocated 4 int's.
Allocated 8 int's.
Allocated 16 int's.
Allocated 32 int's.
Allocated 64 int's.
Allocated 128 int's.
Allocated 256 int's.
Allocated 512 int's.
Allocated 1024 int's.
Allocated 2048 int's.
Allocated 4096 int's.
Allocated 8192 int's.
Allocated 16384 int's.
Allocated 32768 int's.
Allocated 65536 int's.
Allocated 131072 int's.
Allocated 262144 int's.
Allocated 524288 int's.
Allocated 1048576 int's.
Allocated 2097152 int's.
Allocated 4194304 int's.
couldn't allocate 8388608 int's.

NVIDIA GeForce 9400M:

Chipset Model: NVIDIA GeForce 9400M
Type: GPU
Bus: PCI
VRAM (Total): 256 MB
Vendor: NVIDIA (0x10de)
Device ID: 0x0863
Revision ID: 0x00b1
ROM Revision: 3427
Displays:
Color LCD:
Resolution: 1280 x 800
Pixel Depth: 32-Bit Color (ARGB8888)
Mirror: Off
Online: Yes
Built-In: Yes
ASUS VW266H:
Resolution: 1920 x 1200 @ 60 Hz
Pixel Depth: 32-Bit Color (ARGB8888)
Display Serial Number: A3LMTF023512
Main Display: Yes
Mirror: Off
Online: Yes
Rotation: Supported
Adapter Type: Mini DisplayPort To VGA Adapter
Adapter Firmware Version: 1.03

Hardware Overview:

Model Name: MacBook Pro
Model Identifier: MacBookPro5,5
Processor Name: Intel Core 2 Duo
Processor Speed: 2.53 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 3 MB
Memory: 8 GB
Bus Speed: 1.07 GHz
Boot ROM Version: MBP55.00AC.B03
SMC Version (system): 1.47f2
Sudden Motion Sensor:
State: Enabled

kbam · December 19, 2011, 12:00am

I think the limit is 2 GB or something large like that. I have certainly used cudaMalloc for arrays larger than 32 MB.
In your case there are at least 3 things needing VRAM:

1. VRAM taken up by Mac OS and applications
2. some VRAM taken up by CUDA control structures
3. the VRAM your test is using

Also, your VRAM may be fragmented.

Try having your loop start from, say, 128 MB and go smaller each time cudaMalloc fails, and run it right after a reboot with as few other things running as possible.
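
The descending search suggested above could be sketched roughly as follows. This is only an illustration: the helper name `find_largest_allocation`, the 128 MB starting point, and the 1 MB lower bound are assumptions chosen for the example, and halving only finds a power-of-two fraction of the start size, not the exact maximum.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: try `start` bytes first, halve the request on every
// failure, and return the first size cudaMalloc accepts (0 if none fits).
static size_t find_largest_allocation(size_t start, size_t floor) {
    for (size_t bytes = start; bytes >= floor && bytes > 0; bytes /= 2) {
        void *devPtr = NULL;
        cudaError_t err = cudaMalloc(&devPtr, bytes);
        if (err == cudaSuccess) {
            cudaFree(devPtr);  // only probing, so release immediately
            return bytes;
        }
        printf("%zu MB failed: %s\n", bytes >> 20, cudaGetErrorString(err));
        cudaGetLastError();  // reset the runtime's last-error state before retrying
    }
    return 0;
}

int main(void) {
    // Assumed bounds: start at 128 MB, give up below 1 MB.
    size_t best = find_largest_allocation((size_t)128 << 20, (size_t)1 << 20);
    printf("Largest successful single allocation: %zu MB\n", best >> 20);
    return 0;
}
```

Running it right after a reboot and again under normal load should give a feel for how much of the difference comes from other applications holding VRAM.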

pasoleatis · December 19, 2011, 8:34am

On my laptops the OS takes about 200 MB of VRAM. Use this code to find how much memory is free before and after allocation:

size_t free, total;
printf("\n");
cudaMemGetInfo(&free, &total);
printf("%zu KB free of total %zu KB\n", free/1024, total/1024);
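
As a fuller sketch of the before/after measurement, the program below wraps the cudaMemGetInfo call in a small helper and reports free memory around one allocation. The helper name `report` and the 16 MB test size are assumptions made for the example, not anything from the posts above.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: print free and total device memory with a label.
static void report(const char *label) {
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        printf("%s: cudaMemGetInfo failed: %s\n", label, cudaGetErrorString(err));
        return;
    }
    printf("%s: %zu KB free of total %zu KB\n",
           label, free_bytes / 1024, total_bytes / 1024);
}

int main(void) {
    const size_t bytes = (size_t)16 << 20;  // 16 MB, an arbitrary test size
    void *devPtr = NULL;

    report("before cudaMalloc");
    if (cudaMalloc(&devPtr, bytes) == cudaSuccess) {
        report("after cudaMalloc");
        cudaFree(devPtr);
        report("after cudaFree");
    } else {
        printf("cudaMalloc of %zu bytes failed\n", bytes);
    }
    return 0;
}
```

The free figure may drop by more than the requested size, since the driver can round allocations up and the first CUDA call also creates a context that itself uses device memory.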

bNFwYlCUXi · December 19, 2011, 2:19pm

This is how much memory is left (output below). I have 8 GB of main memory, but there is so little left on the GPU. Is there a way to use main memory for GPU computation?

86048 KB free of total 259712 KB

tera · December 19, 2011, 3:42pm

Yes. It is called mapped memory. Check section 3.2.4.3 of the Programming Guide.
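
A minimal sketch of mapped (zero-copy) memory, assuming device 0 and a toy kernel: the host buffer is allocated with cudaHostAlloc and the cudaHostAllocMapped flag, and the kernel works on it through the device pointer returned by cudaHostGetDevicePointer. The names `increment` and `mapped_increment_demo` are made up for this example.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread increments one element through the device-side alias.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical demo: returns the number of wrong elements, or -1 if mapped
// memory is unavailable or a CUDA call fails.
static int mapped_increment_demo(int n) {
    // Request host-memory mapping before any other CUDA work on the device.
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess) return -1;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || !prop.canMapHostMemory)
        return -1;

    int *hostPtr = NULL, *devAlias = NULL;
    if (cudaHostAlloc((void **)&hostPtr, (size_t)n * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess)
        return -1;
    if (cudaHostGetDevicePointer((void **)&devAlias, hostPtr, 0) != cudaSuccess) {
        cudaFreeHost(hostPtr);
        return -1;
    }
    for (int i = 0; i < n; ++i) hostPtr[i] = i;

    // No cudaMemcpy: the kernel reads and writes host memory directly.
    increment<<<(n + 255) / 256, 256>>>(devAlias, n);
    int errors = 0;
    if (cudaGetLastError() != cudaSuccess || cudaDeviceSynchronize() != cudaSuccess)
        errors = -1;
    for (int i = 0; errors >= 0 && i < n; ++i)
        if (hostPtr[i] != i + 1) ++errors;

    cudaFreeHost(hostPtr);
    return errors;
}

int main(void) {
    int result = mapped_increment_demo(1 << 20);
    if (result < 0) printf("mapped memory not available or a CUDA call failed\n");
    else printf("%d mismatches\n", result);
    return 0;
}
```

Because the buffer lives in pinned host memory, its size is bounded by what the system is willing to page-lock rather than by free VRAM.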

tera · December 19, 2011, 4:03pm

BTW. Unless it is the 13" version, your MacBook Pro also has a discrete GPU with 512 MB of dedicated memory (and roughly twice the performance). Select “Graphics: Higher performance” in the Energy Saver control panel and reboot to access it (or install gfxCardStatus to switch without rebooting).

bNFwYlCUXi · December 19, 2011, 6:29pm

Mine is the 13" version. So this explains why I don’t see the “Higher performance” option, right?

tera · December 19, 2011, 7:00pm

Yes, unfortunately.

The memory problem, however, can be fully solved using mapped memory. On your integrated GPU, mapped memory is technically no different from VRAM: both are located in main memory, the only difference being that VRAM is mapped to the GPU at boot time.
