What is the limit of cudaMalloc?

The reference manual doesn't say how much memory cudaMalloc can allocate for a given size of global memory.

“Allocates size bytes of linear memory on the device and returns in devPtr a pointer to the allocated memory.
The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. cudaMalloc() returns
cudaErrorMemoryAllocation in case of failure.”

The following program shows that it can only allocate about 32 MB of memory on my graphics card (see the spec at the bottom). Since the card has over 200 MB, why can I only allocate about 32 MB?

~/linux/test/cuda/lib/cudaMalloc$ cat main.cu
#include <stdlib.h>
#include <stdio.h>

int main(void) {
    int *devPtr = NULL;
    size_t size = 1;

    do {
        // Check the return value: cudaMalloc reports failure via cudaError_t.
        if (cudaMalloc(&devPtr, size * sizeof(int)) != cudaSuccess) {
            printf("couldn't allocate %d int's.\n", int(size));
            return 1;
        }
        printf("Allocated %d int's.\n", int(size));
        cudaFree(devPtr);
        size *= 2;
    } while (1);
}

~/linux/test/cuda/lib/cudaMalloc$ ./main.exe
Allocated 1 int's.
Allocated 2 int's.
Allocated 4 int's.
Allocated 8 int's.
Allocated 16 int's.
Allocated 32 int's.
Allocated 64 int's.
Allocated 128 int's.
Allocated 256 int's.
Allocated 512 int's.
Allocated 1024 int's.
Allocated 2048 int's.
Allocated 4096 int's.
Allocated 8192 int's.
Allocated 16384 int's.
Allocated 32768 int's.
Allocated 65536 int's.
Allocated 131072 int's.
Allocated 262144 int's.
Allocated 524288 int's.
Allocated 1048576 int's.
Allocated 2097152 int's.
Allocated 4194304 int's.
couldn't allocate 8388608 int's.

NVIDIA GeForce 9400M:

Chipset Model: NVIDIA GeForce 9400M
Type: GPU
Bus: PCI
VRAM (Total): 256 MB
Vendor: NVIDIA (0x10de)
Device ID: 0x0863
Revision ID: 0x00b1
ROM Revision: 3427
Displays:
Color LCD:
Resolution: 1280 x 800
Pixel Depth: 32-Bit Color (ARGB8888)
Mirror: Off
Online: Yes
Built-In: Yes
ASUS VW266H:
Resolution: 1920 x 1200 @ 60 Hz
Pixel Depth: 32-Bit Color (ARGB8888)
Display Serial Number: A3LMTF023512
Main Display: Yes
Mirror: Off
Online: Yes
Rotation: Supported
Adapter Type: Mini DisplayPort To VGA Adapter
Adapter Firmware Version: 1.03

Hardware Overview:

Model Name: MacBook Pro
Model Identifier: MacBookPro5,5
Processor Name: Intel Core 2 Duo
Processor Speed: 2.53 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 3 MB
Memory: 8 GB
Bus Speed: 1.07 GHz
Boot ROM Version: MBP55.00AC.B03
SMC Version (system): 1.47f2
Sudden Motion Sensor:
State: Enabled

I think the limit is 2 GB or something large like that. I have certainly used cudaMalloc for arrays larger than 32 MB.
In your case there are at least three things consuming VRAM:

  1. VRAM taken up by Mac OS and applications
  2. some VRAM taken up by CUDA control structures
  3. the VRAM your test itself is using

Your VRAM may also be fragmented.

Try having your loop start from, say, 128 MB and halve the request until cudaMalloc succeeds, and run it right after a reboot with a minimum of applications running.
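The suggestion above can be sketched like this (a hedged sketch using the standard CUDA runtime API; the 128 MB starting point is arbitrary and error handling is minimal):

```cuda
#include <stdio.h>

int main(void) {
    size_t bytes = 128 * 1024 * 1024;  // start at 128 MB
    void *p = NULL;

    // Halve the request until a single allocation succeeds.
    while (bytes > 0 && cudaMalloc(&p, bytes) != cudaSuccess)
        bytes /= 2;

    printf("Largest single allocation probed: %zu KB\n", bytes / 1024);
    cudaFree(p);
    return 0;
}
```

Because cudaMalloc needs one contiguous block, this probes the largest free contiguous region rather than the total free memory.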

For my laptops the OS takes about 200 MB of VRAM. Use this code to find how much memory is free before and after allocation (note the %zu format specifier, since cudaMemGetInfo returns size_t values):

size_t free, total;
printf("\n");
cudaMemGetInfo(&free, &total);
printf("%zu KB free of total %zu KB\n", free/1024, total/1024);

This is how much memory is left. I have 8 GB of main memory, yet there is so little left on the GPU. Is there a way to use the main memory for GPU computation?

86048 KB free of total 259712 KB

Yes. It is called mapped memory. Check section 3.2.4.3 of the Programming Guide.
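For reference, a minimal mapped-memory (zero-copy) sketch along the lines of that section of the Programming Guide. This is a hedged example, not the guide's own code; error checking is omitted for brevity, and the kernel and sizes are made up for illustration:

```cuda
#include <stdio.h>

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main(void) {
    // Must be called before any CUDA context is created.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const int n = 1024;
    int *hostPtr = NULL, *devPtr = NULL;

    // Page-locked host memory, mapped into the device address space.
    cudaHostAlloc((void **)&hostPtr, n * sizeof(int), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) hostPtr[i] = i;

    // Device-side pointer to the same physical memory.
    cudaHostGetDevicePointer((void **)&devPtr, hostPtr, 0);

    increment<<<(n + 255) / 256, 256>>>(devPtr, n);
    cudaThreadSynchronize();  // cudaDeviceSynchronize() in newer toolkits

    // The kernel wrote directly into host memory; no cudaMemcpy needed.
    printf("hostPtr[0] = %d\n", hostPtr[0]);

    cudaFreeHost(hostPtr);
    return 0;
}
```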

BTW. Unless it is the 13" version, your MacBook Pro also has a discrete GPU with 512 MB of dedicated memory (and roughly twice the performance). Select “Graphics: Higher performance” in the Energy Saver control panel and reboot to access it (or install gfxCardStatus to switch without rebooting).

Mine is the 13" version. So this explains why I don’t see the “Higher performance” option, right?

Yes, unfortunately.

The memory problem, however, can be fully solved using mapped memory. On your integrated GPU, mapped memory is technically no different from VRAM: both are located in main memory; the only difference is that VRAM is mapped to the GPU at boot time.