How much GPU memory can cudaMalloc get?

Hello everyone,

I tried to allocate 3 GB of memory with cudaMalloc.

OS: Windows 7 64-bit

GPU (display): Quadro FX 3800

GPU (GPGPU): Tesla C2070

CUDA: 3.2 RC

Driver: 261.00

Compiler: Visual Studio 2008

cudaSetDevice( 0 );  // select the C2070

cudaError_t cu_err;
void* ptr = NULL;
size_t sz = (size_t)3 * 1024 * 1024 * 1024;  // 3 GB, computed in size_t to avoid 32-bit overflow

cu_err = cudaMalloc( &ptr, sz );             // &ptr is already void**, no cast needed
if( cu_err != cudaSuccess ){
	printf( "%s\n", cudaGetErrorString( cu_err ) );
	return;
}

However, this code could not allocate 3 GB, even though the C2070 has 6 GB of memory. I'd like to know what the problem is, and whether there is a maximum size for a single allocation.

By the way, the same code can allocate 2 GB without any problem.
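
For reference, here is a minimal sketch (assuming the same device selection as above) that asks the runtime how much device memory it reports as free and total via cudaMemGetInfo before attempting the allocation; it only shows what the driver claims is available, not why a single call fails:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaSetDevice( 0 );                      // the C2070, as in the code above

    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo( &free_bytes, &total_bytes );
    if( err != cudaSuccess ){
        printf( "cudaMemGetInfo failed: %s\n", cudaGetErrorString( err ) );
        return 1;
    }
    // Report in MB so the values fit comfortably in unsigned long for printf.
    printf( "free: %lu MB, total: %lu MB\n",
            (unsigned long)(free_bytes >> 20),
            (unsigned long)(total_bytes >> 20) );
    return 0;
}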

Best regards,

While I am no expert in working with CUDA on Windows, that sounds like it might well be a WDDM limitation. Windows Vista and later have their own GPU memory manager, which imposes additional limits on how much memory a process can grab in a single allocation call. There is a dedicated compute driver (TCC) for Tesla cards on Windows that might let you bypass these limits, but I stress that this is just a guess as to what might be going on.
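
If it helps to confirm that a per-allocation cap (rather than total memory) is the issue, here is a rough diagnostic sketch, not any documented procedure, that simply halves the requested size until a single cudaMalloc succeeds; the last failing and first succeeding sizes roughly bracket the limit:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaSetDevice( 0 );                                 // the C2070 in the setup above

    size_t request = (size_t)6 * 1024 * 1024 * 1024;    // start near the card's 6 GB
    void*  ptr     = NULL;

    while( request >= (size_t)64 * 1024 * 1024 ){
        if( cudaMalloc( &ptr, request ) == cudaSuccess ){
            printf( "largest single allocation that succeeded here: %lu MB\n",
                    (unsigned long)(request >> 20) );
            cudaFree( ptr );
            break;
        }
        cudaGetLastError();      // clear the error state left by the failed cudaMalloc
        request /= 2;            // try again with half the size
    }
    return 0;
}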

I remember there was a limitation… In one of the releases, I think they fixed (or at least relaxed) it. You may want to check the latest CUDA 3.2 release notes and compare them with the 3.1 release notes.

I found the relevant sentence in the release notes.

I guess that answers my question.

Thank you,

Switch to TCC on the C2070 and you’ll be able to allocate quite a bit of memory.
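
In case it is useful, here is a small check (assuming a CUDA version whose cudaDeviceProp exposes the tccDriver field) for whether the card is actually running under TCC. The switch itself is done with nvidia-smi's driver-model option followed by a reboot, though the exact option name can depend on the driver version, so check your nvidia-smi documentation:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties( &prop, 0 );   // device 0 = the C2070 here

    // tccDriver is 1 when the device is running under the TCC compute driver,
    // 0 when it is still under the WDDM display driver.
    printf( "%s : %s\n", prop.name, prop.tccDriver ? "TCC" : "WDDM" );
    return 0;
}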
