How much GPU memory can cudaMalloc get?

Hello everyone,

I tried to allocate 3 GB of memory with cudaMalloc.

OS : Windows 7 64-bit
GPU (display) : Quadro FX 3800
GPU (GPGPU) : Tesla C2070
CUDA : 3.2 RC
Driver : 261.00
Compiler : Visual Studio 2008

cudaSetDevice( 0 ); // C2070

cudaError_t cu_err;
void* ptr = NULL;
size_t sz = (size_t)3 * 1024 * 1024 * 1024;

cu_err = cudaMalloc( (void**)&ptr, sz );
if( cu_err != cudaSuccess ){
	printf("%s", cudaGetErrorString( cu_err ));
	return;
}

But this code could not get 3 GB, even though the C2070 has 6 GB of memory. I'd like to know what the problem is, and whether there is a maximum allocation size.

By the way, I could allocate 2 GB with the same code.
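For reference, here is a minimal sketch of how I checked what the runtime reports as free before attempting the big allocation (cudaMemGetInfo is a standard runtime API call):

size_t free_b = 0, total_b = 0;
cudaSetDevice( 0 ); // C2070
if( cudaMemGetInfo( &free_b, &total_b ) == cudaSuccess ){
	printf("free: %llu bytes, total: %llu bytes\n",
	       (unsigned long long)free_b, (unsigned long long)total_b);
}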

Best regards,


While I am no expert in working with CUDA on Windows, that sounds like it might well be a WDDM limitation. Windows Vista and later have their own GPU memory manager, which imposes additional limits on how much memory a process can grab in a single allocation call. There is a dedicated compute driver (TCC) for Tesla cards on Windows that might let you bypass these limits, but I stress that this is just a guess as to what might be going on.
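One way to check which driver model a device is running under is the tccDriver field of cudaDeviceProp (a minimal sketch; the field is 1 under the TCC compute driver, 0 under WDDM):

cudaDeviceProp prop;
if( cudaGetDeviceProperties( &prop, 0 ) == cudaSuccess ){
	printf("%s: %s driver\n", prop.name, prop.tccDriver ? "TCC" : "WDDM");
}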


I remember there was a limitation… In one of the releases, I think they fixed (or relaxed) it. You may want to check the latest CUDA 3.2 release notes and compare them with the 3.1 release notes.


I found this sentence in the release notes.

I guess this is the answer to my question.

Thank you,


Switch to TCC on the C2070 and you’ll be able to allocate quite a bit of memory.
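If I remember correctly, the switch is done with nvidia-smi (run as administrator, reboot required, and only supported on some GPUs), something like:

nvidia-smi -i 0 -dm 1

where -dm 1 selects TCC and -dm 0 selects WDDM.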


What is the maximum amount of GPU memory cudaMalloc can get, e.g. for a GPU with 8 GB of GDDR6 on board, running under TCC mode on Linux? Thanks a lot.

You’ll have to discover this experimentally. There are no published data and no formulas that can be used.
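A minimal sketch of such an experiment: binary-search for the largest single cudaMalloc that succeeds, using the total reported by cudaMemGetInfo as the upper bound. The result will vary with driver, OS, and whatever else is using the GPU.

size_t lo = 0, hi = 0, free_b = 0;
cudaMemGetInfo( &free_b, &hi );          // hi = total device memory
while( lo < hi ){
	size_t mid = lo + (hi - lo + 1) / 2; // candidate size, always > lo
	void* p = NULL;
	if( cudaMalloc( &p, mid ) == cudaSuccess ){
		cudaFree( p );
		lo = mid;                        // mid bytes worked; try larger
	} else {
		cudaGetLastError();              // clear the allocation error
		hi = mid - 1;                    // mid bytes failed; try smaller
	}
}
printf("largest single allocation: %llu bytes\n", (unsigned long long)lo);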

Note that TCC applies to Windows systems only, it does not apply to Linux systems. As @Robert_Crovella says, the maximum size of a single allocation for a particular system configuration cannot be established a priori. But here are some experimentally determined numbers to give you a rough idea. This is from a system running Windows 10 Professional with 32 GB of system memory, CUDA 11.x, idling with only the desktop running.

Quadro RTX 4000, WDDM driver: 7.25 GB (7.78e9 bytes) out of 8 GB provided by the hardware. About 90%.
Quadro P2000, TCC driver: 4.85 GB (5.20e9 bytes) out of 5 GB provided by the hardware. About 97%.

When using the WDDM driver, GPU memory allocations are serviced by the Windows operating system’s memory allocator. When using the TCC driver (not possible with all GPUs!), the driver provides its own allocation mechanism, i.e. the operating system’s mechanism is bypassed. For reasons unknown to me, the maximum size of GPU memory allocations when using the WDDM driver always seems to be significantly smaller than when using the TCC driver.

For Linux, I would expect the maximum size of a GPU memory allocation to be more in line with the TCC driver scenario, so maybe 95% of the memory provided by the hardware. This is a guesstimate, not a guarantee. Factors such as GPU memory usage by other tasks (including a GUI), total amount of system memory, and internal fragmentation in the allocator, could all play a role in what is available to a CUDA application. You would want to write your software such that it functions with any amount of memory and exits cleanly in the worst case (unable to run with the amount of memory found).

njuffa, thanks.
In fact, before our main GPU app runs, we need to run a short video-memory-test application on the SMs (exploiting the high memory bandwidth of global memory) to guarantee that all memory cells of the on-board GDDR chips are good. In this short stage we can use a simple, clean environment: no other tasks (including a GUI), no use of GPU local memory, and no internal fragmentation in the allocator. The goal is for this small video-memory-test app to cover the entire memory space provided by the on-board GDDR chips. So we could call cudaMalloc several times to obtain several segments, with one goal: together, the segments obtained via cudaMalloc should cover the entire GDDR memory space.
Can our video-memory-test achieve this goal?

No this is not possible. CUDA reserves some device memory for its own use.

I don't have any ideas of how to do that through CUDA, just as I don't know how to do it for the system memory of my computer through C++'s malloc. Once an OS takes control of memory, there is usually no way for a user application to get access to the entire physical memory. This certainly applies to GPUs using the WDDM driver, where the Windows operating system (not the CUDA runtime) has complete control over memory allocations.

The typical way to side-step this limitation is to perform hardware tests prior to OS boot, and for that purpose all my systems have hardware tests accessible via the BIOS setup after a cold start. There are multiple existing GPU memory test apps out there (and have been for many years), but to my knowledge none of them can test the full physical memory.
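The core loop of such a test can be sketched as follows (an illustration, not a definitive implementation): grab shrinking chunks until cudaMalloc yields nothing more, pattern-test the segments obtained, and accept that the runtime's own reservation stays uncovered.

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main( void )
{
	std::vector<void*> segs;
	size_t chunk = (size_t)1 << 30;      // start with 1 GiB requests
	size_t covered = 0;
	while( chunk >= ((size_t)1 << 20) ){ // give up below 1 MiB
		void* p = NULL;
		if( cudaMalloc( &p, chunk ) == cudaSuccess ){
			cudaMemset( p, 0xA5, chunk ); // write a test pattern
			segs.push_back( p );
			covered += chunk;
		} else {
			cudaGetLastError();           // clear the allocation error
			chunk /= 2;                   // retry with smaller pieces
		}
	}
	printf("covered %llu bytes\n", (unsigned long long)covered);
	// verify the pattern here (e.g. with a checking kernel), then clean up
	for( size_t i = 0; i < segs.size(); i++ ) cudaFree( segs[i] );
	return 0;
}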

If your use case relies on GPU memory working absolutely flawlessly, consider deploying GPUs with ECC support. With ECC single-bit errors can be fixed on the fly, while double-bit errors are detected which can be used to halt operations.
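Whether ECC is currently enabled can be queried from cudaDeviceProp (a sketch; the ECCEnabled field is 1 when ECC is on). On supported GPUs, ECC can be toggled with nvidia-smi -e 1 / -e 0 (reboot required, if I recall correctly).

cudaDeviceProp prop;
if( cudaGetDeviceProperties( &prop, 0 ) == cudaSuccess ){
	printf("ECC is %s\n", prop.ECCEnabled ? "enabled" : "disabled");
}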

Got it!
ECC is also a suitable option for our app.
Thanks for the quick reply!

Got it! Your info is also key for us.
Thanks for the quick reply!