cudaMalloc fails after

The first call to cudaMalloc in the following code fails with cudaErrorMemoryAllocation because the requested size is too large. But the second call, for only 1024 bytes, fails too.

cudaError_t err;

cudaThreadSynchronize();	// Initialize

// Determine free memory
size_t free, total;
err = cudaMemGetInfo(&free, &total);
if (err != cudaSuccess) {
	cout << "Error cudaMemGetInfo" << endl;
} else {
	cout << "Free=" << free << ", total=" << total << endl;
}

void* ptr;

// The first malloc fails, because the requested block is too large
err = cudaMalloc(&ptr, free);
if (err != cudaSuccess) {
	cout << "Error cudaMalloc large block" << endl;
}

// The second call fails, too
err = cudaMalloc(&ptr, 1024);
if (err != cudaSuccess) {
	cout << "Error cudaMalloc small block" << endl;
}

So after the first call the device is in an unusable state. I did not expect this behaviour. Is this a bug or a feature? If it is a feature, then it should be included in the documentation. I ran this on Windows 7 64-bit, CUDA 3.2, WDDM driver 270.81 and Visual Studio 2008.

Is there a way around this? I can’t reset the device with cudaThreadExit(), because my application has other buffers already allocated on the device. And I have to use CUDA 3.2 because of my customer.

I read in another thread that the author uses a conservative estimate of 80% of free memory. Is this the only way?
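A minimal sketch of the conservative-estimate idea, assuming the 80% figure from the other thread. The helper name conservativeBytes is hypothetical and not part of the CUDA API. It only computes a reduced request size from the free value reported by cudaMemGetInfo, and nothing here guarantees that 80% is safe on a given system.

```cpp
#include <cstdint>

// Hypothetical helper (not a CUDA API): returns `percent` percent of
// `freeBytes`, rounded down. The computation is split into quotient and
// remainder so the intermediate product stays small (assumes percent <= 100).
constexpr std::uint64_t conservativeBytes(std::uint64_t freeBytes,
                                          std::uint64_t percent) {
    return freeBytes / 100 * percent + (freeBytes % 100) * percent / 100;
}

// Free memory reported in this post: 1505865728 bytes.
static_assert(conservativeBytes(1505865728ULL, 80) == 1204692582ULL,
              "80% of the reported free memory");
```

In the code above, the result would replace free in the first cudaMalloc call, for example cudaMalloc(&ptr, conservativeBytes(free, 80)).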

The output:

Free=1505865728, total=1576468480

Error cudaMalloc large block

Error cudaMalloc small block

Best regards,

Joern Dinkla

I think you need to call cudaGetLastError() to reset the error.

Thanks for the quick response. But this does not change the behaviour of the second call to cudaMalloc.

Best regards,


Sorry this didn’t help. As for the limit itself, there is a note in the release notes regarding Windows with WDDM drivers:

To circumvent this, you would need to run on a Tesla in TCC mode, or on Linux.

According to, “PAGING_BUFFER_SEGMENT_SIZE is approximately 2GB” and the system memory is 18 GB. So the formula yields 2 GB, because MIN((System Memory Size in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE) = MIN((18000 MB - 512 MB) / 2, 2 GB) = MIN(8744 MB, 2 GB) = 2 GB.

So this is not the reason for the error described, because the card only has 1.5 GB.
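The arithmetic above can be checked with a small sketch. The helper name wddmLimitMB is hypothetical, the 2 GB segment size is taken as 2048 MB, and the 1.5 GB card is taken as 1536 MB; these are assumptions for illustration. Returning 0 for less than 512 MB of system memory is also my own choice, not something the formula specifies.

```cpp
#include <cstdint>

// Hypothetical helper: evaluates
//   MIN((system memory in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE)
// with all quantities in MB. Returns 0 if system memory is below 512 MB
// (an assumption made here to avoid unsigned underflow).
constexpr std::uint64_t wddmLimitMB(std::uint64_t systemMB,
                                    std::uint64_t pagingSegmentMB) {
    return systemMB < 512
               ? 0
               : ((systemMB - 512) / 2 < pagingSegmentMB
                      ? (systemMB - 512) / 2
                      : pagingSegmentMB);
}

// Values from the post: 18000 MB system memory, segment size of about 2 GB.
static_assert(wddmLimitMB(18000, 2048) == 2048,
              "the limit is the 2 GB segment size");
// A 1.5 GB card (assumed 1536 MB) stays below that limit.
static_assert(1536 < wddmLimitMB(18000, 2048),
              "1.5 GB is under the limit");
```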

For those who come here via Google (like me): you need to call


This destroys all the buffers and streams allocated on the device. In my original post I wrote “in my application there are other buffers on the device and already allocated”.