Memory problem

I am wondering why I can’t use more than 120MB VRAM on a GT330 with 256MB VRAM.

I also ran the deviceQuery program from the SDK examples, which likewise reports 256MB of global memory available.

I used the following program to test how much memory I can upload to the GPU:

[codebox]#include <cstdio>   // printf, fprintf
#include <cstdlib>  // drand48
#include <cutil_inline.h>
#include <cutil_math.h>

// Empty kernel: only used to check that a launch still works after the upload.
__global__ void testKernel(float *data)
{
}

#define DATA_MB 130u
#define DATA_SIZE ((DATA_MB * 1024u * 1024u) / sizeof(float))

int main()
{
	printf("Transferring %.2fMB data to VRAM\n", (DATA_SIZE * sizeof(float)) / (1024.0 * 1024.0));

	// Fill a host buffer with random values.
	float *h_data = new float[DATA_SIZE];
	for (unsigned int i = 0; i < DATA_SIZE; ++i)
		h_data[i] = drand48();

	// Allocate one contiguous device buffer and upload the data.
	float *d_data;
	cutilSafeCall(cudaMalloc((void **)&d_data, DATA_SIZE * sizeof(float)));
	cutilSafeCall(cudaMemcpy(d_data, h_data, DATA_SIZE * sizeof(float), cudaMemcpyHostToDevice));

	printf("Invoking kernel\n");
	dim3 blckDim = dim3(16, 16, 1);
	dim3 gridDim = dim3(4, 4, 1);
	testKernel<<<gridDim, blckDim>>>(d_data);

	printf("Cleanup\n");
	cutilSafeCall(cudaFree(d_data));
	delete[] h_data;

	printf("Checking error\n");
	cudaError_t err = cudaGetLastError();
	if (cudaSuccess != err)
	{
		fprintf(stderr, "Cuda error: %s\n", cudaGetErrorString(err));
		return -1;
	}
	return 0;
}[/codebox]

Whenever I try to use more than 120MB, the program fails with an ‘Out of memory’ error.

I assume there is nothing wrong with the code snippet above? Can anyone explain what is going on?

Thanks in advance.

Check the amount of free memory with the function cudaMemGetInfo(size_t *free, size_t *total). total should come out equal to 256MB in bytes, while free is the amount of currently available memory in bytes. If you are using Windows 7 or Vista with Aero enabled, then 120MB of free memory sounds about right. Disabling Aero frees up quite a bit of memory.
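For reference, a minimal sketch of such a check might look like the following. It uses only the runtime API (no cutil), and the helper name toMiB is made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): convert a byte count to MB for printing.
static double toMiB(size_t bytes)
{
	return bytes / (1024.0 * 1024.0);
}

int main()
{
	size_t freeBytes = 0, totalBytes = 0;

	// cudaMemGetInfo writes the free and total device memory, in bytes,
	// for the current device through the two pointers.
	cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
		return 1;
	}

	printf("Free:  %.1fMB\n", toMiB(freeBytes));
	printf("Total: %.1fMB\n", toMiB(totalBytes));
	return 0;
}
```

Note that the free value is only a total; it does not say how large the biggest single allocation can be.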

Thanks for the answer. I called the function, which gives the following:

Free: 167MB

Total: 255MB

When I try to upload 140MB, it always ends in an ‘Out of memory’ error. Really weird. Any further ideas?

By the way, I use OSX.

It might be because there is 167MB of free memory in total, but the memory is fragmented, so you cannot allocate it all in one chunk. Try allocating in chunks of 10MB (without allocating anything else in between) and see if that makes any difference.
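A rough sketch of that experiment, assuming a 140MB target split into 10MB chunks, could look like this. The helper chunksNeeded and the variable names are invented for illustration, and the result will depend on what else is using the GPU at the time.

```cuda
#include <cstddef>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: number of chunks needed to cover totalBytes (rounded up).
static size_t chunksNeeded(size_t totalBytes, size_t chunkBytes)
{
	return (totalBytes + chunkBytes - 1) / chunkBytes;
}

int main()
{
	const size_t chunkBytes  = 10u * 1024u * 1024u;   // 10MB per allocation
	const size_t targetBytes = 140u * 1024u * 1024u;  // assumed target: 140MB
	const size_t needed = chunksNeeded(targetBytes, chunkBytes);

	// Allocate the chunks back to back, with no other allocations in between.
	std::vector<void *> chunks;
	for (size_t i = 0; i < needed; ++i)
	{
		void *p = NULL;
		cudaError_t err = cudaMalloc(&p, chunkBytes);
		if (err != cudaSuccess)
		{
			fprintf(stderr, "Chunk %lu failed: %s\n",
			        (unsigned long)i, cudaGetErrorString(err));
			break;
		}
		chunks.push_back(p);
	}

	printf("Allocated %lu of %lu chunks (%luMB)\n",
	       (unsigned long)chunks.size(), (unsigned long)needed,
	       (unsigned long)(chunks.size() * chunkBytes / (1024u * 1024u)));

	// Release everything again.
	for (size_t i = 0; i < chunks.size(); ++i)
		cudaFree(chunks[i]);
	return 0;
}
```

If all chunks succeed where the single 140MB cudaMalloc failed, that points to fragmentation rather than a hard limit. The data then has to be copied and processed chunk by chunk, since the chunks are not guaranteed to be contiguous.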

That makes sense and helped a lot. Thanks :)