Memory missing

My .cu file contains only this:
void *p = NULL;
cudaMalloc(&p, 1);
After running it, I found that my GPU memory usage went from 15 MB to 220 MB.

I want to know why. I have 1 GB of memory in total, so I can only create 4 threads; otherwise the memory will overflow.

My environment:
Windows 2003 64-bit, 2 x GTX 470 (no SLI, and ECC is not supported), CUDA SDK 3.2

Thanks

Does anyone know?

Err, pretty sure this is not the answer to your problem but isn’t it supposed to be

void* p;

cudaMalloc(&p, 1);

?

The first CUDA call also initializes the context, which uses quite a bit of memory for the heap, stack, and printf buffer. Subsequent calls to cudaMalloc() only take up the memory that is allocated (plus a small amount of wasted space to ensure proper alignment).
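One way to check this is to read the free device memory after the context exists and again after each allocation. The following is only a rough sketch using the runtime API; the helper names check and to_mib are made up for this example, and the numbers it prints will depend on the driver, the card, and whatever else is using the GPU.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: convert a byte count to MiB for printing.
static double to_mib(size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

int main() {
    size_t free0 = 0, free1 = 0, free2 = 0, total = 0;

    // cudaFree(0) is a common way to force context creation up front,
    // so the context cost is paid before the first measurement.
    check(cudaFree(0), "cudaFree(0)");
    check(cudaMemGetInfo(&free0, &total), "cudaMemGetInfo");

    // Same 1-byte allocation as in the original post.
    void *small_buf = NULL;
    check(cudaMalloc(&small_buf, 1), "cudaMalloc(1 byte)");
    check(cudaMemGetInfo(&free1, &total), "cudaMemGetInfo");

    // A second, larger allocation (100 MiB).
    void *big_buf = NULL;
    check(cudaMalloc(&big_buf, 100 * 1024 * 1024), "cudaMalloc(100 MiB)");
    check(cudaMemGetInfo(&free2, &total), "cudaMemGetInfo");

    std::printf("total device memory:         %.1f MiB\n", to_mib(total));
    std::printf("free after context creation: %.1f MiB\n", to_mib(free0));
    std::printf("used by 1-byte cudaMalloc:   %.1f MiB\n", to_mib(free0 - free1));
    std::printf("used by 100 MiB cudaMalloc:  %.1f MiB\n", to_mib(free1 - free2));

    check(cudaFree(big_buf), "cudaFree(big_buf)");
    check(cudaFree(small_buf), "cudaFree(small_buf)");
    return 0;
}
```

The gap between total and the first free reading includes the context, but also the display and any other process on the card, so it is only an upper bound on the context cost.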

What you describe is what I was hoping for, but it is not what actually happens.

Take code like this:

cudaMalloc(&p, 100*1024*1024);

This occupies 230 MB + 100 MB of memory, not just 230 MB,

so there is no memory pool.

The heap is there to serve future allocations from the device, not from the host. So a second call to cudaMalloc() will not be served from the heap set up while initializing the context during the first cudaMalloc(). The point I wanted to make is that the second call does not reserve another 230 MB.

If you are not using printf() or malloc() on the device, and have no recursion or deeply nested function calls, you can limit the size of the context by calling cuCtxSetLimit() before the first cudaMalloc().
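For code that uses the runtime API rather than the driver API, a rough equivalent is sketched below. It assumes a toolkit where cudaDeviceSetLimit() and cudaDeviceGetLimit() are available, which is newer than CUDA 3.2; the helper request_limit and the choice of 0 for every limit are only for illustration. The driver is free to adjust a requested value, so the sketch reads each limit back instead of assuming the request was honored.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: request a limit, then read back what is actually in effect.
// Failures are reported rather than treated as fatal, since a request may be rejected.
static void request_limit(const char *name, cudaLimit which, size_t requested) {
    cudaError_t err = cudaDeviceSetLimit(which, requested);
    if (err != cudaSuccess) {
        std::printf("%s: set to %llu failed: %s\n", name,
                    (unsigned long long)requested, cudaGetErrorString(err));
    }

    size_t actual = 0;
    err = cudaDeviceGetLimit(&actual, which);
    if (err == cudaSuccess) {
        std::printf("%s: requested %llu bytes, in effect %llu bytes\n", name,
                    (unsigned long long)requested, (unsigned long long)actual);
    } else {
        std::printf("%s: get failed: %s\n", name, cudaGetErrorString(err));
    }
}

int main() {
    // Set the limits before any allocation or kernel launch.
    request_limit("stack size",       cudaLimitStackSize,      0);
    request_limit("printf FIFO size", cudaLimitPrintfFifoSize, 0);
    request_limit("malloc heap size", cudaLimitMallocHeapSize, 0);

    // First allocation, as in the original post.
    void *p = NULL;
    cudaError_t err = cudaMalloc(&p, 1);
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(p);
    return 0;
}
```

Shrinking these limits only affects the stack, printf buffer, and device heap; whatever else the context reserves is not controlled by them.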

I did as you said:
cuCtxCreate(&cuContext, 0 , 0);
size_t size = 0;
cuCtxSetLimit(CU_LIMIT_STACK_SIZE , 0);
cuCtxSetLimit(CU_LIMIT_PRINTF_FIFO_SIZE, 0);
cuCtxSetLimit(CU_LIMIT_MALLOC_HEAP_SIZE, 0);

cuCtxGetLimit(&size, CU_LIMIT_STACK_SIZE ); //size = 1K
cuCtxGetLimit(&size, CU_LIMIT_PRINTF_FIFO_SIZE); //size = 800k
cuCtxGetLimit(&size, CU_LIMIT_MALLOC_HEAP_SIZE); //size = 4M
cudaMalloc();
After calling the three limit functions, I found that memory usage dropped from 230 MB to 210 MB.

Then I ran two examples from the CUDA SDK,
matrixMulDrv and matrixMulDynlinkJIT,
and found something strange: the cuCtxCreate call in matrixMulDrv occupies 200 MB of memory, but the cuCtxCreate call in matrixMulDynlinkJIT occupies only 100 MB.
I am very confused.

Thanks very much for your help.

Need your help!

help

Does anyone know?