CUDA kernels consuming device memory?


I’m writing code that uses CUDA for computational speedup, and one thing is bugging me: the total amount of device memory in use seems to be more than what I’m allocating. Is it possible that the kernels (I have many kernels) are themselves consuming device memory?

The reason I’m concerned is that once the application starts, an additional 430MB of device memory is consumed.

I’m using Ubuntu 9.04 (64-bit) with latest CUDA drivers and toolkit. I also use CUDPP library and adapted the reduction SDK sample into my code.

edit: I removed all kernels from the compilation, so the only thing left is a single cudaMalloc call that allocates 256MB. Yet total device memory consumption is still 390MB.
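For reference, here is a minimal sketch of how I’m measuring this, using cudaMemGetInfo from the runtime API (the exact figures above came from my app, not this snippet; the 256MB size is just the same allocation as in my test):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_before = 0, free_after = 0, total = 0;

    // The first runtime API call implicitly creates the CUDA context,
    // so this query already reflects the context's own overhead.
    cudaMemGetInfo(&free_before, &total);

    void *buf = NULL;
    cudaMalloc(&buf, 256u << 20);  // allocate 256 MB explicitly

    cudaMemGetInfo(&free_after, &total);

    printf("total device memory:         %zu MB\n", total >> 20);
    printf("consumed by cudaMalloc:      %zu MB\n",
           (free_before - free_after) >> 20);
    printf("consumed overall (incl. context): %zu MB\n",
           (total - free_after) >> 20);

    cudaFree(buf);
    return 0;
}
```

The gap between "consumed overall" and "consumed by cudaMalloc" is what I can’t account for.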

Could it be the CUDA context that consumes the device memory?

Kernels consume memory, contexts consume memory, and local memory and stack consume memory. The driver allocates a lot of things the user doesn’t explicitly ask for but that are required to guarantee CUDA semantics.

Thanks tmurray, that clarifies everything.