I’m writing code that uses CUDA for computational speedup, and one thing is bugging me: the total amount of device memory in use seems to be more than what I’m allocating. Is it possible that the kernels (I have many of them) are consuming device memory?
The reason I’m concerned is that once the application starts, an additional 430MB of device memory is consumed.
I’m using Ubuntu 9.04 (64-bit) with the latest CUDA drivers and toolkit. I also use the CUDPP library and have adapted the reduction SDK sample into my code.
Edit: I removed all kernels from the compilation, so the only thing left is a cudaMalloc call that allocates 256MB. The total memory consumption is still high: 390MB.
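A minimal sketch of how the experiment in the edit could be measured, assuming a CUDA runtime that provides `cudaMemGetInfo` (the post does not say how the figures were obtained). It forces context creation with `cudaFree(0)`, records device-wide usage, then allocates the same 256MB and reports how much usage grew. Note that `cudaMemGetInfo` reports device-wide figures, so memory held by other processes (including a display server running on the same GPU) is counted in the baseline. The helper names `toMiB`, `usedBytes` and `check` are hypothetical, introduced only for this sketch.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count to MiB for printing.
static double toMiB(size_t bytes) { return bytes / (1024.0 * 1024.0); }

// Hypothetical helper: bytes in use on the device, from a cudaMemGetInfo reading.
static size_t usedBytes(size_t freeBytes, size_t totalBytes) {
    return totalBytes - freeBytes;
}

// Hypothetical helper: print a message and return false if a runtime call fails.
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    const size_t request = 256u * 1024u * 1024u;  // the 256MB allocation from the post
    size_t freeBytes = 0, totalBytes = 0;

    // cudaFree(0) is a common idiom to force CUDA context creation
    // before taking the first measurement.
    if (!check(cudaFree(0), "cudaFree(0)")) return 1;
    if (!check(cudaMemGetInfo(&freeBytes, &totalBytes), "cudaMemGetInfo")) return 1;
    const size_t usedBefore = usedBytes(freeBytes, totalBytes);
    std::printf("total: %.1f MiB, in use after context creation: %.1f MiB\n",
                toMiB(totalBytes), toMiB(usedBefore));

    void* buf = nullptr;
    if (!check(cudaMalloc(&buf, request), "cudaMalloc")) return 1;
    if (!check(cudaMemGetInfo(&freeBytes, &totalBytes), "cudaMemGetInfo")) return 1;
    const size_t usedAfter = usedBytes(freeBytes, totalBytes);

    // Guard against underflow if another process released memory in between.
    const size_t grew = usedAfter > usedBefore ? usedAfter - usedBefore : 0;
    std::printf("requested: %.1f MiB, device usage grew by: %.1f MiB\n",
                toMiB(request), toMiB(grew));

    cudaFree(buf);
    return 0;
}
```

Comparing the two printed figures separates the cost of the 256MB allocation itself from whatever was already in use before it; the exact numbers will depend on the GPU, the driver, and what else is running on the device.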