cudaDeviceReset memory leak?

Hey all, in my program I am currently using cudaDeviceReset to free all the global memory I’ve allocated. However, there seems to be a memory leak associated with it.

Running the following code leaves about 50-60 MB more allocated than when it started.

for (int i = 0; i < 1000; i++)
{
    cudaDeviceSynchronize();
    cudaDeviceReset();
}

Both functions always return cudaSuccess. I have noticed that if I do not call cudaDeviceSynchronize, no memory is allocated and the loop takes almost no time to execute. I assume that cudaDeviceReset first checks whether there is anything to reset and simply returns if nothing has happened on the device.

This is with CUDA 4.0 on Windows, with a GTX 550 Ti card.

Is this a bug, or is there some cleanup step I am missing? Or is there a different way to clear global memory?


I get a similar memory leak in the following case:

for (int i = 0; i < 2500; i++) {



cudaThreadExit(); // or cudaDeviceReset();


I’m using CUDA 4.0 with a GeForce GTX 480 graphics card.