CUDA host memory cleanup

I’m working on a project that is essentially a launcher for applications, most of which use CUDA for their computation. While investigating the memory footprint of the whole system, I found that after terminating an application, a lot of host memory allocated during the application launch was never freed. Apparently, the first call into the CUDA runtime (e.g. the first cudaMalloc) allocates a large chunk of host memory (~400 MB in my case), and I’m looking for a way to make that chunk available to the system again.

I tried a few things: calling cudaDeviceReset and cudaThreadExit at termination, and creating my own context with the driver API and destroying it on termination, but none of these brings memory usage back to the state preceding the application launch.

I’ve run several leak-checking and memory-profiling tools, and they indicate that much of the memory that stays alive during execution comes from cudbgApiDetach and cuVDPAUCtxCreate, which are called by cudaMalloc (the first one, I suppose). Has anyone had a similar problem, and is there a proper way to clean up the memory used by the runtime and driver APIs?
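
For reference, this is roughly the teardown I tried, in simplified form (the function names are just placeholders for my own cleanup code, and the two variants were tried separately):

    #include <cuda.h>            // driver API
    #include <cuda_runtime.h>    // runtime API

    static CUcontext ctx;

    // Variant 1: reset through the runtime API at application termination
    static void teardown_runtime()
    {
        cudaDeviceReset();       // destroys the primary context on the current device
    }

    // Variant 2: create and destroy my own context through the driver API
    static void setup_driver()
    {
        CUdevice dev;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
    }

    static void teardown_driver()
    {
        cuCtxDestroy(ctx);       // host memory is still not released after this
    }

    int main()
    {
        setup_driver();
        // ... the launched application does its CUDA work here ...
        teardown_driver();       // variant 2
        teardown_runtime();      // variant 1
        return 0;
    }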

Hey tuccio,

I have been having the same experience when using Google’s heap profiler (tcmalloc). Did you get leaks with cudaFreeHost as well?

I used to get these leaks with cudaFree, but I fixed it by setting the device just before freeing. If I didn’t set the device, it looked like CUDA thought there was no current context, so it created a new one and then tried to free the memory inside that context, which is of course incorrect.
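
Roughly like this (just a sketch; I’m assuming here the allocation was made on device 0):

    #include <cuda_runtime.h>

    int main()
    {
        int dev = 0;             // device the buffer was allocated on
        float* d_buf = nullptr;

        cudaSetDevice(dev);
        cudaMalloc(&d_buf, 1024 * sizeof(float));

        // ... later, at cleanup time (possibly from another thread) ...

        cudaSetDevice(dev);      // re-select the same device before freeing;
        cudaFree(d_buf);         // without this, CUDA seemed to create a new
                                 // context and try to free inside it
        return 0;
    }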

Hope this helps, and if you have any more information about this, please let me know :)