I’ve got a kernel running in a separate stream created with cudaStreamCreate. I want to free some device memory up in the middle of that kernel’s execution using cudaFree or cudaFreeHost. My runs so far indicate that it is not possible to call cudaFree while a kernel is executing, and that it will hang until that kernel completes. Is this accurate? Is there any way around this?
As long as your kernel is still accessing some global memory, you obviously cannot free that memory out from under it.
So I do not really understand your use case. How can you be sure the kernel will not access a piece of memory after a certain point if you do not have some kind of synchronization mechanism between the host and the GPU?
Anyway, the way cudaFree is implemented now, you cannot do anything like this: the call contains an implicit cudaThreadSynchronize (cudaDeviceSynchronize in newer toolkits), so it blocks until all previously launched device work has completed.
To manage GPU memory more freely, you could create your own memory pool (with one big cudaMalloc at startup). Further allocations then sub-allocate from that pool without touching the driver. On top of that you can implement an asynchronous free: record a cudaEvent_t in the stream after the last kernel that uses a region, and recycle the region only once the event has completed.