I’ve seen references to the lifespan of CUDA-allocated memory being the lifespan of the creating app. Is this true for "__device__ __constant__" memory as well as cudaMalloc()-allocated memory?
If so, that implies my CUDA application is potentially hogging GPU resources if there are long gaps (multiple minutes, and out of my control) between runs against the device. I guess cudaFree() can be used for the malloc’d memory. However, how can I release the __device__ __constant__ memory, as well as other unused GPU resources, in such a way that the app can continue running its non-GPU activities? Is this the intent of cudaThreadExit(), and can I do something like bracket each period of heavy use with a call to cudaSetDevice() at the beginning and cudaThreadExit() at the end?
Lest you all think me too altruistic, other copies of my application may be the ones attempting to get at those hogged GPU resources.
__device__ and __constant__ memory declarations are registered with the CUDA runtime via calls to __cudaRegisterVar, __cudaRegisterTexture, __cudaRegisterShared, and __cudaRegisterSharedVar. These are bound to a specific fat binary object, which is registered with __cudaRegisterFatBinary. There is no way to unregister a single global variable. However, you can unregister a fat binary and all resources associated with it using __cudaUnregisterFatBinary. I would assume that this deallocates all of its global variables. Unfortunately, you would then have to explicitly register everything again if you want to run another kernel.
You probably don’t want to get that low level. Your cudaThreadExit() approach might work, but the API documentation is unclear on this point. An easy way to find out would be to have a kernel write something to a global variable, call cudaThreadExit(), then launch another kernel and see if the value is still there.
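A minimal sketch of that experiment is below. It rests on a few assumptions that are not from this thread: the variable name marker, the two kernel names, and the CHECK macro are made up for illustration, and the sketch calls cudaDeviceReset(), the non-deprecated successor to cudaThreadExit() in later toolkits (on an old toolkit, substitute cudaThreadExit()). If the program prints something other than the value written, the teardown most likely released and re-created the global; seeing the same value is weaker evidence, since it could just be stale memory that happened to be reused.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical statically declared global used only for this experiment.
__device__ int marker;

__global__ void writeMarker(int value) { marker = value; }

__global__ void readMarker(int *out) { *out = marker; }

int main() {
    CHECK(cudaSetDevice(0));

    // Step 1: a kernel writes a recognizable value into the global.
    writeMarker<<<1, 1>>>(1234);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Step 2: tear down this process's state on the device.
    // (Stand-in for cudaThreadExit() on newer toolkits.)
    CHECK(cudaDeviceReset());

    // Step 3: use the device again and read the global back.
    CHECK(cudaSetDevice(0));
    int *d_out = nullptr;
    int h_out = -1;
    CHECK(cudaMalloc(&d_out, sizeof(int)));
    readMarker<<<1, 1>>>(d_out);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_out));

    printf("wrote 1234 before the reset, read %d after it\n", h_out);
    return 0;
}
```

Watching free device memory from a second process (for example with cudaMemGetInfo()) between steps 2 and 3 would show whether the resources are actually handed back during the idle gap.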
If I were you, I would just allocate everything large enough to matter with cudaMalloc() so that you can manage its lifetime yourself.
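As a rough illustration of that suggestion, the sketch below allocates its buffer at the start of each burst of GPU work and frees it at the end, so nothing large stays resident during the idle gaps. The kernel scale, the helper runBurst, and the buffer size are hypothetical placeholders, not anything from the original application. Note that the context itself, and any statically declared __device__ or __constant__ variables, would still stay resident between bursts.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the application's real GPU work.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: one period of heavy GPU use.
// Device memory is held only for the duration of this call.
bool runBurst(std::vector<float> &host, float factor) {
    const int n = static_cast<int>(host.size());
    const size_t bytes = host.size() * sizeof(float);

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    cudaError_t err =
        cudaMemcpy(d_data, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale<<<(n + 255) / 256, 256>>>(d_data, n, factor);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // The blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(host.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    }

    // Release the buffer before returning to non-GPU work,
    // whether or not the burst succeeded.
    cudaFree(d_data);
    return err == cudaSuccess;
}

int main() {
    std::vector<float> data(1 << 20, 1.0f);

    for (int run = 0; run < 3; ++run) {
        if (!runBurst(data, 2.0f)) {
            fprintf(stderr, "GPU burst %d failed\n", run);
            return 1;
        }
        // ... long stretch of non-GPU activity would go here ...
    }

    printf("data[0] = %f\n", data[0]);
    return 0;
}
```

Constant tables that would otherwise live in __constant__ memory can be handled the same way, by passing them to the kernel as an ordinary cudaMalloc'd pointer, at the cost of losing the constant cache.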