Initialization/termination for linking multiple runtime DLLs


I’m trying to find out more about how CUDA programs are initialized and terminated when the CUDA code lives in DLLs. The programming guide (for the runtime API, not the driver API) says that initialization is handled implicitly, but it mentions nothing about how resources are freed.

One example I have particularly in mind: two separate DLLs, each with its own CUDA code and each able to launch its own kernels, etc. The CUDA source files for both contain a global static array definition. A host program loads the first DLL, runs its functions, releases the DLL, then loads the second one and uses it. It seems that if the static array in the first DLL is large enough and is not freed correctly, there could be resource allocation issues in the second DLL. I don’t know how the first library would know that it is being released, however, as there are no associated function calls that I’m aware of.
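
To make the scenario concrete, here is a minimal sketch of what the CUDA source for the first DLL might look like (the second would be analogous). It assumes the "global static array" is a `__device__` array; the file name `first_lib.cu`, the names `g_table`, `fillTable`, and `runFirst`, and the array size are all made up for illustration. Note that nothing in it explicitly initializes or tears down CUDA, which is exactly the part I'm unsure about.

```cuda
// first_lib.cu: hypothetical source for the first DLL (illustration only).
#include <cuda_runtime.h>

#if defined(_WIN32)
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Assumed size; large enough that leaking it could matter.
#define TABLE_SIZE (1 << 20)

// Statically declared device array. There is no cudaMalloc/cudaFree pair
// for it anywhere in this file.
__device__ float g_table[TABLE_SIZE];

// Trivial kernel that writes to the static array.
__global__ void fillTable(float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < TABLE_SIZE) {
        g_table[i] = value;
    }
}

// Exported entry point the host program calls after loading the DLL.
// Returns 0 on success, otherwise the CUDA error code.
DLL_EXPORT int runFirst(float value)
{
    const int threads = 256;
    const int blocks = (TABLE_SIZE + threads - 1) / threads;

    // First runtime API use in this DLL; initialization happens implicitly.
    fillTable<<<blocks, threads>>>(value);

    cudaError_t err = cudaGetLastError();   // launch errors
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();      // execution errors
    }
    return err == cudaSuccess ? 0 : (int)err;
}
```

On Windows the host side would then be the usual `LoadLibrary`, `GetProcAddress` for `runFirst`, a call to it, and `FreeLibrary`, followed by the same sequence for the second DLL.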

Does anyone have any experience with or knowledge of this kind of setup? Note that I’m still unfamiliar with DLLs, and with CUDA in DLLs in general, so some of what I’ve said above may not make sense in terms of what actually happens. If so, please correct me.