I am writing a library that uses CUBLAS to fit statistical models. Currently, I have a routine that calls cublasCreate() at the start and cublasDestroy() at the end of each invocation. In some cases this routine needs to be called repeatedly with varying inputs, for example in a simulation study.
What I've noticed is that even when I do nothing but call the cublasCreate()/cublasDestroy() pair, a small amount (<1 MB) of GPU memory is never freed. In fact, unless I call cudaDeviceReset(), roughly 40 MB is allocated and never freed across each call to the cublasCreate()/cublasDestroy() pair.
I assume this probably isn't a memory leak, but rather that I'm not using CUBLAS as intended. With that said, does it make sense to call cublasCreate()/cublasDestroy() on every invocation as I am doing, or should I be doing something different, given that I have a single execution thread?
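For reference, here is a minimal sketch of the pattern I'm describing (the fitting work itself is elided; `fit_model` and the loop structure are just illustrative, not my actual library code):

```c
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Illustrative routine: creates and destroys a CUBLAS handle on every call. */
static int fit_model(const double *inputs, size_t n)
{
    cublasHandle_t handle;
    cublasStatus_t status = cublasCreate(&handle);
    if (status != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed: %d\n", (int)status);
        return -1;
    }

    /* ... model-fitting work using `handle` would go here ... */
    (void)inputs; (void)n;

    cublasDestroy(handle);  /* GPU memory usage does not return to baseline here */
    return 0;
}

int main(void)
{
    /* Simulation study: the routine is invoked repeatedly with varying inputs. */
    for (int rep = 0; rep < 100; ++rep) {
        if (fit_model(NULL, 0) != 0)
            return 1;
    }
    cudaDeviceReset();  /* only this releases the remaining ~40 MB on my machine */
    return 0;
}
```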
Thanks for any help!