cuSolver handle GPU memory use

When I run your code as posted on CUDA 11.6 or CUDA 11.8, I get a report of 81MB, not 390MB.

CUDA uses lazy initialization, so you may not have CUDA fully initialized at the first call to cudaMemGetInfo. Then when you make the 2nd call, the reading will include some CUDA initialization overhead in addition to whatever the handle itself uses.
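One way to reduce that effect is to force initialization before taking the baseline reading. Here is a minimal sketch of the idea, not your code: the helper names bytesToMiB and usedBytes are my own, and cudaFree(0) is just a commonly used idiom for triggering context creation. It may still not capture every piece of lazily allocated overhead, and cudaMemGetInfo reports device-wide numbers, so other processes on the same GPU will show up too.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count to MiB for printing.
static double bytesToMiB(size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Hypothetical helper: device memory currently in use (total minus free).
// Returns false if the query fails. The figure is device-wide.
static bool usedBytes(size_t *used) {
    size_t freeB = 0, totalB = 0;
    if (cudaMemGetInfo(&freeB, &totalB) != cudaSuccess) return false;
    *used = totalB - freeB;
    return true;
}

int main() {
    // Force runtime/context initialization before the baseline reading.
    // cudaFree(0) frees nothing; it is used here only to trigger init.
    if (cudaFree(0) != cudaSuccess) {
        std::fprintf(stderr, "CUDA initialization failed\n");
        return 1;
    }

    size_t baseline = 0;
    if (!usedBytes(&baseline)) {
        std::fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }
    std::printf("baseline after init: %.1f MiB\n", bytesToMiB(baseline));
    return 0;
}
```

Anything you measure after this point should be compared against that baseline rather than against a reading taken before CUDA was initialized.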

However, I don’t know of anywhere it is claimed that destroying a handle will release all library overhead. So I’m pretty confident this is not a bug.

For example, when CUDA loads a library like cusolver, it loads all the kernels in the cusolver library. Destroying a handle doesn’t unload all these kernels.

If you’d like to see a change in CUDA behavior, you can always file a bug, and also you may want to investigate CUDA opt-in (for CUDA 11.7 and 11.8) “lazy” module loading. This will likely reduce the memory footprint.

Run your application with the following env var set:

CUDA_MODULE_LOADING=LAZY

using CUDA 11.7 or 11.8. However, as I reported, when I test the code you have posted here, I get 81MB, not 390MB, and this switch has no effect on that observation.
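If you want to check these points on your own setup, a staged measurement along the following lines may help. This is a sketch under my own assumptions, not a reconstruction of your posted code: the helpers bytesToMiB and report are hypothetical, and the numbers you get will depend on GPU, driver, and CUDA version, so I am not showing expected output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Hypothetical helper: convert a byte count to MiB for printing.
static double bytesToMiB(size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Hypothetical helper: print device-wide memory in use at a labeled stage.
static void report(const char *label) {
    size_t freeB = 0, totalB = 0;
    if (cudaMemGetInfo(&freeB, &totalB) != cudaSuccess) {
        std::printf("%-26s cudaMemGetInfo failed\n", label);
        return;
    }
    std::printf("%-26s %.1f MiB in use\n", label, bytesToMiB(totalB - freeB));
}

int main() {
    // Trigger CUDA initialization so it is not charged to cuSolver.
    if (cudaFree(0) != cudaSuccess) {
        std::fprintf(stderr, "CUDA initialization failed\n");
        return 1;
    }
    report("after CUDA init:");

    cusolverDnHandle_t handle = nullptr;
    if (cusolverDnCreate(&handle) != CUSOLVER_STATUS_SUCCESS) {
        std::fprintf(stderr, "cusolverDnCreate failed\n");
        return 1;
    }
    report("after cusolverDnCreate:");

    // Destroying the handle releases handle resources, but is not
    // expected to unload the library's kernels from the device.
    cusolverDnDestroy(handle);
    report("after cusolverDnDestroy:");
    return 0;
}
```

Build it with something like nvcc mem.cu -o mem -lcusolver (the file name is arbitrary), then run it once normally and once with CUDA_MODULE_LOADING=LAZY set in the environment, and compare the three readings between the two runs. The difference between the first and last line is the overhead that stays resident after the handle is destroyed.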