I am running out of memory with device-side cudaMalloc calls (i.e. I am allocating inside the kernel, not from the host).
Is there a way to monitor how much free memory there is on the GPU as I step through the device function in debug mode?
I see that cudaMemGetInfo is a host only function.
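The best I can do at the moment is query things from the host between kernel launches, something like the sketch below. (I'm aware that, as far as I know, in-kernel allocations come from a separate device heap capped by cudaLimitMallocHeapSize, which defaults to 8 MB, so cudaMemGetInfo's numbers may not even explain the in-kernel failure.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Overall device memory as seen from the host (won't update
    // mid-kernel, only between launches).
    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);
    printf("free: %zu MB, total: %zu MB\n", freeB >> 20, totalB >> 20);

    // Size of the heap that in-kernel malloc/cudaMalloc draws from.
    size_t heapB = 0;
    cudaDeviceGetLimit(&heapB, cudaLimitMallocHeapSize);
    printf("device malloc heap: %zu MB\n", heapB >> 20);

    // If the heap is the bottleneck, it can be raised before the first
    // kernel launch, e.g. to 256 MB:
    // cudaDeviceSetLimit(cudaLimitMallocHeapSize, 256u << 20);
    return 0;
}
```

But this only gives me before/after snapshots, not the live picture while stepping through the device code.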
I am on Windows, so I am looking at the Nsight memory-allocations window, but bizarrely it doesn't show any significant allocations at the point where cudaMalloc returns error 2 (cudaErrorMemoryAllocation, out of memory). I don't think I am fragmenting the memory badly, so I would like to check the true GPU memory usage.
Any ideas appreciated.