I run a computation that, at the end, dumps out a file of N doubles. The file is only a few MB in size. However, while the computation runs, memory usage climbs into the hundreds of MB and even the 1-2 GB range. I do hold other arrays of size N on the device, but nowhere near enough of them to explain usage in the 1-2 GB range.
I would like to calculate how much memory I will need (versus how much is available onboard the GPU) before actually launching the GPU kernels, so I can avoid crashing my display driver. How can I do this?
What is the nature of these computations? If the computation is performed by kernels in your own code, the GPU memory they use would be allocated by your application, so you would know exactly how much memory gets allocated where.
If you are using libraries like CUFFT, where some API functions allocate internal scratch memory, check the library documentation for guidance on how much additional memory is used. I am reasonably sure the CUFFT documentation provides such guidance, for example.
It is my own code, but I found the solution: apparently I did not have enough coffee this morning, because I completely forgot that I recently turned this 2D problem into a 3D one. The text file dumps are just slices of the 3D volume, so each output file is no bigger than it was for the 2D problem.
Initially I had N = Nx*Ny*1, and I switched it to N = Nx*Ny*Nz to make it 3D. Now the numbers make sense.