Also note that it is impossible to know how much memory is really available at any given moment, unless you have a fully dedicated machine running only a few processes that you control completely. So you can perhaps get the maximum amount of RAM available, but you can’t get the amount of RAM that you will actually be able to allocate. Of course, if you do have a dedicated machine and you limit yourself to, say, 75% of the memory, it becomes a reasonable approximation.
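For what it's worth, here is a minimal sketch of that "take a fraction of the total" idea. It assumes a Linux host with glibc, where `sysconf(_SC_PHYS_PAGES)` is available (it is an extension, not something every platform provides). The function names `total_physical_ram` and `memory_budget` are made up for illustration, and the result is only a budget to stay under, not a guarantee that an allocation of that size will succeed.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Total physical RAM in bytes, or 0 if the query fails.
 * Assumes Linux/glibc: _SC_PHYS_PAGES is an extension. */
unsigned long long total_physical_ram(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return 0;
    return (unsigned long long)pages * (unsigned long long)page_size;
}

/* Hypothetical helper: a self-imposed budget equal to `fraction`
 * (e.g. 0.75) of the total RAM. This says nothing about how much
 * memory is free right now. */
unsigned long long memory_budget(double fraction)
{
    assert(fraction > 0.0 && fraction <= 1.0);
    return (unsigned long long)((double)total_physical_ram() * fraction);
}
```

Calling `memory_budget(0.75)` once at startup and comparing your own running total of host allocations against it is about as good as it gets on a dedicated box. On a shared machine the number means much less.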
Valgrind works with CUDA. In fact, valgrind with device emulation has, until very recently, been the recommended way of debugging out-of-bounds memory errors in CUDA.
Just because a piece of code “works fine” doesn’t mean it doesn’t contain memory management errors.
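A small host-side illustration of the point, as a sketch only: the function below (the name `sum_first_n` is hypothetical) returns the correct result every time, yet it leaks its buffer on every call. Nothing in the program's visible behaviour gives that away.

```c
#include <stdlib.h>

/* Hypothetical example: sums 0..n-1 through a heap buffer.
 * The result is correct, but the buffer is never freed. */
long sum_first_n(size_t n)
{
    long *buf = malloc(n * sizeof *buf);
    long total = 0;

    if (buf == NULL)
        return -1;

    for (size_t i = 0; i < n; ++i) {
        buf[i] = (long)i;
        total += buf[i];
    }

    /* Deliberate bug: free(buf) is missing, so every call leaks. */
    return total;
}
```

Built into a test program and run under `valgrind --leak-check=full`, this kind of leak should show up in the leak summary even though the program itself appears to behave.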
OK, now I am a little confused. I assumed, based on your original post, that you want to instrument your host memory usage because you have memory leaks or other problems with memory management. That is precisely what valgrind is designed for. Perhaps I misunderstood something, but what is it you actually want to achieve?
You are right, I should have run with emulation code first. I just tried that with a test code and it works fine in emulation, but with the release code it spits out a lot of errors which are not really there.
And yes, I want to get the host part to behave. My checkCpuMem() works well to pinpoint the problem. I am hoping that valgrind can help with the rest.