That’s right, but to be safe, you should also look at the figures returned by cuMemGetInfo(), which additionally reports the number of bytes actually available to CUDA. This latter value might be significantly different from what’s returned by cudaGetDeviceProperties(), especially if the card is also driving the framebuffer. On my FX4600 I lose ~140 MB of 768 MB due to running X. Even without that, it can be several tens of MB (in my experience).
cuMemGetInfo() is a Driver API function, but contrary to what the manual says, it CAN be used together with Runtime API functions. The only thing you have to make sure of is that a CUDA context has been established by the time you call cuMemGetInfo(). I usually achieve this by cudaMalloc()'ing and immediately cudaFree()'ing a variable just before. There might be more elegant ways, but this works.
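Here is a rough sketch of that trick, in case it helps. It assumes a toolkit where cuMemGetInfo() takes size_t pointers (older releases used unsigned int), and that you link against the driver library, e.g. with "nvcc meminfo.cu -lcuda" (the file name is made up). The 1-byte dummy allocation is just my choice for forcing the context.

```cuda
#include <cstdio>
#include <cuda.h>          // Driver API: cuMemGetInfo()
#include <cuda_runtime.h>  // Runtime API: cudaMalloc(), cudaFree()

int main() {
    // Force the runtime to establish a context by allocating and
    // immediately freeing a dummy buffer (size chosen arbitrarily).
    void* dummy = nullptr;
    cudaError_t err = cudaMalloc(&dummy, 1);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "could not establish a context: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    cudaFree(dummy);

    // Query free and total device memory through the Driver API.
    // Assumption: size_t signature (newer toolkits).
    size_t freeBytes = 0, totalBytes = 0;
    CUresult res = cuMemGetInfo(&freeBytes, &totalBytes);
    if (res != CUDA_SUCCESS) {
        std::fprintf(stderr, "cuMemGetInfo failed with code %d\n",
                     static_cast<int>(res));
        return 1;
    }

    std::printf("free: %zu MB of %zu MB total\n",
                freeBytes >> 20, totalBytes >> 20);
    return 0;
}
```

Newer toolkits also have a Runtime API counterpart, cudaMemGetInfo(), which avoids mixing the two APIs altogether, if your version provides it.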
Besides, it might be a good idea to check the cudaError_t return value of cudaMalloc() to make sure it worked. That way you’ll immediately see whether CUDA was able to allocate the memory.
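Something along these lines should do for the return value check. The buffer name and the 512 MB request are only placeholders for illustration, so plug in your own size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Hypothetical request size, for illustration only.
    const size_t bytes = static_cast<size_t>(512) * 1024 * 1024;

    float* d_buf = nullptr;
    cudaError_t err = cudaMalloc(reinterpret_cast<void**>(&d_buf), bytes);
    if (err != cudaSuccess) {
        // cudaGetErrorString() turns the error code into readable text.
        std::fprintf(stderr, "cudaMalloc of %zu bytes failed: %s\n",
                     bytes, cudaGetErrorString(err));
        return 1;
    }

    std::printf("allocated %zu bytes on the device\n", bytes);
    cudaFree(d_buf);
    return 0;
}
```

If the allocation is larger than what cuMemGetInfo() reports as free, this is where you would expect to see the failure.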
But I’m glad to see that I am not the only one having trouble with this one. :)