Maximum amount of memory you can cudaMalloc()?

Hi Everyone,

Just like the title says, I just wanted to know what the limit is. I have a weird bug in my program, which I suspect is due to me having encroached on the limits, and I wanted to confirm whether this is the case or not. Thanks!

Cheers,

Paul

cudaGetDeviceProperties() returns the properties of a specific GPU, including the total amount of global memory on the device (the totalGlobalMem field).

That’s right, but to be safe, you should also look at the figures returned by cuMemGetInfo(), which additionally reports the number of bytes actually available to CUDA. This latter value might be significantly different from what’s returned by cudaGetDeviceProperties(), especially if the card is also driving the framebuffer. On my FX4600 I lose ~140 of 768 MB due to running X. Even without that, it can be several tens of MB (in my experience).

cuMemGetInfo() is a Driver API function but, contrary to what the manual says, it CAN be used together with Runtime API functions. The only thing you have to make sure of is that a CUDA context has been established by the time you call cuMemGetInfo(). I usually achieve this by cudaMalloc()'ing and immediately cudaFree()'ing a variable just before. There might be more elegant ways, but this works.

Besides, it might be a good idea to check the cudaError_t return value of cudaMalloc() to make sure it worked. That way you’ll immediately see whether CUDA was able to allocate the memory.

But I’m glad to see that I am not the only one having trouble with this one. :)

Alex

Or use binary search to be sure you squeezed every memory bit from the card ;) Honestly. I do it this way.
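For anyone who wants to try the binary search idea, here is a rough sketch of how it could look. The names largest_allocatable, alloc_probe_fn, simulated_probe and cuda_probe are made up for this example and are not part of CUDA. The search assumes that if an allocation of n bytes succeeds, any smaller one would succeed too, and it only finds the largest single block, which fragmentation can make smaller than the free figure reported by the driver. The CUDA-backed probe is only compiled under nvcc (which defines __CUDACC__), so the search itself can be tried with the simulated probe on a machine without a GPU.

```c
#include <stddef.h>

/* Hypothetical callback type: returns nonzero if an allocation of `bytes`
 * succeeds. A probe is expected to release the memory again before returning. */
typedef int (*alloc_probe_fn)(size_t bytes, void *user);

/* Binary search for the largest size in [0, upper] that the probe accepts.
 * Assumes monotonic behaviour: if n bytes can be allocated, so can n - 1. */
size_t largest_allocatable(size_t upper, alloc_probe_fn probe, void *user)
{
    size_t lo = 0;      /* largest size known to succeed (0 = nothing yet) */
    size_t hi = upper;  /* everything above hi is known or assumed to fail */

    while (lo < hi) {
        /* Upper midpoint, written so that it cannot overflow and is always > lo. */
        size_t mid = hi - (hi - lo) / 2;
        if (probe(mid, user)) {
            lo = mid;       /* mid bytes fit, look for something larger */
        } else {
            hi = mid - 1;   /* mid bytes do not fit, look below */
        }
    }
    return lo;
}

/* Simulated probe for trying the search without a GPU:
 * `user` points to a size_t holding a pretend memory limit. */
int simulated_probe(size_t bytes, void *user)
{
    const size_t *limit = (const size_t *)user;
    return bytes <= *limit;
}

#ifdef __CUDACC__
#include <cuda_runtime.h>

/* Probe backed by the CUDA runtime: allocate, check the cudaError_t
 * return value, and free again immediately. `user` is unused. */
int cuda_probe(size_t bytes, void *user)
{
    void *p = NULL;
    (void)user;
    if (cudaMalloc(&p, bytes) != cudaSuccess) {
        cudaGetLastError();  /* reset the error state left by the failed call */
        return 0;
    }
    cudaFree(p);
    return 1;
}
#endif
```

With nvcc you would call something like largest_allocatable(upper, cuda_probe, NULL), where upper could be the totalGlobalMem value from cudaGetDeviceProperties() or the free figure from cudaMemGetInfo(). Each step costs one cudaMalloc()/cudaFree() pair, so the search needs only about log2(upper) probes.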

There is a more direct way to establish a context using cuCtxCreate as below:

 

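    // needs <cuda.h> and <stdio.h>; link against the driver library (-lcuda)
    cuInit(0);

    CUdevice dev;
    int nGPUs;
    cuDeviceGetCount(&nGPUs);
    printf("Device Info: %d GPUs found in system.\n", nGPUs);

    CUcontext ctx;
    cuDeviceGet(&dev, 0);        // use 1st CUDA device
    cuCtxCreate(&ctx, 0, dev);   // create context for it

    CUresult memres;
    size_t free_mem, total_mem;  // size_t in current CUDA; very old releases took unsigned int
    memres = cuMemGetInfo(&free_mem, &total_mem);
    printf("After all allocation(%d):     free %zu     total %zu \n", (int)memres, free_mem, total_mem);

    cuCtxDestroy(ctx);           // cuCtxDetach() is deprecated in favor of cuCtxDestroy()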

Hope this helps,

Jike

I’m using JCuda JNI wrapper (jcuda.org) in some Java code and what I’m going to do is: