cudaMalloc incorrectly reporting out of memory: can't allocate more than 1151 MB at a time

I am using an Nvidia Tesla C2050 on Windows 7.

Even though my card contains 2651 MB of memory, any time I try to use cudaMalloc to allocate more than 1151 MB at once, I get the error: Runtime API Error: out of memory.

Note that this only happens when I allocate more than 1151 MB using a single cudaMalloc call. If I split my request up into chunks, it works fine. For example, this call fails with an out-of-memory error:

cudaMalloc((void**) &ptr, 1152*1024*1024);

However, these two requests work fine.

cudaMalloc((void**) &ptr, 1151*1024*1024);

cudaMalloc((void**) &ptr, 1151*1024*1024);

Is there a maximum amount of device memory that can be allocated at once?

Yes, on Windows 7. On your machine, it’s 1151 MB. This is due to the memory layout scheme of WDDM. Some fixes are coming for this in later drivers.

Thank you for the reply.

Do you know of any acceptable workaround? (Other than distributing my data structure across several, noncontiguous chunks of memory.) For example, if I issue multiple cudaMalloc requests consecutively, and I am the only person using the device, can I then treat the union of those memory blocks as one large memory block?
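On the question of treating several allocations as one block: CUDA does not guarantee that separate cudaMalloc calls return adjacent addresses, so the union of the blocks cannot safely be addressed as one contiguous range. A minimal sketch of the bookkeeping for the chunked approach follows, in plain host-side C++. The names planChunks, ChunkPos, and locate are hypothetical helpers made up for illustration, not part of the CUDA API, and the sketch assumes every chunk except the last has the same size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: split a request of totalBytes into pieces no larger
// than maxChunkBytes. Each piece would then get its own cudaMalloc call.
inline std::vector<std::size_t> planChunks(std::size_t totalBytes,
                                           std::size_t maxChunkBytes) {
    std::vector<std::size_t> sizes;
    if (maxChunkBytes == 0) return sizes;  // nothing sensible to plan
    for (std::size_t done = 0; done < totalBytes; done += sizes.back()) {
        std::size_t remaining = totalBytes - done;
        sizes.push_back(remaining < maxChunkBytes ? remaining : maxChunkBytes);
    }
    return sizes;
}

// Position of a logical byte offset inside the chunked layout.
struct ChunkPos {
    std::size_t chunk;   // which allocation the byte lives in
    std::size_t offset;  // byte offset inside that allocation
};

// Maps a logical byte offset to (chunk, offset).
// Precondition: maxChunkBytes > 0.
inline ChunkPos locate(std::size_t byteOffset, std::size_t maxChunkBytes) {
    return ChunkPos{ byteOffset / maxChunkBytes, byteOffset % maxChunkBytes };
}
```

With a 1151 MB cap, a 2500 MB request would be planned as 1151 MB + 1151 MB + 198 MB. Each size goes to its own cudaMalloc call, and a kernel would need the table of chunk pointers (or be launched once per chunk) rather than a single base pointer.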

Alternatively, I can install CentOS. Do you know if this issue affects the Linux drivers?

Linux is completely unaffected by the allocation limits. The TCC driver is also unaffected. You can use it with the C1060 if you don’t really need decent display output, since it can’t coexist with standard NVIDIA WDDM devices right now.

Actually I’m not using this card for display at all. But the card is a C2050, not a C1060. Will the TCC driver work with a C2050?

Just for reference I found:

Total system memory available for graphics use

Total amount of system memory that can be dedicated or shared to the GPU, calculated as:

TotalSystemMemoryAvailableForGraphics = MAX((TotalSystemMemory - 512 MB) / 2, 64 MB)

It sounds like a single allocation is limited to this amount because, if it were larger than the system memory available to the GPU, it would be impossible for the allocation to fit in system memory.
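As a rough worked example of the formula quoted above, here is a minimal sketch in host-side C++. The function name is hypothetical, all values are assumed to be in MB, and the guard for inputs below 512 MB is my own addition to avoid unsigned wrap-around. If the formula does govern the limit, a 1151 MB result would correspond to roughly 2814 MB of system memory visible to Windows, but that is an inference and not something confirmed in this thread.

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the quoted formula; input and result are in MB (assumption).
inline std::uint64_t totalSystemMemoryAvailableForGraphicsMB(
        std::uint64_t totalSystemMemoryMB) {
    // Guard the subtraction so inputs below 512 MB do not wrap around.
    std::uint64_t half = totalSystemMemoryMB > 512
                             ? (totalSystemMemoryMB - 512) / 2
                             : 0;
    // The formula never reports less than 64 MB.
    return std::max<std::uint64_t>(half, 64);
}
```

For instance, totalSystemMemoryAvailableForGraphicsMB(4096) gives 1792 under these assumptions, and an input of 2814 gives 1151.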

Using the TCC compute driver bypasses this paging requirement and gives you full control of device memory. (At least, this is my understanding.)

Source:

http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/GraphicsMemory.doc