I’ve noticed that cuMemAlloc sometimes fails, even though I’m requesting less memory than cuMemGetInfo reports as available. While playing around with OpenCL, I found a property that gives the maximum amount of device memory that can be allocated with a single malloc call, and on my 9800GT it happens to be 128MB, or 1/4 of the total global memory. After a bit of experimenting, I found that the limit in CUDA is also 128MB per cuMemAlloc call.
I found no mention of this limit in the CUDA documentation, and no way to query it through the CUDA API.
Being able to use only 128MB when the driver reports 450+MB available is frustrating at the very least, and splitting the same buffer across two or three allocations adds complexity to the kernel that I’m unwilling to introduce.
That doesn’t gel with my experience at all. Most of my linear algebra codes use my own memory manager, and the first thing it does is make a single allocation call to reserve every last free byte reported by cuMemGetInfo(). On compute-dedicated cards, that means 896MB, 1GB, or 1.8GB in a single call. I’ve never seen anything like that limit with any CUDA 2.x version on Linux.
Welcome to WDDM. There is a limit on the maximum size of a single allocation due to Vista requirements. This is fixed in the upcoming Tesla compute-only drivers for WDDM OSes.
Touché! Never imagined it would be a Windows issue, but it doesn’t surprise me at all. This reminds me of the error message in my signature.
But the Tesla drivers would only work with Tesla cards, so basically there is no way to overcome this limitation with GeForce cards under Windows, right?
I second that, though I believe NVIDIA would rather have us use Tesla cards. Too bad I’m just a student with no life and no money; I’d definitely get some of the Fermi-based Teslas if I could afford them.