I know about the WDDM limitation that caps the size of a single memory allocation:
The maximum size of a single memory allocation created by cudaMalloc() or cuMemAlloc() on WDDM devices is limited to MIN( (System Memory Size in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE ). For Vista, PAGING_BUFFER_SEGMENT_SIZE is approximately 2 GB.
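To make the formula concrete, here is a small sketch in C++ (the host language of the CUDA runtime API). The function name wddm_max_single_alloc_mb is made up for illustration, and 2048 MB is only an assumed stand-in for the "approximately 2 GB" quoted for Vista; the real value on a given system may differ.

```cpp
#include <cstdint>

// Quoted WDDM limit for a single allocation, in MB:
//   MIN((system memory in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE)
// Hypothetical helper: both arguments are in MB, and the caller has to
// supply a guess for PAGING_BUFFER_SEGMENT_SIZE.
inline std::int64_t wddm_max_single_alloc_mb(std::int64_t system_mem_mb,
                                             std::int64_t paging_segment_mb) {
    const std::int64_t system_term_mb = (system_mem_mb - 512) / 2;
    return system_term_mb < paging_segment_mb ? system_term_mb
                                              : paging_segment_mb;
}
```

For 8192 MB of system memory the first term is (8192 - 512) / 2 = 3840 MB, so with the assumed 2048 MB segment size the function returns 2048. The system-memory term only becomes the limit on machines with much less RAM; for example, 2048 MB of RAM gives 768 MB.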
But I don’t know if this is really my problem. I wrote a program that finds the largest memory blocks that can be allocated using cudaMalloc(). I ran it on several machines, all running Windows 7 and all with at least 8 GB of system memory. So I should always be able to allocate PAGING_BUFFER_SEGMENT_SIZE.
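For illustration, a probe of this kind can be sketched as a binary search over the allocation size. This is not necessarily the program described above: the helper names largest_allocatable and try_cuda_malloc are hypothetical, the 1 MiB granularity is an arbitrary choice, and the search assumes that if one size can be allocated, every smaller size can be too. The CUDA-specific part is guarded by __CUDACC__, so it is only built when compiling with nvcc.

```cpp
#include <cstddef>
#include <cstdio>

// Binary search for the largest size in [0, upper] that try_alloc accepts,
// rounded down to a multiple of `granularity` bytes. Assumes that whenever
// a size succeeds, every smaller size succeeds as well.
template <typename TryAlloc>
std::size_t largest_allocatable(std::size_t upper, std::size_t granularity,
                                TryAlloc try_alloc) {
    std::size_t lo = 0;                    // largest block count known to work
    std::size_t hi = upper / granularity;  // largest block count still possible
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo + 1) / 2;  // round up so the loop ends
        if (try_alloc(mid * granularity)) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo * granularity;
}

#ifdef __CUDACC__  // the CUDA-specific part only builds with nvcc
#include <cuda_runtime.h>

// Try one cudaMalloc of `bytes` and release it again right away.
static bool try_cuda_malloc(std::size_t bytes) {
    void* p = nullptr;
    if (cudaMalloc(&p, bytes) != cudaSuccess) {
        cudaGetLastError();  // reset the error state left by the failed call
        return false;
    }
    cudaFree(p);
    return true;
}

int main() {
    std::size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) return 1;
    const std::size_t mib = std::size_t(1) << 20;
    // Search between 0 and the amount of device memory reported as free.
    std::size_t best = largest_allocatable(free_bytes, mib, try_cuda_malloc);
    std::printf("free: %zu MiB, largest single block: %zu MiB\n",
                free_bytes / mib, best / mib);
    return 0;
}
#endif
```

To list several blocks in a row (as in a result like ~1.8 GB + ~1.8 GB + ~0.3 GB), one would keep each found block allocated and repeat the search until nothing more fits.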
OK, there were no problems with the Teslas. On the Quadro FX5800, I could allocate ~1.8 GB + ~1.8 GB + ~0.3 GB, which is fine. But on most GTX cards, I can only allocate blocks of up to ~800 MB. On the GTX680, on the other hand, I can allocate all 4 GB at once? I can’t see any pattern behind this behavior.
How can I figure out what value PAGING_BUFFER_SEGMENT_SIZE has on a given system?