GTX 680 fails after large cudaHostAllocPortable allocation

I have three GTX 580s (3GB RAM each) in device slots 0 to 2 and a GTX 680 in device slot 3.

The machine I'm working on has 50 GB of RAM.

The following code works fine when dev = 0 to 2, but fails when dev = 3 with a memory access violation on the cudaMemGetInfo call.

What's going on?

#include <cuda.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    // ~2 GB; size_t keeps the byte count 64-bit when N is scaled up
    size_t N = 2000ULL * 1000 * 1000;
    char *arr;
    cudaError_t err;

    // Devices 0 - 2 are GTX 580. Device 3 is GTX 680.
    int dev = 3; // 1;

    err = cudaHostAlloc(&arr, N, cudaHostAllocPortable);
    cudaSetDevice(dev);

    size_t free, total;
    err = cudaGetLastError();      // cudaSuccess at this point
    cudaMemGetInfo(&free, &total); // access violation here when dev = 3

    cudaFreeHost(arr);
    return 0;
}

What is the status returned by cudaSetDevice() when dev = 3? What is the version of CUDA and the driver package? What OS is running?

The status is cudaSuccess after both cudaHostAlloc and cudaSetDevice for all values of dev from 0 to 3.

CUDA 4.2, Windows Server 2008 HPC Edition, 64-bit (Service Pack 1).

Driver version for all the cards is 301.42

If I run the code segment with dev = 3 and N less than 718*1000*1000 (about 700 MB), it works fine; if I run it with anything larger, it crashes.

This does not happen for dev = 0 to 2. I've tested those up to N = 20*1000*1000*1000, i.e. 20 GB.

Roughly 700-800 MB is the maximum I've managed to allocate on Windows using non-professional cards. From what I understand, this is related to the drivers and WDDM.
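If you want to pin down the exact limit on your setup, you can bisect on the allocation size. A quick sketch (the 32 GB upper bound and the 1 MB resolution are arbitrary assumptions on my part):

#include <cuda_runtime.h>
#include <stdio.h>

// Bisect to find the largest pinned allocation the driver will grant.
int main(void)
{
    size_t lo = 0;                              // known-good size
    size_t hi = 32ULL * 1024 * 1024 * 1024;     // assumed upper bound
    while (hi - lo > 1024 * 1024) {             // stop at 1 MB resolution
        size_t mid = lo + (hi - lo) / 2;
        void *p = NULL;
        if (cudaHostAlloc(&p, mid, cudaHostAllocPortable) == cudaSuccess) {
            cudaFreeHost(p);
            lo = mid;                           // mid bytes succeeded
        } else {
            cudaGetLastError();                 // clear the error state
            hi = mid;                           // mid bytes failed
        }
    }
    printf("Largest pinned allocation: ~%llu MB\n",
           (unsigned long long)(lo / (1024 * 1024)));
    return 0;
}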

Here is a forum thread exhibiting the same result: The Official NVIDIA Forums | NVIDIA

I’m guessing that you’re not actually getting 2 GB on the other devices either…

From what I know, the options are: Linux or pro cards…

Jimmy, I'm not allocating any device memory. If you look at the code you'll see I'm only allocating pinned host memory.

Also, on both the 580 and the 680 I've managed to allocate device memory of 3 GB and 2 GB respectively. However, this has nothing to do with my current question because I'm not allocating any device memory.

That's exactly what I'm talking about as well. There appears to be a driver-related limitation on how much pinned host memory you can allocate on Windows. If you have a pro card you get pro drivers, which from what I've heard solves this problem.

Thanks Jimmy. That seems to be the case. I've retested, and in fact cudaHostAlloc returns cudaErrorMemoryAllocation for N = 2*1024*1024*1024, but it works for N = 2*1024*1024*1024 - 1, i.e. 2 GB - 1 byte!
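For anyone testing right at that boundary, note that 2*1024*1024*1024 overflows a 32-bit int, so the size has to be computed in 64-bit arithmetic. A fragment, assuming the same includes as the repro above (the comments restate the results reported in this thread):

size_t twoGB = 2ULL * 1024 * 1024 * 1024;  // 64-bit; the plain int expression would overflow
char *p = NULL;
cudaError_t e = cudaHostAlloc(&p, twoGB, cudaHostAllocPortable);      // reported: cudaErrorMemoryAllocation
if (e != cudaSuccess)
    e = cudaHostAlloc(&p, twoGB - 1, cudaHostAllocPortable);          // reported: succeeds (2 GB - 1 byte)
if (e == cudaSuccess)
    cudaFreeHost(p);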

My code is cross-platform, but at the moment it's easier for me to work in Visual Studio on Windows. How can I get around this problem?
Also, by pro card do you mean Tesla?

And finally, this doesn't help with my cudaMemGetInfo problem when using dev = 3, i.e. the GTX 680.
That call still crashes the program after any cudaHostAlloc of more than about 700 MB…

I thought you would get the TCC driver for Quadro as well, but I read:

I suppose you could:

  1. Buy a Tesla.

  2. Rework your memory management; maybe your kernel doesn't need to have 2 GB of data available at each call, and you can segment the work into several calls or streams (see the sketch after this list).

  3. Unlock your GeForce card to trick your system into thinking it's a professional card (The Official NVIDIA Forums | NVIDIA; this might be difficult and painful).
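For option 2, a double-buffered staging scheme keeps each pinned allocation well under the cap. A minimal sketch, not tested code: the process kernel, the chunk size, and the launch configuration are illustrative assumptions on my part:

#include <cuda_runtime.h>
#include <string.h>

// Hypothetical kernel standing in for the real per-chunk work.
__global__ void process(char *d, size_t n) { /* ... */ }

// Stage a large pageable buffer through two pinned chunks,
// overlapping host->device copies with kernel work.
void run(char *host, size_t total)
{
    const size_t CHUNK = 512ULL * 1024 * 1024;   // well under the cap
    char *pinned[2], *dev[2];
    cudaStream_t s[2];
    for (int i = 0; i < 2; ++i) {
        cudaHostAlloc(&pinned[i], CHUNK, cudaHostAllocPortable);
        cudaMalloc(&dev[i], CHUNK);
        cudaStreamCreate(&s[i]);
    }
    for (size_t off = 0, i = 0; off < total; off += CHUNK, i ^= 1) {
        size_t n = (total - off < CHUNK) ? (total - off) : CHUNK;
        cudaStreamSynchronize(s[i]);             // buffer i is free again
        memcpy(pinned[i], host + off, n);        // pageable -> pinned
        cudaMemcpyAsync(dev[i], pinned[i], n, cudaMemcpyHostToDevice, s[i]);
        process<<<256, 256, 0, s[i]>>>(dev[i], n);
    }
    cudaDeviceSynchronize();
    for (int i = 0; i < 2; ++i) {
        cudaFreeHost(pinned[i]);
        cudaFree(dev[i]);
        cudaStreamDestroy(s[i]);
    }
}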

Unfortunately none of those options are possible for me.

I'm prepared to live with the 2 GB limit for pinned memory for the time being. However, when I try this with the GTX 680, as I mentioned before, the cudaMemGetInfo call fails, and in fact any memory allocation on the device itself also fails (for pinned host memory of 700 MB or more).

I feel this has to be a driver bug, not a way to force customers to buy Teslas, because otherwise the 680 would simply be capped at the 2 GB limit as well.

Regardless of potential allocation limitations due to the operating system or driver model, it seems to me that a cudaMemGetInfo() call should not segfault, but should return a suitable error status if it cannot proceed for whatever reason. I would suggest filing a bug with a self-contained repro case. By the way, does this issue repro with just the GTX 680 in the machine? That would make things easier to reproduce on the NVIDIA side. The bug reporting form can be reached via a link on the registered developer website. Thank you for your help.
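In the meantime, a standard error-checking wrapper around every runtime call makes it obvious which call fails and with what status. A generic pattern, nothing specific to this bug:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Report file/line and the CUDA error string, then abort.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t e = (call);                                     \
        if (e != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(e));                         \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Usage in the repro:
//   CUDA_CHECK(cudaHostAlloc(&arr, N, cudaHostAllocPortable));
//   CUDA_CHECK(cudaSetDevice(dev));
//   CUDA_CHECK(cudaMemGetInfo(&free, &total));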

The 2 GB cap is actually a pre-5.0 driver limitation that should be relaxed in 5.0.

I have downloaded the CUDA 5.0 beta and the relevant drivers, and it appears that the cudaMemGetInfo crash when using the 680 has disappeared.
Furthermore, I can now allocate 2 GB of pinned memory per allocation for all the cards (680 included), which is good enough for me for the moment.
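Since the cap appears to be per allocation rather than per process, splitting a larger buffer across several pinned blocks also seems workable. A rough sketch; the block count and size are assumptions on my part, not something I've verified:

#include <cuda_runtime.h>

#define NBLOCKS 3

// Approximate a >2 GB pinned region as several <2 GB blocks,
// assuming the cap applies per allocation.
static char *blocks[NBLOCKS];

int allocPinnedBlocks(void)
{
    const size_t blockSize = 2ULL * 1024 * 1024 * 1024 - 1; // just under the cap
    for (int i = 0; i < NBLOCKS; ++i) {
        if (cudaHostAlloc(&blocks[i], blockSize, cudaHostAllocPortable) != cudaSuccess) {
            while (i-- > 0) cudaFreeHost(blocks[i]);        // roll back on failure
            return -1;
        }
    }
    return 0; // ~6 GB of pinned memory across NBLOCKS blocks
}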

@tmurray, do you know how I can turn off the 2 GB cap in CUDA 5.0? I would like to be able to allocate up to 4 GB per pinned memory allocation, or even 6 GB.

Also, thanks for the help everyone. I really appreciate it!