GTX 680 fails after large cudaHostAllocPortable allocation

I have three GTX 580s (3GB RAM each) in device slots 0 to 2 and a GTX 680 in device slot 3.

The machine I'm working on has 50 GB of RAM.

The following code works fine when dev = 0 to 2, but fails when dev = 3 with a memory access violation on the cudaMemGetInfo call.

What's going on?

#include <cuda.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    // ~2 GB; size_t keeps the byte count 64-bit when N is scaled up
    size_t N = 2000ULL * 1000 * 1000;
    char *arr;
    cudaError_t err;

    // Devices 0 - 2 are GTX 580. Device 3 is GTX 680.
    int dev = 3; // 1;

    err = cudaHostAlloc(&arr, N, cudaHostAllocPortable);
    cudaSetDevice(dev);

    size_t free, total;
    err = cudaGetLastError();      // cudaSuccess at this point
    cudaMemGetInfo(&free, &total); // access violation here when dev = 3

    cudaFreeHost(arr);
    return 0;
}

What is the status returned by cudaSetDevice() when dev = 3? What is the version of CUDA and the driver package? What OS is running?

The status is cudaSuccess after both cudaHostAlloc and cudaSetDevice for all values of dev from 0 to 3.

CUDA 4.2, Windows Server 2008 HPC Edition, 64-bit (Service Pack 1).

Driver version for all the cards is 301.42

If I run the code segment with dev = 3 and N less than 718*1000*1000 (about 700 MB), it works fine; if I run it with anything larger, it crashes.

This does not happen for dev = 0 to 2. I've tested those up to N = 20*1000*1000*1000, i.e. 20 GB.

Roughly 700-800 MB is the maximum I've managed to allocate on Windows using non-professional cards. From what I understand, this is related to the drivers and WDDM.
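If you want to pin down the exact limit on your setup, you can bisect on the allocation size. A quick sketch (the 32 GB upper bound and the 1 MB resolution are arbitrary assumptions on my part):

#include <cuda_runtime.h>
#include <stdio.h>

// Bisect to find the largest pinned allocation the driver will grant.
int main(void)
{
    size_t lo = 0;                              // known-good size
    size_t hi = 32ULL * 1024 * 1024 * 1024;     // assumed upper bound
    while (hi - lo > 1024 * 1024) {             // stop at 1 MB resolution
        size_t mid = lo + (hi - lo) / 2;
        void *p = NULL;
        if (cudaHostAlloc(&p, mid, cudaHostAllocPortable) == cudaSuccess) {
            cudaFreeHost(p);
            lo = mid;                           // mid bytes succeeded
        } else {
            cudaGetLastError();                 // clear the error state
            hi = mid;                           // mid bytes failed
        }
    }
    printf("Largest pinned allocation: ~%llu MB\n",
           (unsigned long long)(lo / (1024 * 1024)));
    return 0;
}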

Here is a forum thread exhibiting the same result: The Official NVIDIA Forums | NVIDIA

I’m guessing that you’re not actually getting 2 GB on the other devices either…

From what I know, the options are: Linux or pro cards…

Jimmy, I'm not allocating any device memory. If you look at the code you'll see I'm only allocating pinned host memory.

Also, on both the 580 and the 680 I've managed to allocate device memory of 3 GB and 2 GB respectively. However, this has nothing to do with my current question because I'm not allocating any device memory.

That's exactly what I'm talking about as well. There appears to be a driver-related limitation on how much pinned host memory you can allocate on Windows. If you have a pro card you get pro drivers, which from what I've heard solves this problem.

Thanks Jimmy. That seems to be the case. I've retested, and in fact cudaHostAlloc returns cudaErrorMemoryAllocation for N = 2*1024*1024*1024, but it works for N = 2*1024*1024*1024 - 1, i.e. 2 GB - 1 byte!
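For anyone testing right at that boundary, note that 2*1024*1024*1024 overflows a 32-bit int, so the size has to be computed in 64-bit arithmetic. A fragment, assuming the same includes as the repro above (the comments restate the results reported in this thread):

size_t twoGB = 2ULL * 1024 * 1024 * 1024;  // 64-bit; the plain int expression would overflow
char *p = NULL;
cudaError_t e = cudaHostAlloc(&p, twoGB, cudaHostAllocPortable);      // reported: cudaErrorMemoryAllocation
if (e != cudaSuccess)
    e = cudaHostAlloc(&p, twoGB - 1, cudaHostAllocPortable);          // reported: succeeds (2 GB - 1 byte)
if (e == cudaSuccess)
    cudaFreeHost(p);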

My code is cross-platform, but at the moment it's easier for me to work in Visual Studio on Windows. How can I get around this problem?
Also, by pro card do you mean Tesla?

And finally, this doesn't help with my cudaMemGetInfo problem when using dev = 3, i.e. the GTX 680.
That call still crashes the program after any cudaHostAlloc of more than about 700 MB…

I thought you would get the TCC driver for Quadro as well, but I read:

I suppose you could:

  1. Buy a Tesla.

  2. Rework your memory management; maybe your kernel doesn't need to have 2 GB of data available at each call, and you can segment the work into several calls or streams (see the sketch after this list).

  3. Unlock your GeForce card to trick your system into thinking it's a professional card (The Official NVIDIA Forums | NVIDIA; this might be difficult and painful).
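For option 2, a double-buffered staging scheme keeps each pinned allocation well under the cap. A minimal sketch, not tested code: the process kernel, the chunk size, and the launch configuration are illustrative assumptions on my part:

#include <cuda_runtime.h>
#include <string.h>

// Hypothetical kernel standing in for the real per-chunk work.
__global__ void process(char *d, size_t n) { /* ... */ }

// Stage a large pageable buffer through two pinned chunks,
// overlapping host->device copies with kernel work.
void run(char *host, size_t total)
{
    const size_t CHUNK = 512ULL * 1024 * 1024;   // well under the cap
    char *pinned[2], *dev[2];
    cudaStream_t s[2];
    for (int i = 0; i < 2; ++i) {
        cudaHostAlloc(&pinned[i], CHUNK, cudaHostAllocPortable);
        cudaMalloc(&dev[i], CHUNK);
        cudaStreamCreate(&s[i]);
    }
    for (size_t off = 0, i = 0; off < total; off += CHUNK, i ^= 1) {
        size_t n = (total - off < CHUNK) ? (total - off) : CHUNK;
        cudaStreamSynchronize(s[i]);             // buffer i is free again
        memcpy(pinned[i], host + off, n);        // pageable -> pinned
        cudaMemcpyAsync(dev[i], pinned[i], n, cudaMemcpyHostToDevice, s[i]);
        process<<<256, 256, 0, s[i]>>>(dev[i], n);
    }
    cudaDeviceSynchronize();
    for (int i = 0; i < 2; ++i) {
        cudaFreeHost(pinned[i]);
        cudaFree(dev[i]);
        cudaStreamDestroy(s[i]);
    }
}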

Unfortunately none of those options are possible for me.

I'm prepared to live with the 2 GB limit for pinned memory for the time being. However, when I try this with the GTX 680, as I mentioned before, the cudaMemGetInfo call fails, and in fact any memory allocation on the device itself also fails (for pinned host memory of 700 MB or more).

I feel this has to be a driver bug, not a way to force customers to buy Teslas, because otherwise the 680 would simply be capped at the 2 GB limit as well.

Regardless of potential allocation limitations due to the operating system or driver model, it seems to me that a cudaMemGetInfo() call should not segfault, but should return a suitable error status if it cannot proceed for whatever reason. I would suggest filing a bug with a self-contained repro case. By the way, does this issue repro with just the GTX 680 in the machine? That would make things easier to reproduce on the NVIDIA side. The bug reporting form can be reached via a link on the registered developer website. Thank you for your help.
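In the meantime, a standard error-checking wrapper around every runtime call makes it obvious which call fails and with what status. A generic pattern, nothing specific to this bug:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Report file/line and the CUDA error string, then abort.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t e = (call);                                     \
        if (e != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(e));                         \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Usage in the repro:
//   CUDA_CHECK(cudaHostAlloc(&arr, N, cudaHostAllocPortable));
//   CUDA_CHECK(cudaSetDevice(dev));
//   CUDA_CHECK(cudaMemGetInfo(&free, &total));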

The 2 GB cap is actually a pre-5.0 driver limitation that should be relaxed in 5.0.

I have downloaded the CUDA 5.0 beta and the relevant drivers, and it appears that the cudaMemGetInfo crash when using the 680 has disappeared.
Furthermore, I can now allocate 2 GB of pinned memory per allocation for all the cards (680 included), which is good enough for me for the moment.
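Since the cap appears to be per allocation rather than per process, splitting a larger buffer across several pinned blocks also seems workable. A rough sketch; the block count and size are assumptions on my part, not something I've verified:

#include <cuda_runtime.h>

#define NBLOCKS 3

// Approximate a >2 GB pinned region as several <2 GB blocks,
// assuming the cap applies per allocation.
static char *blocks[NBLOCKS];

int allocPinnedBlocks(void)
{
    const size_t blockSize = 2ULL * 1024 * 1024 * 1024 - 1; // just under the cap
    for (int i = 0; i < NBLOCKS; ++i) {
        if (cudaHostAlloc(&blocks[i], blockSize, cudaHostAllocPortable) != cudaSuccess) {
            while (i-- > 0) cudaFreeHost(blocks[i]);        // roll back on failure
            return -1;
        }
    }
    return 0; // ~6 GB of pinned memory across NBLOCKS blocks
}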

@tmurray, do you know how I can turn off the 2 GB cap in CUDA 5.0? I would like to be able to allocate up to 4 GB per pinned memory allocation, or even 6 GB.

Also, thanks for the help everyone. I really appreciate it!