Unable to get over 512MB of page-locked memory with cudaHostRegister or cudaMallocHost...

mikeatroc · June 29, 2012, 8:23pm

In some recent CUDA development work I have run into trouble getting more than 512MB of page-locked host memory to use for asynchronous copies to/from a CUDA device (GTX 690). This is an obstacle whether I allocate the RAM myself with malloc() and then call cudaHostRegister(), or call cudaMallocHost() to get the page-locked memory directly. FYI, cudaHostRegister() is being passed the flag cudaHostRegisterMapped. When cudaMallocHost() is called and the cumulative page-locked allocations cross about 512MB, the call fails with the error “out of memory”.

I am working with a Windows 7 64-bit OS on a computer with 32GB of RAM so the amount of physical memory is not an issue. I am using Microsoft Visual Studio 2010 and CUDA 4.2.

In trying to work through this problem, I have read the Microsoft documentation on process working set sizes. As an investigation, independent of CUDA, I wrote a C++ test that raised the process working set size to over 3GB (SetProcessWorkingSetSize), allocated 3GB of memory with malloc() (in 1GB chunks), and successfully locked the three chunks with VirtualLock(). So I know my program has permission, and the system has enough resources, to supply an adequate amount of page-locked memory for my problem. Note that 3GB is not a limit; it is just all I asked the system to allocate and lock.

Does anyone know if there is a limit in CUDA 4.2 on the amount of page-locked memory that either cudaHostRegister() or cudaMallocHost() can work with? Additionally, does anyone know if there is a way to register pre-allocated page-locked host memory (say, malloced and locked with VirtualLock()) with CUDA so the asynchronous copy functions can work with it? I assume, but do not know, that this registration is necessary to use the asynchronous copy functions.

I would like to avoid falling back to synchronous copies, or to copying smaller amounts of data at a time by looping over the data, transferring each piece into page-locked memory, and then issuing asynchronous copies. I have large amounts of data to copy that I am already overlapping with kernel execution, so either of these workarounds will cost me cycles.
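
For reference, the second workaround mentioned above (staging the data through a smaller page-locked buffer) can be sketched roughly as follows. This is only an illustration in plain C++, not code from the thread: `staged_copy` and the `SubmitFn` callback are hypothetical names, and the callback stands in for whatever actually moves the staged bytes to the device (for example a `cudaMemcpyAsync` on a stream). Because a single staging buffer is reused, the sketch assumes the callback does not return until the buffer is safe to overwrite.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical callback: transfer `bytes` bytes from the staging buffer to
// byte offset `dst_offset` of the destination. With CUDA this is where an
// asynchronous copy from the pinned staging buffer would be issued.
// Assumption: it returns only once `staging` may be overwritten again.
using SubmitFn =
    std::function<void(const void* staging, std::size_t dst_offset, std::size_t bytes)>;

// Copy `total_bytes` from pageable memory `src` through a smaller staging
// buffer, one piece at a time. Returns the number of pieces submitted.
std::size_t staged_copy(const char* src, std::size_t total_bytes,
                        char* staging, std::size_t staging_bytes,
                        const SubmitFn& submit) {
    assert(staging_bytes > 0);
    std::size_t pieces = 0;
    std::size_t off = 0;
    while (off < total_bytes) {
        const std::size_t remaining = total_bytes - off;
        const std::size_t n = remaining < staging_bytes ? remaining : staging_bytes;
        std::memcpy(staging, src + off, n);  // the extra host-side copy
        submit(staging, off, n);
        off += n;
        ++pieces;
    }
    return pieces;
}
```

The `std::memcpy` into the staging buffer is the extra host-side work that makes this workaround cost cycles; using two staging buffers alternately would let one piece be filled while the previous one is still in flight.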

Thanks in advance for any help. :^)

njuffa · June 29, 2012, 8:42pm

According to a recent post by tmurray on the NVIDIA forums, it seems to me you should be able to pin chunks of up to 2GB. He also hints that this 2GB limit may be relaxed in CUDA 5.0, so if this is a possibility I would suggest trying the CUDA 5.0 preview available to registered developers. If the issue with the 512MB limit persists even with CUDA 5.0, I would suggest filing a bug, attaching a self-contained repro case. A link to the bug reporting form can be found on the registered developer website.
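
If a per-chunk limit like the 2GB one mentioned above applies, one way to stay under it is to pin a large buffer piece by piece. A minimal sketch of the bookkeeping, in plain C++, is below. The names `Chunk` and `plan_chunks` are hypothetical and not part of CUDA; each resulting piece would then be passed to something like `cudaHostRegister(base + c.offset, c.size, cudaHostRegisterMapped)`. Whether this helps with a limit on the cumulative amount of pinned memory, as reported in the original post, is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous piece of a larger host buffer (hypothetical helper type).
struct Chunk {
    std::size_t offset;  // byte offset from the start of the buffer
    std::size_t size;    // length of the piece in bytes
};

// Split `total_bytes` into consecutive pieces of at most `max_chunk_bytes`.
// Only the last piece may be smaller than `max_chunk_bytes`.
std::vector<Chunk> plan_chunks(std::size_t total_bytes, std::size_t max_chunk_bytes) {
    assert(max_chunk_bytes > 0);
    std::vector<Chunk> chunks;
    std::size_t off = 0;
    while (off < total_bytes) {
        const std::size_t remaining = total_bytes - off;
        const std::size_t n = remaining < max_chunk_bytes ? remaining : max_chunk_bytes;
        chunks.push_back({off, n});
        off += n;
    }
    return chunks;
}
```

Choosing a chunk size that is a multiple of the host page size keeps every piece at a page-aligned offset from the base pointer.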

mikeatroc · July 2, 2012, 4:04pm

Thanks @njuffa for your quick feedback and for recognizing the similarity to the problem @twerdster experienced. I must have missed his discussion thread because I searched for issues with cudaMallocHost() and cudaHostRegister() instead of cudaHostAlloc(). I updated to cudatoolkit_5.0.7 and driver 302.59, and the problem has been resolved. Cheers.

njuffa · July 2, 2012, 6:20pm

Thanks for closing the loop. It’s good to hear the latest software fixed the issue.
