Large memory allocation with CudaHostAlloc fails with CUDA 8.0 release build

The following statement has operated correctly with CUDA 7.0, CUDA 7.5, and CUDA 8.0 release candidate (v8.0.21):

cudaHostAlloc( &p, cb, cudaHostAllocPortable | cudaHostAllocMapped );

where the function is called successively for allocations of 5368709120, 30981153888, 5368709120, and 28770267400 bytes.

With the CUDA 8.0 release build (v8.0.44), the same code fails on the second of the four memory allocations. The return value is cudaErrorMemoryAllocation (an out-of-memory error).

The environment is Windows Server 2008 R2, compiled as a 64-bit application with Visual Studio 2013.

The problem occurs on two different computers, one with three K20c devices and one with three K40 devices.

Reverting from v8.0.44 to v8.0.21 resolves the problem, so this seems to be a breaking change in the current CUDA release. Can someone please provide an explanation, a workaround, or a fix?

Thank you!

You might want to file a bug at the NVIDIA bug reporting portal.

Anything new on this issue? It seems that I have run into the same problem…

No response from NVIDIA.

As far as I’m concerned, there’s no compelling reason to use CUDA 8 anyway, so we just plan to wait for the next CUDA update and see if the problem gets fixed.

If you want to try to shake NVidia’s tree about this, I suppose you could submit your own bug report. Feel free to reference “NVIDIA Incident Report (1824366) - Large memory allocation with CudaHostAlloc fails with CUDA 8.0 release build” if you think it will help.

Regarding this:

I took a look at the internal bug report 1824366. There were two auto-generated emails sent from that bug report to the email address that was used to set up the registered developer credentials. Those emails were sent on 10/12/2016 and 10/31/2016, both were effectively requesting that you provide a complete reproducer code.

There doesn’t appear to be any reply. The best way to provide such a reply would be to go into the bug reporting portal and provide the requested information (a complete reproducer code).

In general, when filing bug reports, if you don’t provide a reproducer code, the probability that someone will look at it is drastically reduced.

If you believe there may be an email issue, and you want to provide a complete reproducer code here, I will edit it into the existing bug report.

Except in a general way such as above, I don’t use information in bug reports without permission, so if you’d like me to contact you directly using the email in the bug report, please advise, and I will do so.

The “reproducer code” – basically just a few lines of C that call cudaHostAlloc in exactly the way described in this forum thread – was emailed three times (October 13, October 31, and November 15). If “there doesn’t appear to be any reply,” then perhaps there is indeed an email problem somewhere.

Anyway, let’s try it again here:

  1. Use Visual Studio to create a new CUDA 8.0 project. Compile as a 64-bit (“x64”) application.
  2. Insert the following into main():
unsigned long long * p = NULL;
unsigned long long cb = 5368709120ULL;   // 5 GiB
cudaError_t rval = cudaHostAlloc( &p, cb, cudaHostAllocPortable | cudaHostAllocMapped );
printf( "cudaHostAlloc( ..., %llu, ... ) returns %d\n", cb, rval );

cb = 30981153888ULL;                     // ~28.85 GiB
rval = cudaHostAlloc( &p, cb, cudaHostAllocPortable | cudaHostAllocMapped );
printf( "cudaHostAlloc( ..., %llu, ... ) returns %d\n", cb, rval );

The first call to cudaHostAlloc succeeds. The second fails.

I also tried to report a bug, but it seems I am too stupid for that. I filled out the form twice, and when I pressed the final report button the website responded with an error code (and my 10-minute form entry was gone), lol. It seems the bug report site is itself buggy…

Do you have the v8.0.21 package? Would it be possible to get it from you, since it is not downloadable any more?

I believe I am also seeing the same memory allocation problem. I am using CUDA 7.5.18, which installs the 353.90 driver, and cudaHostAlloc seems to work fine.

After upgrading the driver to 369.73 (I also tried 376.33), the allocation errors appear.
My system has 256 GB of RAM and four Tesla K80 cards.

Is there any new status on this?


Sorry. Still no response from NVIDIA.

Have you tried completely removing the driver with DDU in safe mode before installing CUDA 8.0?

I was gonna say, I’m kind of surprised by how iffy Nvidia’s web design is lol. I was like, man, you can tell this was coded by developers :P

Same here with the CUDA 8 drivers (369.30 from the toolkit and the current 376.33) on Windows Server 2012 R2. Only 38 of 256 GB can be allocated via cudaHostAlloc, with both CUDA 8 and CUDA 7.5.

Latest driver 376.84 still does not fix it. It is now 8 months since the issue was reported.

October 6, 2016 till now is about four months, not eight. Am I missing something?

The longer delay is probably an indication that it is a hard problem to fix. If you are a customer with a designated contact at NVIDIA, bring it up with them. Otherwise, consider filing a bug report that is specific to your use case or application. Sometimes what looks superficially like the same issue may be a previously unknown variant of a problem already reported, or may actually have a different root cause.

On Linux, using CUDA 8.0 and driver 375.26, this fails for me as well.

But not because of a bug… I suspect it’s because I only have 32 GB of system RAM, and since CUDA mapped memory is not allowed to be pageable, total allocations are limited by the system memory size. If I skip the first allocation, the second succeeds. Or I can reduce the second allocation to 25 GB and it succeeds as well.

I love 2017, where I casually say “I only have 32GB of RAM”…

:-) :-) Especially when one remembers times when mass storage (hard disks) had less capacity than that. I still have two Conner 3204F disks (in working order), each with a whopping 200 MB of storage …


Has anyone tested this bug on the newest release? Does it still remain?

Latest driver 376.84 still does not fix it. It is now 6 months since the issue was reported.

Thank you, NVIDIA, that I bought an 8× Titan X multi-GPU server with 128 GB of RAM and can only use a driver from the prior century to run it… I am really beginning to hate NVIDIA.

Issues were corrected. Post edited out.