The following statement works correctly with CUDA 7.0, CUDA 7.5, and the CUDA 8.0 release candidate (v8.0.21):
cudaHostAlloc( &p, cb, cudaHostAllocPortable | cudaHostAllocMapped );
The call is made four times in succession, allocating 5368709120, 30981153888, 5368709120, and 28770267400 bytes (5 GiB, about 28.9 GiB, 5 GiB, and about 26.8 GiB, or roughly 65.6 GiB in total).
With the CUDA 8.0 release build (v8.0.44), the same code fails on the second of the four allocations and returns cudaErrorMemoryAllocation (out of memory).
The environment is Windows Server 2008 R2, with the code compiled as a 64-bit application in Visual Studio 2013.
The problem occurs on two different computers, one with three K20c devices and one with three K40 devices.
Reverting from v8.0.44 to v8.0.21 resolves the problem, so this seems to be a breaking change in the current CUDA release. Can someone please provide an explanation, a workaround, or a fix?