I am trying to allocate a large amount of pinned host memory (about 17-18 GB) using cudaMallocHost. Above a certain size, cudaMallocHost returns “cudaErrorMemoryAllocation”. My system has 64 GB of host RAM, so I am not exceeding that limit. Below roughly 17 GB, the allocation succeeds. The error is reproducible every time, so it is not random. Where the allocation sits in my code changes which call fails, but a failure occurs regardless. For example, if I allocate the memory at the beginning of my code, the allocation succeeds, but later calls to create a stream return the same error code. If I run the same code on a machine with a GTX Titan card, I do not get this error. I have tried both CUDA 8.0 and CUDA 7.5 with the GTX 1080, and both produce the same error. I have the latest graphics driver for the GTX 1080 and I am running Windows 10. Is there a bug with the GTX 1080, Windows 10, or something else? Thanks, Sean.
cudaMallocHost() is a thin wrapper around operating system API calls, so when cudaMallocHost() refuses to allocate more memory than desired, it is likely that the reason is a failure of the underlying OS call(s).
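To narrow down where the limit sits, a small probe along the following lines might help. It is only a sketch under assumptions: it pins memory in 1 GiB chunks instead of one large block (so it measures the total amount that can be pinned, not the largest single allocation), the 20 GiB default ceiling and the helper name gib_to_bytes are made up for illustration, and it assumes a 64-bit build. After the loop it also tries to create a stream, since the original post reports that stream creation fails once the large allocation has succeeded.

```cuda
// Sketch: probe how much pinned host memory cudaMallocHost() will grant.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: convert GiB to bytes (assumes 64-bit size_t).
static size_t gib_to_bytes(size_t gib) { return gib << 30; }

int main(int argc, char** argv) {
    // Assumed default: try up to 20 GiB; override with the first argument.
    size_t max_gib = (argc > 1) ? std::strtoull(argv[1], nullptr, 10) : 20;
    const size_t chunk = gib_to_bytes(1);
    std::vector<void*> chunks;

    for (size_t i = 0; i < max_gib; ++i) {
        void* p = nullptr;
        cudaError_t err = cudaMallocHost(&p, chunk);
        if (err != cudaSuccess) {
            std::printf("cudaMallocHost failed after %zu GiB: %s (%s)\n",
                        chunks.size(), cudaGetErrorName(err),
                        cudaGetErrorString(err));
            cudaGetLastError();  // reset the error state before continuing
            break;
        }
        chunks.push_back(p);
    }
    std::printf("pinned %zu GiB in total\n", chunks.size());

    // Check whether other runtime calls still work with the memory pinned.
    cudaStream_t stream;
    cudaError_t err = cudaStreamCreate(&stream);
    std::printf("cudaStreamCreate: %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess) cudaStreamDestroy(stream);

    // Release everything that was pinned.
    for (void* p : chunks) cudaFreeHost(p);
    return 0;
}
```

Running it with different ceilings (for example 16, 17, 18) on each machine should show whether the failure point tracks the OS, the GPU, or neither.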
I have to admit, however, that I have no idea how the choice of GTX 1080 vs GTX Titan would make a significant difference, which is what your description implies. As far as I know, Windows allocates backing store for all of the GPU memory, so the size of the GPU memory might have a slight impact.
Does Windows 10 offer OS API tracing facilities (like strace on Linux) that let you discover which OS call may be failing, and why?
Let me clarify one point. The machine with the GTX Titan is running Windows 8.1. I have yet to try the Titan card with Windows 10.
The GTX 1080 is in WDDM mode.
Is the GTX Titan in WDDM mode or TCC mode?
I believe the GTX Titan is in WDDM mode. I don’t think it can run in TCC mode without hacking it. It is not a Titan X.
Yes, just verified that it is in WDDM mode.
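For reference, the driver model can also be checked from code through the tccDriver field of cudaDeviceProp. The following is a minimal sketch, not necessarily how it was verified in this thread; the helper name driver_model_name is made up for illustration, and on Windows a value of 0 in tccDriver is taken to mean WDDM.

```cuda
// Sketch: list each CUDA device with its memory size and driver model.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: map the tccDriver flag to a readable label.
static const char* driver_model_name(int tcc_driver) {
    return tcc_driver ? "TCC" : "WDDM (or non-TCC)";
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::printf("device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        // totalGlobalMem is in bytes; print it in GiB.
        std::printf("device %d: %s, %.1f GiB, driver model: %s\n",
                    dev, prop.name,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
                    driver_model_name(prop.tccDriver));
    }
    return 0;
}
```

On a multi-GPU system this prints one line per card, which makes it easy to confirm that all cards run under the same driver model.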
A controlled experiment, where only one variable (either GPU or OS) changes at a given time, could be helpful. I don’t have experience with Windows 10, but I think it uses the second generation of WDDM (WDDM 2.x), versus first-generation WDDM (1.x) on Windows 7 and 8. If so, that difference is a plausible working hypothesis for the cause of your observations.
The results of the test are:
Windows 10 with GTX Titan: memory allocation error
Windows 8.1 with GTX 1080: works without error
My conclusion is that there is a problem with Windows 10. The question now is what to do. Report this to Microsoft and wait?
Another note: these systems are all multi-GPU, with 4 cards each. Perhaps I should test this with just one card and see if the error occurs.
I guess it would be best to report this to both Microsoft and NVIDIA. As far as I can see from NVIDIA’s download page, Windows 10 drivers are different from Windows 7/8 drivers, so the issue could be local to NVIDIA’s Windows 10 driver. Or it could be a more fundamental flaw in the Microsoft driver model as used in Windows 10. I don’t know which working hypothesis is more likely; maybe someone more knowledgeable can give you a solid pointer in either direction.
So my suggestion would be to stick around for a couple of days to see whether someone can give you better feedback.