I am trying to allocate large amounts of pinned memory (several GB, up to ~10 GB) using cudaHostAlloc, but I hit an unexpected limit on some machines.
The system I am having problems with:
Phenom II X6
12GB DDR3 RAM
Windows 7 64bit
Parallel Nsight 1.5
Visual Studio 2010
I expected to be able to allocate at least 8-10 GB of pinned/page-locked memory through CUDA, but allocation starts failing at around 700 MB in total.
I tried different allocation patterns (e.g. everything at once, or many blocks of 32, 64, 128 MB …), but the total at which allocation fails stays the same.
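For illustration, the block-wise test described above can be sketched roughly as follows. This is a simplified sketch, not the exact project code: the helper names `probe_limit`, `pinned_alloc` and `pinned_free`, the 64 MB block size and the 10 GB cap are all illustrative assumptions. The CUDA-specific part is guarded by `__CUDACC__` so it is only built when the file is compiled as a `.cu` file with nvcc.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

#if defined(__CUDACC__)
#include <cuda_runtime.h>
#endif

// Hypothetical helper: calls alloc(block_bytes) until it returns a null
// pointer or max_bytes would be exceeded. All blocks are released again
// before returning. Returns the total number of bytes obtained.
template <typename Alloc, typename Free>
std::size_t probe_limit(Alloc alloc, Free release,
                        std::size_t block_bytes, std::size_t max_bytes) {
    if (block_bytes == 0) return 0;  // avoid an endless loop
    std::vector<void*> blocks;
    std::size_t total = 0;
    while (total + block_bytes <= max_bytes) {
        void* p = alloc(block_bytes);
        if (p == nullptr) break;     // allocation failed: limit reached
        blocks.push_back(p);
        total += block_bytes;
    }
    for (std::size_t i = 0; i < blocks.size(); ++i) release(blocks[i]);
    return total;
}

#if defined(__CUDACC__)
// Thin wrapper around cudaHostAlloc that reports failure as a null pointer.
void* pinned_alloc(std::size_t bytes) {
    void* p = nullptr;
    cudaError_t err = cudaHostAlloc(&p, bytes, cudaHostAllocDefault);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaHostAlloc failed: %s\n",
                     cudaGetErrorString(err));
        cudaGetLastError();          // reset the error state
        return nullptr;
    }
    return p;
}

void pinned_free(void* p) { cudaFreeHost(p); }

int main() {
    const std::size_t MB = 1024ull * 1024ull;
    // Assumed parameters: 64 MB blocks, stop at 10 GB.
    const std::size_t got =
        probe_limit(pinned_alloc, pinned_free, 64 * MB, 10240 * MB);
    std::printf("pinned memory obtained: %llu MB\n",
                (unsigned long long)(got / MB));
    return 0;
}
#endif
```

Changing the block size passed to `probe_limit` corresponds to the different block sizes mentioned above.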
I also tried the latest end-user driver and some previous CUDA versions with the same effect.
My project is compiled for x64 using the v90 platform toolset.
I also followed this article to ensure that the limits the operating system enforces for the non-paged pool are correct. (Process Explorer reports 9.x GB as the Nonpaged Limit.)
On another machine with lower specs (Phenom X4, 4GB RAM, GTX 275, same software stack) I could at least allocate around 1400MB of pinned memory, which is not perfect but better.
I am trying to figure out what causes this limit and how to resolve or work around it.