I am trying to allocate large amounts (several GB … up to ~10GB) of pinned memory using cudaHostAlloc and I seem to hit an unexpected limit on some machines.
The system I am having problems with:
* Phenom II X6
* 12GB DDR3 RAM
* GTX 480
* Windows 7 64bit
* cudatoolkit_3.2.16_win_64
* devdriver_3.2_winvista-win7_64_263.06_general
* gpucomputingsdk_3.2.16_win_64
* Parallel Nsight 1.5
* Visual Studio 2010
I expected to be able to allocate at least 8-10 GB of pinned/page-locked memory through CUDA, but I seem to hit a limit at around 700 MB.
I tried allocating in different ways (e.g. everything at once, or many blocks of 32, 64, 128 MB …), but the limit stays the same.
I also tried the latest end-user driver and some previous CUDA versions with the same effect.
My project is compiled for x64 using the v90 platform toolset.
I also followed this article to ensure that the limits the operating system enforces on the non-paged pool are correct. (Process Explorer reports 9.x GB as the Nonpaged Limit.)
On another machine with lower specs (Phenom X4, 4GB RAM, GTX 275, same software stack) I could at least allocate around 1400 MB of pinned memory, which is not perfect but better.
I am trying to figure out what causes this limit and how to resolve or work around it.
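For anyone who wants to reproduce the block-wise test described above, here is a minimal sketch. The helper name `probe_limit` and its parameters are hypothetical, not part of CUDA; the allocator is passed in as a callable so the loop can be tried without a GPU.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not a CUDA API): request fixed-size blocks until the
// allocator fails or max_blocks is reached, then release every block and
// return the total number of bytes that could be held at the same time.
std::size_t probe_limit(const std::function<void*(std::size_t)>& alloc,
                        const std::function<void(void*)>& release,
                        std::size_t block_bytes,
                        std::size_t max_blocks) {
    std::vector<void*> blocks;
    while (blocks.size() < max_blocks) {
        void* p = alloc(block_bytes);
        if (p == nullptr) break;  // allocator reported failure
        blocks.push_back(p);
    }
    const std::size_t total = blocks.size() * block_bytes;
    for (void* p : blocks) release(p);
    return total;
}
```

With the CUDA runtime, `alloc` would presumably wrap `cudaHostAlloc(&p, n, cudaHostAllocDefault)` and return `p` only when the call yields `cudaSuccess`, and `release` would wrap `cudaFreeHost`. Running it with 32, 64, and 128 MB blocks should show whether the total really is independent of block size.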
I have the same problem. Any chance this is a bug? I have 12GB of RAM and Windows 7 64-bit, and I’m using the CUDA driver through the CUDA.Net bindings. I can’t even reach 1GB before I get out-of-memory exceptions.
The maximum size of a single allocation created by cudaMalloc or cuMemAlloc is limited to:
MIN( (System Memory Size in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE )
For Vista, PAGING_BUFFER_SEGMENT_SIZE is approximately 2GB.
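To see what that rule gives for the machines in this thread, here is a small sketch of the arithmetic. The function name is made up, and the 2048 MB default is an assumption, since the text only says "approximately 2GB" and only for Vista. Note also that the rule names cudaMalloc and cuMemAlloc; it does not say whether it applies to cudaHostAlloc.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed value: the quoted text only gives "approximately 2GB" for Vista.
const std::uint64_t kPagingBufferSegmentMB = 2048;

// Hypothetical helper implementing
//   MIN((System Memory Size in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE)
// with all quantities in MB. Systems with 512 MB or less yield 0.
inline std::uint64_t max_single_alloc_mb(
        std::uint64_t system_memory_mb,
        std::uint64_t paging_segment_mb = kPagingBufferSegmentMB) {
    const std::uint64_t half =
        system_memory_mb > 512 ? (system_memory_mb - 512) / 2 : 0;
    return std::min(half, paging_segment_mb);
}
```

For 12 GB (12288 MB) this gives min(5888, 2048) = 2048 MB, and for 4 GB (4096 MB) it gives min(1792, 2048) = 1792 MB. Both are above the roughly 700 MB and 1400 MB observed above, so under these assumptions this rule alone would not account for those limits.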