Unexpected limit in cudaHostAlloc: failing to allocate large amounts of pinned/page-locked memory

Hi,

I am trying to allocate large amounts (several GB … up to ~10GB) of pinned memory using cudaHostAlloc and I seem to hit an unexpected limit on some machines.

The system I am having problems with:

    Phenom II X6

    12GB DDR3 RAM

    GTX 480

    Windows 7 64bit

    cudatoolkit_3.2.16_win_64

    devdriver_3.2_winvista-win7_64_263.06_general

    gpucomputingsdk_3.2.16_win_64

    Parallel Nsight 1.5

    Visual Studio 2010

I expected to be able to allocate at least 8-10 GB of pinned/page-locked memory through CUDA but I seem to hit a limit at around 700 MB.

I tried allocating in blocks of different sizes (e.g. all at once, or many blocks of 32, 64, 128 MB …), but the limit stays the same.
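The chunked test above can be reproduced with a small C sketch. It keeps allocating fixed-size blocks until the allocator fails, then reports how much memory was held at once. The allocator is passed in as a callback so the same loop works for pinned and pageable memory. The names probe_alloc_mb, pinned_alloc, pinned_free and the PROBE_WITH_CUDA build switch are hypothetical. Only cudaHostAlloc, cudaHostAllocDefault and cudaFreeHost are real CUDA runtime API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocator callback: returns a non-NULL pointer on success, NULL on failure. */
typedef void *(*alloc_fn)(size_t bytes);
typedef void (*free_fn)(void *p);

/* Allocate chunk_mb-sized blocks until the allocator fails or max_mb is
   reached, then free everything again. Returns the number of MB that were
   held simultaneously (hypothetical helper, not part of any CUDA API). */
size_t probe_alloc_mb(alloc_fn alloc, free_fn release,
                      size_t chunk_mb, size_t max_mb)
{
    if (chunk_mb == 0)
        return 0;

    size_t max_chunks = max_mb / chunk_mb;
    void **blocks = calloc(max_chunks ? max_chunks : 1, sizeof *blocks);
    if (blocks == NULL)
        return 0;

    size_t n = 0;
    while (n < max_chunks) {
        void *p = alloc(chunk_mb * 1024u * 1024u);
        if (p == NULL)
            break;              /* first failure marks the limit */
        blocks[n++] = p;
    }

    for (size_t i = 0; i < n; ++i)
        release(blocks[i]);     /* give everything back */
    free(blocks);
    return n * chunk_mb;
}

#ifdef PROBE_WITH_CUDA  /* hypothetical switch: build against the CUDA toolkit */
#include <cuda_runtime.h>

/* Pinned (page-locked) host allocation via the CUDA runtime. */
static void *pinned_alloc(size_t bytes)
{
    void *p = NULL;
    return cudaHostAlloc(&p, bytes, cudaHostAllocDefault) == cudaSuccess ? p : NULL;
}

static void pinned_free(void *p) { cudaFreeHost(p); }
#endif
```

With the CUDA toolkit available, building with -DPROBE_WITH_CUDA and calling probe_alloc_mb(pinned_alloc, pinned_free, 128, 10240) should report how many MB of pinned memory could be held in 128 MB blocks, up to 10 GB. Passing malloc and free instead gives a pageable-memory baseline for comparison.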

I also tried the latest end-user driver and some previous CUDA versions with the same effect.

My project is compiled for x64 using the v90 platform toolset.

I also followed this article to make sure the operating-system limits for the non-paged pool are set correctly (Process Explorer reports 9.x GB as the Nonpaged Limit).

On another machine with lower specs (PhenomX4, 4GB RAM, GTX 275, same software stack) I could at least manage to allocate around 1400MB of pinned memory which is not perfect but better.

I am trying to figure out what causes this limit and how to resolve or work around it.

Kind regards

Simon


I have the same problem. Could this be a bug? I have 12 GB of RAM and Windows 7 64-bit, and I’m using the CUDA driver API through the CUDA.NET bindings. I can’t even reach 1 GB before I get out-of-memory exceptions.

It is not a bug but a Windows Vista/7 “feature”.

From the release notes:

- The maximum size of a single allocation created by cudaMalloc or cuMemAlloc is limited to:
  MIN((System Memory Size in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE)
  For Vista, PAGING_BUFFER_SEGMENT_SIZE is approximately 2GB.
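To make the quoted formula concrete, here is a small C sketch that evaluates it for a given amount of system memory. It assumes PAGING_BUFFER_SEGMENT_SIZE is exactly 2048 MB, since the note only says "approximately 2GB", and, like the note itself, it describes a single cudaMalloc/cuMemAlloc allocation. The function and macro names are hypothetical.

```c
/* Assumption: the release note only says "approximately 2GB". */
#define PAGING_BUFFER_SEGMENT_MB 2048ul

/* Per-allocation cap from the release note:
   MIN((System Memory Size in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE)
   Hypothetical helper; returns 0 if system_mb is 512 or less. */
unsigned long max_single_alloc_mb(unsigned long system_mb)
{
    unsigned long half = system_mb > 512ul ? (system_mb - 512ul) / 2ul : 0ul;
    return half < PAGING_BUFFER_SEGMENT_MB ? half : PAGING_BUFFER_SEGMENT_MB;
}
```

For the 12 GB machine (12288 MB) this gives MIN(5888, 2048) = 2048 MB; for the 4 GB machine (4096 MB) it gives MIN(1792, 2048) = 1792 MB.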