cudaMallocHost fails with "invalid argument" for large allocations

Hi,

When I try to allocate a large chunk of host memory with cudaMallocHost, it fails with "invalid argument". Smaller allocations succeed, so there seems to be an upper limit on the size that can be allocated. For example, the following minimal test program,

#include <cuda_runtime.h>
#include <iostream>

int main() {
  void* pinnedMemory = nullptr;
  cudaError_t cErr;

  for (std::size_t sizeGB = 1; sizeGB <= 10; ++sizeGB) {
    std::size_t nBytes = sizeGB * (std::size_t{1} << 30); // Convert GiB to bytes
    cErr = cudaMallocHost(&pinnedMemory, nBytes);
    if (cErr == cudaSuccess) {
      std::cout << "Successfully allocated " << sizeGB << " GB of pinned memory.\n";
      cudaFreeHost(pinnedMemory);
      pinnedMemory = nullptr;
    } else {
      std::cout << "Failed to allocate " << sizeGB << " GB: "
        << cudaGetErrorString(cErr) << "\n";
      break;
    }
  }
  return 0;
}

produces the following output:

❯ nvcc test_size.cu -g -O0 -lineinfo && ./a.out
Successfully allocated 1 GB of pinned memory.
Successfully allocated 2 GB of pinned memory.
Failed to allocate 3 GB: invalid argument

When the program exits with this failure, dmesg reports:

❯ sudo dmesg | tail -n 20
...
[10732.400239] Cannot map memory with base addr 0x70fba6000000 and size of 0xc0000 pages

The machine has plenty of memory on both sides: 10 GB on the GPU and 64 GB of host RAM.

The weirdest thing is that I was previously able to allocate more than 6 GB (at least) with cudaMallocHost without any problem. I’m not sure whether the OS upgrade has affected something related to this issue, or whether I need to consider a hardware failure.

I am using Arch Linux with kernel version 6.11.5, NVIDIA driver version 560.35.03, and CUDA 12.6.

❯ uname -r
6.11.5-arch1-1

❯ nvidia-smi
Thu Oct 24 13:45:48 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        Off |   00000000:0E:00.0  On |                  N/A |
|  0%   49C    P5             37W /  370W |    1102MiB /  10240MiB |     10%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Any help, suggestions, or advice for narrowing down the issue would be very welcome.