cuFileBufRegister fails above ~256 MB on RTX A6000 (48 GB GDDR6) with error 5016: how to increase the maximum registration size?

Hello,
I am using GPUDirect Storage (GDS) on a machine with an NVIDIA RTX A6000 (48 GB GDDR6) running Ubuntu 24.04 with kernel 6.8.0-85-generic. I am trying to pre-register a large GPU buffer with cuFileBufRegister() for use as a GPU-side block cache.

The issue:

  • Registration works for sizes up to ~256 MB.

  • Any registration size above ~256 MB fails with error 5016 (CU_FILE_INVALID_MAPPING_SIZE).

  • I attempted increasing the GDS limits in /etc/cufile.json:

"max_direct_io_size_kb": 2097152,
"max_device_cache_size_kb": 2097152,
"max_device_pinned_mem_size_kb": 33554432

but registration still fails above 256 MB.
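As a sanity check on the values above, here is a small sketch that converts them to MiB and compares them with the ~256 MB threshold. It assumes the "_kb" suffix means units of 1024 bytes; the variable names are only for illustration.

```python
# Values copied from the cufile.json excerpt above.
# Assumption: the "_kb" suffix denotes units of 1024 bytes.
configured_kb = {
    "max_direct_io_size_kb": 2097152,
    "max_device_cache_size_kb": 2097152,
    "max_device_pinned_mem_size_kb": 33554432,
}

KIB_PER_MIB = 1024
observed_limit_mib = 256  # approximate size at which registration starts failing


def kb_to_mib(kb: int) -> float:
    """Convert a size in KiB to MiB."""
    return kb / KIB_PER_MIB


for key, kb in configured_kb.items():
    mib = kb_to_mib(kb)
    print(f"{key}: {mib:.0f} MiB ({mib / 1024:.0f} GiB), "
          f"{mib / observed_limit_mib:.0f}x the observed limit")
```

Under that assumption, every configured limit is well above 256 MiB, so the failure does not appear to come from these three values as written.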

Additional details:

  • cuFileDriverOpen() succeeds.

  • The nvidia_fs kernel module is not installed (modprobe nvidia_fs fails, and there is no nvidia-fs systemd service).

  • GDS otherwise works fine (I/O flows through cuFile and I can do direct GPU reads).

  • The system is running CUDA 12.x with the standard NVIDIA driver, and the issue persists across reboots.
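To pin down the threshold more precisely than "~256 MB", the registration size could be bisected. The sketch below is only an illustration of that idea: try_register is a hypothetical callback (not part of cuFile) that would allocate a buffer of the given size, call cuFileBufRegister() on it, deregister it, and report whether registration succeeded. A stand-in that simulates a 256 MiB cap is used here so the sketch runs without a GPU.

```python
from typing import Callable

MIB = 1024 * 1024


def largest_ok_size(try_register: Callable[[int], bool],
                    lo: int, hi: int, step: int = MIB) -> int:
    """Return the largest multiple of `step` in [lo, hi] that registers.

    Assumes `lo` and `hi` are multiples of `step`, that `lo` succeeds,
    and that failure is monotonic (if a size fails, larger sizes fail too).
    """
    if not try_register(lo):
        raise ValueError("lower bound must register successfully")
    if try_register(hi):
        return hi
    lo_n, hi_n = lo // step, hi // step  # invariant: lo_n succeeds, hi_n fails
    while hi_n - lo_n > 1:
        mid = (lo_n + hi_n) // 2
        if try_register(mid * step):
            lo_n = mid
        else:
            hi_n = mid
    return lo_n * step


# Hypothetical stand-in for the real registration attempt.
def fake_register(size_bytes: int) -> bool:
    return size_bytes <= 256 * MIB


limit = largest_ok_size(fake_register, 64 * MIB, 2048 * MIB)
print(f"largest successful registration: {limit // MIB} MiB")
```

With a real callback in place of fake_register, the reported value would show whether the cutoff is exactly 256 MiB or some nearby value, which may help identify which limit is being hit.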

Questions:

  • Is there a known upper limit (≈256 MB) for cuFileBufRegister() on systems without the nvidia_fs kernel module?

  • How can I properly increase the maximum GPU buffer size that can be registered with GDS on this configuration?

  • Does the absence of nvidia_fs indicate that GDS is running in a fallback / compatibility mode that enforces smaller registration limits?

  • What configuration or driver changes are required to allow registering a larger GPU buffer (e.g., 512 MB, 1 GB, or 2 GB)?

Thank you!