Virtual memory and sparse texture limit

Is there a limit imposed by CUDA on how much virtual address space is available? It seems as if there might be a 1TB limit in CUDA despite the OS providing 128TB of virtual memory.

I ran into this when using sparse 3D textures: attempting to create a 4096x4096x4096 sparse float4 texture fails, as that also requires 1TB of address space.
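As a quick sanity check on that 1TB figure, the arithmetic can be sketched as below. This assumes tightly packed texels with no padding, tiling overhead, or mip levels, which may not match how the driver actually lays out a sparse array; the helper name `textureAddressSpaceBytes` and the constants are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of address space needed to back every texel of a 3D texture.
// Assumption: tightly packed texels, no padding, no mip levels.
std::uint64_t textureAddressSpaceBytes(std::uint64_t width, std::uint64_t height,
                                       std::uint64_t depth, std::uint64_t bytesPerTexel)
{
    assert(width > 0 && height > 0 && depth > 0 && bytesPerTexel > 0);
    return width * height * depth * bytesPerTexel;
}

// float4 is four 32-bit floats, i.e. 16 bytes per texel.
const std::uint64_t kFloat4Bytes = 4 * sizeof(float);

// 1 TiB expressed in bytes (2^40).
const std::uint64_t kOneTiB = std::uint64_t(1) << 40;
```

Under those assumptions, `textureAddressSpaceBytes(4096, 4096, 4096, kFloat4Bytes)` equals `kOneTiB`, since 4096^3 is 2^36 texels and each texel takes 2^4 bytes.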

The code to test this is simple. The 4th call to cuMemAddressReserve here fails:

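    // Assumes the driver API is already initialized (cuInit) before this runs.
    CUdeviceptr devicePtr[4] = {};  // zero-initialize so a failed call prints a null pointer
    const size_t testSize = size_t(256) * 1024 * 1024 * 1024;  // 256 GiB per reservation
    for (int i = 0; i < 4; ++i)
    {
        // Reserve address space only; no physical memory is allocated here.
        CUresult result = cuMemAddressReserve(&devicePtr[i], testSize, 0, 0, 0);
        printf("mem = %p, result = %d\n", (void*)devicePtr[i], (int)result);
    }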

What is the OS here? While a CUDA-specific limit is possible, it is equally likely that there are operating system-specific limits involved, some of which might be configurable.

Have you tried running with a system API logger (such as strace) to find out which OS API calls cuMemAddressReserve() is mapped to, and which (if any) of these calls fails?

This is on Windows 10. The available virtual memory reported by GlobalMemoryStatusEx is almost 128TB, and I can request multiple-TB allocations using VirtualAlloc without any issue, so I don’t think this is a restriction from the OS.

For further clarification (in case it makes a difference): Are you running the GPU(s) with the WDDM driver or the TCC driver?

I have not used this relatively new functionality. The way I understand it, cuMemAddressReserve is used to reserve virtual address space to which physical memory of identical size is later mapped with cuMemMap. Assuming the call to cuMemAddressReserve would succeed, is there actually enough physical memory in the system to then perform the mapping step?

Just the standard WDDM driver, and I’m not using an especially powerful GPU either, just a laptop with a 1050.

I don’t believe the physical memory has to be the identical size. With virtual memory, the reserved address space can be much larger than the physical memory, and physical allocations can then be mapped with cuMemMap into only specific regions of the reserved address space. I can reserve 768GB of virtual memory without error; it only fails when I increase the total to 1TB. I have a lot less physical memory on my laptop than 768GB :)
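To make the numbers concrete, here is a small sketch of the running totals for the 256GB test reservations. It only does bookkeeping and calls no CUDA functions. The 1 TiB ceiling is purely what was observed in this thread, not a documented value, and `totalReservedBytes` and `staysBelowLimit` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t kGiB = std::uint64_t(1) << 30;

// Total address space reserved after `calls` successful reservations of
// `bytesPerCall` each. This is address space only; no physical memory is implied.
std::uint64_t totalReservedBytes(unsigned calls, std::uint64_t bytesPerCall)
{
    return std::uint64_t(calls) * bytesPerCall;
}

// Hypothetical predicate: true if the running total stays strictly below `limit`.
// Assumption: the ceiling behaves like a simple cap on total reserved bytes.
bool staysBelowLimit(unsigned calls, std::uint64_t bytesPerCall, std::uint64_t limit)
{
    assert(limit > 0);
    return totalReservedBytes(calls, bytesPerCall) < limit;
}
```

With `bytesPerCall = 256 * kGiB` and `limit = 1024 * kGiB`, three calls total 768 GiB and stay below the ceiling, while the fourth reaches exactly 1 TiB and does not, which is consistent with the failure on the 4th call.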

I’m actually trying to use the new sparse texture support, which is built on top of the virtual memory system. I use cudaMalloc3DArray with the cudaArraySparse flag to create the sparse texture, and then cuMemMapArrayAsync to map the physical memory allocations to specific regions of the sparse texture. The call to cudaMalloc3DArray was failing for large texture sizes, which led me to discover the virtual memory limit.