cuMemCreate fails with "NVMAP_IOC_GET_FD failed: Bad address"

Hello everyone,

I’m encountering an unexpected error when using cuMemCreate on my system. Below are my system details:


System Environment:

  • Hardware: NVIDIA Jetson Orin Nano
  • Software:
    • Output of /etc/nv_tegra_release:
# R36 (release), REVISION: 3.0, GCID: 36923193, BOARD: generic, EABI: aarch64, DATE: Fri Jul 19 23:24:25 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia  
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
  • CUDA version: CUDA 12.2

Problem:

When I create multiple memory handles with cuMemCreate, I get the following error once the number of handles exceeds 900. I used a chunk size of 2 MB and tried to allocate 2 GB in total, which means allocating 1024 handles.

NVMAP_IOC_GET_FD failed: Bad address
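
One check that may be worth running, offered as a guess rather than a diagnosis: every exported handle here is a POSIX file descriptor, and many Linux systems default to a soft limit of 1024 open descriptors per process, which is close to the point where the failure appears. The sketch below uses only the standard getrlimit call. The helper names chunksNeeded, softFdLimit and mayExceedFdLimit are hypothetical, and whether this limit is the actual cause on Jetson is an assumption.

```cpp
#include <cstddef>
#include <sys/resource.h>

// Number of fixed-size chunks needed to cover total_size (rounds up).
std::size_t chunksNeeded(std::size_t total_size, std::size_t chunk_size) {
    return (total_size + chunk_size - 1) / chunk_size;
}

// Soft limit on open file descriptors for this process, or 0 if the query fails.
// RLIM_INFINITY is passed through unchanged when no limit is set.
rlim_t softFdLimit() {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    return lim.rlim_cur;
}

// True when holding `chunks` descriptors alone would reach the soft limit.
// Descriptors already in use (stdin, stdout, driver files) are NOT counted,
// so the real threshold is lower than this check suggests.
bool mayExceedFdLimit(std::size_t chunks) {
    rlim_t limit = softFdLimit();
    return limit != RLIM_INFINITY && chunks >= limit;
}
```

Calling mayExceedFdLimit(chunksNeeded(TOTAL_ALLOC_SIZE, CHUNK_SIZE)) before the allocation loop would show whether the requested handle count is near the limit. If it is, raising the limit with ulimit -n before launching the program is a cheap experiment.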

Code Snippet:

Here’s the partial code where the error occurs:

std::vector<int> prepareGpuMemory(CUdeviceptr& d_ptr, size_t& total_chunks) {
    size_t total_size = TOTAL_ALLOC_SIZE;  // Predefined total allocation size (2 GB in the failing run)
    total_chunks = (total_size + CHUNK_SIZE - 1) / CHUNK_SIZE;

    CUDA_CHECK(cuMemAddressReserve(&d_ptr, total_size, 0, 0, 0));

    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = 0;
    prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

    handles.resize(total_chunks);  // Initialize the global 'handles' vector
    std::vector<int> shareableHandles(total_chunks);

    std::cout << "[INFO] Allocating and exporting " << total_chunks 
              << " memory chunks (" << TOTAL_ALLOC_SIZE / (1024 * 1024) 
              << " MB in total)." << std::endl;

    for (size_t i = 0; i < total_chunks; ++i) {
        CUDA_CHECK(cuMemCreate(&handles[i], CHUNK_SIZE, &prop, 0));  // Assign to global 'handles'
        CUDA_CHECK(cuMemExportToShareableHandle(&shareableHandles[i], handles[i], CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0));

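        // Progress update after roughly every 10% of chunks; the interval is
        // clamped to 1 to avoid a modulo by zero when total_chunks < 10 (needs <algorithm>)
        if ((i + 1) % std::max<size_t>(total_chunks / 10, 1) == 0 || i + 1 == total_chunks) {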
            std::cout << "[INFO] " << (i + 1) << "/" << total_chunks 
                      << " chunks allocated and exported." << std::endl;
        }
    }

    return shareableHandles;
}

Question:

Has anyone experienced a similar issue with cuMemCreate on Jetson platforms? Is there a limitation on the number of memory handles I can create? Could this be a driver or kernel-related issue?

Any suggestions or insights would be greatly appreciated!

Thanks in advance.


You’re going to find a lot more Jetson platform people on the Jetson forums.