Hello everyone,
I’m encountering an unexpected error when using cuMemCreate on my system. Below are my system details:
System Environment:
- Hardware: Nvidia Jetson Orin Nano
- Software:
  - Output of /etc/nv_tegra_release:
    # R36 (release), REVISION: 3.0, GCID: 36923193, BOARD: generic, EABI: aarch64, DATE: Fri Jul 19 23:24:25 UTC 2024
    # KERNEL_VARIANT: oot
    TARGET_USERSPACE_LIB_DIR=nvidia
    TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
  - CUDA version: CUDA 12.2
Problem:
When I try to create multiple memory handles using cuMemCreate, I encounter the error below once the number of handles exceeds 900. I used 2 MB as the chunk size and tried to allocate 2 GB in total, which means 1024 handles.
NVMAP_IOC_GET_FD failed: Bad address
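For reference, the handle count quoted above follows from a ceiling division of the total size by the chunk size. The sketch below reproduces that arithmetic. The names chunkCount, kMiB, kChunkSize and kTotalSize are illustrative and do not appear in the original code. Because each chunk is exported as a POSIX file descriptor, the same number is also the count of descriptors the process needs for the export step.

```cpp
#include <cassert>
#include <cstdint>

// Number of fixed-size chunks needed to cover total_size (ceiling division).
// Hypothetical helper mirroring the total_chunks computation in the snippet.
inline std::uint64_t chunkCount(std::uint64_t total_size, std::uint64_t chunk_size) {
    assert(chunk_size > 0);  // guard against division by zero
    return (total_size + chunk_size - 1) / chunk_size;
}

// Sizes taken from the problem description: 2 MB chunks, 2 GB in total.
constexpr std::uint64_t kMiB = 1024ull * 1024ull;
constexpr std::uint64_t kChunkSize = 2 * kMiB;
constexpr std::uint64_t kTotalSize = 2048 * kMiB;
```

With these values, chunkCount(kTotalSize, kChunkSize) evaluates to 1024, matching the number of handles stated in the post.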
Code Snippet:
Here’s the partial code where the error occurs:
// Reserves a virtual address range, then creates and exports one handle per chunk.
// 'handles' is a global vector of CUmemGenericAllocationHandle defined elsewhere.
std::vector<int> prepareGpuMemory(CUdeviceptr& d_ptr, size_t& total_chunks) {
    size_t total_size = TOTAL_ALLOC_SIZE; // Predefined allocation size (2 GB in the run described above)
    total_chunks = (total_size + CHUNK_SIZE - 1) / CHUNK_SIZE;

    CUDA_CHECK(cuMemAddressReserve(&d_ptr, total_size, 0, 0, 0));

    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = 0;
    prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

    handles.resize(total_chunks); // Initialize the global 'handles' vector
    std::vector<int> shareableHandles(total_chunks);

    std::cout << "[INFO] Allocating and exporting " << total_chunks
              << " memory chunks (" << TOTAL_ALLOC_SIZE / (1024 * 1024)
              << " MB in total)." << std::endl;

    // Avoid a modulo by zero when there are fewer than 10 chunks
    const size_t progress_step = total_chunks >= 10 ? total_chunks / 10 : 1;

    for (size_t i = 0; i < total_chunks; ++i) {
        CUDA_CHECK(cuMemCreate(&handles[i], CHUNK_SIZE, &prop, 0)); // Assign to global 'handles'
        // Each export returns a new POSIX file descriptor
        CUDA_CHECK(cuMemExportToShareableHandle(&shareableHandles[i], handles[i], CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0));

        // Progress update after every 10% of chunks are allocated
        if ((i + 1) % progress_step == 0 || i + 1 == total_chunks) {
            std::cout << "[INFO] " << (i + 1) << "/" << total_chunks
                      << " chunks allocated and exported." << std::endl;
        }
    }
    return shareableHandles;
}
Question:
Has anyone experienced a similar issue with cuMemCreate on Jetson platforms? Is there a limitation on the number of memory handles I can create? Could this be a driver or kernel-related issue?
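One limit that could be ruled in or out is the per-process open-file limit, since every exported handle is a file descriptor and the soft limit (RLIMIT_NOFILE) is often 1024 on Linux. This is only an assumption about the cause; the error message itself does not confirm it. The sketch below uses the POSIX getrlimit call and the Linux /proc/self/fd directory. The helper names softFdLimit, openFdCount and fdsFit are hypothetical and not part of CUDA or the original code.

```cpp
#include <cstddef>
#include <dirent.h>
#include <sys/resource.h>

// Soft limit on open file descriptors for this process (RLIMIT_NOFILE).
// Returns 0 if the limit cannot be queried.
inline rlim_t softFdLimit() {
    struct rlimit lim {};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    return lim.rlim_cur;
}

// Counts descriptors currently open by listing /proc/self/fd (Linux-specific).
// The directory stream holds one descriptor while counting, so it is subtracted.
// Returns 0 if /proc/self/fd cannot be opened.
inline std::size_t openFdCount() {
    DIR* dir = opendir("/proc/self/fd");
    if (dir == nullptr) return 0;
    std::size_t count = 0;
    while (struct dirent* entry = readdir(dir)) {
        if (entry->d_name[0] != '.') ++count;  // skip "." and ".."
    }
    closedir(dir);
    return count > 0 ? count - 1 : 0;
}

// True if 'needed' additional descriptors fit under the soft limit.
inline bool fdsFit(std::size_t needed) {
    const rlim_t limit = softFdLimit();
    if (limit == RLIM_INFINITY) return true;
    return openFdCount() + needed <= limit;
}
```

Calling fdsFit(total_chunks) before the export loop would show whether 1024 additional descriptors fit under the current limit. If they do not, raising the limit (for example with ulimit -n in the launching shell) and rerunning would show whether the failure point moves.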
Any suggestions or insights would be greatly appreciated!
Thanks in advance.