How can I debug 'mapping of buffer object failed', which only happens on some computers?

vedxroy · January 2, 2024, 5:14am

I am compiling + running a custom allreduce implementation using MPI.
The code is on GitHub (gpu_kernels/allreduce/csrc/reference_allreduce at main in vedantroy/gpu_kernels), but what’s important is:

I compile the code with nvcc:

nvcc -I/usr/include -I/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -I/usr/lib/x86_64-linux-gnu/openmpi/include -L/usr/lib/x86_64-linux-gnu/openmpi/lib -lmpi -L/usr/lib/x86_64-linux-gnu -lnccl -gencode=arch=$arch,code=$code "$script_dir/fast_allreduce_test.cu" -o fastallreduce_test.bin

And I run it with the command:

mpirun --allow-run-as-root -np 2 csrc/reference_allreduce/fastallreduce_test.bin

This command works on a conventional machine. However, when I run it on a serverless provider such as Modal (https://modal.com), MPI first fails with:

“All nodes which are allocated for this job are already filled.”

Adding a hostfile fixes that issue.

But then I get the warning:

WARNING: The default btl_vader_single_copy_mechanism CMA is
not available due to different user namespaces.

and the error ‘mapping of buffer object failed’ from cudaIpcOpenMemHandle.
Any thoughts on what I could do to work around this? Maybe it’s not possible.

Robert_Crovella · January 2, 2024, 3:59pm

I don’t have a holistic answer for you. If I were working on this, the first thing I would suspect is that your host-based IPC is failing, either silently or in a way you are not detecting because errors are not being checked properly.

cudaIpcOpenMemHandle expects to be given a memory handle that was created in another process. First make sure you are not trying to open a mem handle that was created in the same process that you are trying to open it from (that is not allowed). Then print out the numerical value of the mem handle in the process that created the mem handle and in the process that is trying to run cudaIpcOpenMemHandle on it.

If the numerical values are the same, then there is some deeper problem that I cannot fathom.

If the numerical values are not the same, then your method for communicating the handle from one process to the other is failing, and I would focus my debug effort on that.
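
One way to do the comparison described above: cudaIpcMemHandle_t is an opaque struct rather than an integer, so its "numerical value" is easiest to compare as a hex dump of its raw bytes. The sketch below is plain C++ with no CUDA dependency; to_hex and print_handle are hypothetical helper names, not part of the posted code or of any CUDA API.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Render `size` raw bytes starting at `data` as a lowercase hex string.
std::string to_hex(const void* data, std::size_t size) {
    static const char digits[] = "0123456789abcdef";
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    out.reserve(size * 2);
    for (std::size_t i = 0; i < size; ++i) {
        out.push_back(digits[bytes[i] >> 4]);    // high nibble
        out.push_back(digits[bytes[i] & 0x0f]);  // low nibble
    }
    return out;
}

// Hypothetical helper: print one labelled line per process so the exporting
// and importing sides can be compared by eye or with diff.
void print_handle(const char* role, int rank, const void* handle, std::size_t size) {
    std::printf("[%s rank %d] handle=%s\n", role, rank, to_hex(handle, size).c_str());
}
```

Assuming a cudaIpcMemHandle_t variable named handle, the exporting rank would call print_handle("export", rank, &handle, sizeof(handle)) right after cudaIpcGetMemHandle, and the importing rank would call print_handle("import", rank, &handle, sizeof(handle)) right before cudaIpcOpenMemHandle. If the two hex strings differ, the transfer between ranks is the place to look.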

I have not inspected your code nor do I know anything about modal.com, so this advice may be misguided or off-base.

You could also ask modal about it, if they have some sort of support mechanism.

Another thought that occurs to me is that for CUDA IPC to work, I’m fairly certain that the GPUs hosting the buffers must be visible to all interested parties.

If you or your service is launching MPI ranks with a preamble that includes, e.g., CUDA_VISIBLE_DEVICES="...", you may want to check whether that is a factor.
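
A quick way to check this is to have every rank report what it sees at startup. The sketch below is plain C++; env_or_unset and visibility_report are hypothetical helper names. It assumes Open MPI, which exports OMPI_COMM_WORLD_RANK to each launched process; other launchers use different variable names.

```cpp
#include <cstdlib>
#include <string>

// Return the value of an environment variable, or "<unset>" if it is not defined.
std::string env_or_unset(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("<unset>");
}

// Build a one-line report of the device visibility setting seen by this process.
// OMPI_COMM_WORLD_RANK is set by Open MPI's launcher (an assumption about the setup).
std::string visibility_report() {
    return "rank=" + env_or_unset("OMPI_COMM_WORLD_RANK") +
           " CUDA_VISIBLE_DEVICES=" + env_or_unset("CUDA_VISIBLE_DEVICES");
}
```

Printing visibility_report() from each rank before any CUDA call shows whether the ranks were launched with different device masks. If each rank sees only its own GPU, the peer's buffer lives on a device that rank cannot see.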

Also, if you or your service is running this in a container, there are container IPC settings that may be important. A Google search will turn up some examples.
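
To see whether the ranks actually share namespaces, each process can report the namespace links that Linux exposes under /proc/self/ns. This is a minimal, Linux-only sketch; namespace_id is a hypothetical helper name. Whether a namespace mismatch is what breaks cudaIpcOpenMemHandle in this case is not established in this thread, but the Open MPI warning about "different user namespaces" makes it worth checking.

```cpp
#include <cstddef>
#include <string>
#include <unistd.h>

// Return the target of /proc/self/ns/<kind>, for example "ipc:[4026531839]",
// or an empty string if the link cannot be read (non-Linux system, no /proc).
std::string namespace_id(const std::string& kind) {
    const std::string path = "/proc/self/ns/" + kind;
    char buf[128];
    const ssize_t n = readlink(path.c_str(), buf, sizeof(buf) - 1);
    if (n < 0) return "";
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Printing namespace_id("ipc"), namespace_id("user"), and namespace_id("pid") from each rank gives one string per namespace; two processes are in the same namespace only when the strings match exactly.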

system · January 16, 2024, 4:00pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
