Pre-Volta MPS test failed with error: mapping of buffer object failed

Ubuntu 16.04, M60 cards. I start the MPS server with:
sudo nvidia-docker run --name mps-daemon -v nvidia_mps:/tmp/nvidia-mps --shm-size=64g -e "NVIDIA_REQUIRE_VOLTA=arch>=5.0" nvidia/mps

When I run a client, it fails with the message "mapping of buffer object failed" in cudaMalloc.
The error is thrown on the first CUDA function call (an allocation of 100 MB).
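
For reference, the failing pattern is essentially just the client's first allocation; a minimal sketch (not the actual client code) that reproduces the call and prints the error looks like this:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    void *buf = nullptr;
    // First CUDA runtime call in the client: allocate ~100 MB.
    // With the broken MPS setup, this is where "mapping of buffer object failed" shows up.
    cudaError_t err = cudaMalloc(&buf, 100 * 1024 * 1024);
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("allocation succeeded\n");
    cudaFree(buf);
    return 0;
}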

server logs:
Size of /dev/shm: 68719476736 bytes
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE: unset
Available GPUs:
- 0, 00002C45:00:00.0, Tesla M60, Exclusive_Process
- 1, 0000FE37:00:00.0, Tesla M60, Exclusive_Process
Starting NVIDIA MPS control daemon…
[2019-06-12 08:50:20.556 Control 1] Start
[2019-06-12 08:50:26.783 Control 1] Accepting connection…
[2019-06-12 08:50:26.783 Control 1] User did not send valid credentials
[2019-06-12 08:50:26.783 Control 1] Accepting connection…
[2019-06-12 08:50:26.783 Control 1] NEW CLIENT 0 from user 0: Server is not ready, push client to pending list
[2019-06-12 08:50:26.784 Control 1] Starting new server 16 for user 0
[2019-06-12 08:50:28.610 Other 16] Start
[2019-06-12 08:50:28.933 Control 1] Accepting connection…
[2019-06-12 08:50:28.933 Control 1] NEW SERVER 16: Ready
[2019-06-12 08:50:28.933 Other 16] MPS Server: Received new client request
[2019-06-12 08:50:28.965 Other 16] MPS Server: worker created
[2019-06-12 08:50:28.965 Other 16] Client 1 disconnected

I checked the host environment: ulimit is unlimited and /dev/shm is 64 GB.
I also tried the sample from the MPS (EXPERIMENTAL) · NVIDIA/nvidia-docker Wiki · GitHub page, and it succeeded (also with the CUDA requirement changed).

What is the root cause of this failure, and how can I get this to run?

The failure is sometimes associated with failures of peer mapping between GPUs.

As a diagnostic, does the simpleP2P CUDA sample code run correctly on "bare metal" on your instance (i.e. without MPS and without a container)?
Does it run inside a container on your instance (without MPS)?
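
If building simpleP2P is inconvenient, a minimal peer-mapping check along these lines (a sketch, assuming both M60s are visible as devices 0 and 1) exercises the same path:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int can01 = 0, can10 = 0;
    // Ask whether each GPU can map the other's memory (peer access).
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    std::printf("GPU0 -> GPU1 peer access: %d\n", can01);
    std::printf("GPU1 -> GPU0 peer access: %d\n", can10);

    // Try to actually enable the peer mapping from device 0 to device 1.
    cudaSetDevice(0);
    cudaError_t err = cudaDeviceEnablePeerAccess(1, 0);
    std::printf("cudaDeviceEnablePeerAccess(1, 0): %s\n", cudaGetErrorString(err));
    return 0;
}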

As a diagnostic, what is the behavior if you add:

-e NVIDIA_VISIBLE_DEVICES=0

to your nvidia-docker run command line?

Found the root cause. I missed the option:

--ipc container:mps-daemon

in the docker run command line for the client (the MPS client and server communicate through shared memory, so the client container has to share the daemon container's IPC namespace).

Now it is working. Here is another question:
I started 3 clients running heavy compute workloads, but they are all running on one card:

$ nvidia-smi
Thu Jun 13 05:25:11 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00002C45:00:00.0 Off |                  Off |
| N/A   65C    P0   110W / 150W |   2866MiB /  8129MiB |    100%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 0000FE37:00:00.0 Off |                  Off |
| N/A   38C    P8    22W / 150W |    102MiB /  8129MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    103929      C   nvidia-cuda-mps-server                      2600MiB |
|    1    103929      C   nvidia-cuda-mps-server                        91MiB |
+-----------------------------------------------------------------------------+

How can I maximize the utilization of both cards?

Thank you very much!

The usual method would be to specify which device each client should use. Many GPU compute applications provide a command-line option to select which of several devices to run on.
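
For example, if the clients are your own CUDA programs, the device can be chosen explicitly. The sketch below (the command-line argument is just an illustrative convention, not part of any existing tool) selects a GPU per client, so starting one client with 0 and another with 1 spreads the load across both M60s. Alternatively, starting each client container with a different NVIDIA_VISIBLE_DEVICES value (as in the diagnostic suggestion above) may achieve the same thing.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    // Pick the device index from the first command-line argument (default: 0).
    int device = (argc > 1) ? std::atoi(argv[1]) : 0;

    int count = 0;
    cudaGetDeviceCount(&count);
    if (device < 0 || device >= count) {
        std::printf("requested device %d, but only %d device(s) are visible\n", device, count);
        return 1;
    }

    // All subsequent allocations and kernel launches go to this device.
    cudaSetDevice(device);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    std::printf("running on GPU %d: %s\n", device, prop.name);

    // ... launch the heavy kernels on the selected device here ...
    return 0;
}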