Pre-Volta MPS test failed with error: mapping of buffer object failed

Ubuntu 16.04, M60 cards. I start the MPS server with:
sudo nvidia-docker run --name mps-daemon -v nvidia_mps:/tmp/nvidia-mps --shm-size=64g -e "NVIDIA_REQUIRE_VOLTA=arch>=5.0" nvidia/mps

When I run a client, it fails with the message "mapping of buffer object failed" in cudaMalloc.
The error is thrown on the first CUDA function call (an allocation of 100 MB).
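
For reference, the failing pattern is essentially just the client's first allocation; a minimal sketch (not the actual client code) that reproduces the call and prints the error looks like this:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    void *buf = nullptr;
    // First CUDA runtime call in the client: allocate ~100 MB.
    // With the broken MPS setup, this is where "mapping of buffer object failed" shows up.
    cudaError_t err = cudaMalloc(&buf, 100 * 1024 * 1024);
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("allocation succeeded\n");
    cudaFree(buf);
    return 0;
}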

server logs:
Size of /dev/shm: 68719476736 bytes
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE: unset
Available GPUs:
- 0, 00002C45:00:00.0, Tesla M60, Exclusive_Process
- 1, 0000FE37:00:00.0, Tesla M60, Exclusive_Process
Starting NVIDIA MPS control daemon…
[2019-06-12 08:50:20.556 Control 1] Start
[2019-06-12 08:50:26.783 Control 1] Accepting connection…
[2019-06-12 08:50:26.783 Control 1] User did not send valid credentials
[2019-06-12 08:50:26.783 Control 1] Accepting connection…
[2019-06-12 08:50:26.783 Control 1] NEW CLIENT 0 from user 0: Server is not ready, push client to pending list
[2019-06-12 08:50:26.784 Control 1] Starting new server 16 for user 0
[2019-06-12 08:50:28.610 Other 16] Start
[2019-06-12 08:50:28.933 Control 1] Accepting connection…
[2019-06-12 08:50:28.933 Control 1] NEW SERVER 16: Ready
[2019-06-12 08:50:28.933 Other 16] MPS Server: Received new client request
[2019-06-12 08:50:28.965 Other 16] MPS Server: worker created
[2019-06-12 08:50:28.965 Other 16] Client 1 disconnected

I checked the host environment: ulimit is unlimited and /dev/shm is 64 GB.
I also tried the sample from the MPS (EXPERIMENTAL) · NVIDIA/nvidia-docker Wiki · GitHub page, and it succeeded (also with the CUDA requirement changed).

What is the root cause of this failure, and how can I get this to run?

The failure is sometimes associated with failures of peer mapping between GPUs.

As a diagnostic, does the simpleP2P CUDA sample code run correctly on "bare metal" on your instance (i.e. without MPS and without a container)?
Does it run inside a container on your instance (without MPS)?
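
If building simpleP2P is inconvenient, a minimal peer-mapping check along these lines (a sketch, assuming both M60s are visible as devices 0 and 1) exercises the same path:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int can01 = 0, can10 = 0;
    // Ask whether each GPU can map the other's memory (peer access).
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    std::printf("GPU0 -> GPU1 peer access: %d\n", can01);
    std::printf("GPU1 -> GPU0 peer access: %d\n", can10);

    // Try to actually enable the peer mapping from device 0 to device 1.
    cudaSetDevice(0);
    cudaError_t err = cudaDeviceEnablePeerAccess(1, 0);
    std::printf("cudaDeviceEnablePeerAccess(1, 0): %s\n", cudaGetErrorString(err));
    return 0;
}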

As a diagnostic, what is the behavior if you add:

-e NVIDIA_VISIBLE_DEVICES=0

to your nvidia-docker run command line?

Found the root cause. I missed the option:

--ipc container:mps-daemon

in the docker run command line for the client (the MPS client and server communicate through shared memory, so the client container has to share the daemon container's IPC namespace).

Now it is working. Here is another question:
I started 3 clients running heavy compute workloads, but they are all running on one card:

$ nvidia-smi
Thu Jun 13 05:25:11 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00002C45:00:00.0 Off |                  Off |
| N/A   65C    P0   110W / 150W |   2866MiB /  8129MiB |    100%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 0000FE37:00:00.0 Off |                  Off |
| N/A   38C    P8    22W / 150W |    102MiB /  8129MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    103929      C   nvidia-cuda-mps-server                      2600MiB |
|    1    103929      C   nvidia-cuda-mps-server                        91MiB |
+-----------------------------------------------------------------------------+

How can I maximize the utilization of both cards?

Thank you very much!

The usual method would be to specify which device each client should use. Many GPU compute applications provide a command-line option to select which of several devices to run on.
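
For example, if the clients are your own CUDA programs, the device can be chosen explicitly. The sketch below (the command-line argument is just an illustrative convention, not part of any existing tool) selects a GPU per client, so starting one client with 0 and another with 1 spreads the load across both M60s. Alternatively, starting each client container with a different NVIDIA_VISIBLE_DEVICES value (as in the diagnostic suggestion above) may achieve the same thing.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    // Pick the device index from the first command-line argument (default: 0).
    int device = (argc > 1) ? std::atoi(argv[1]) : 0;

    int count = 0;
    cudaGetDeviceCount(&count);
    if (device < 0 || device >= count) {
        std::printf("requested device %d, but only %d device(s) are visible\n", device, count);
        return 1;
    }

    // All subsequent allocations and kernel launches go to this device.
    cudaSetDevice(device);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    std::printf("running on GPU %d: %s\n", device, prop.name);

    // ... launch the heavy kernels on the selected device here ...
    return 0;
}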