NVSHMEM with MPS and numGpu + 1 MPI processes

philippe18 · April 3, 2025, 11:26pm

I am running a slightly modified version of the NVIDIA NVSHMEM sample code with MPI on a server with 4 GPUs and MPS (Multi-Process Service) enabled. I need 5 processes running in total.

During initialization, at the nvshmem_malloc call, the GPU hangs indefinitely.

If I start 8 processes, I get further along, but it hangs after the kernel execution.
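
To make the process layout concrete, here is a minimal sketch of how the ranks would be spread across the GPUs, assuming a simple round-robin assignment by node-local rank. The helper names `device_for_rank` and `ranks_per_gpu` are hypothetical and only illustrate the layout; they are not part of NVSHMEM, CUDA, or the sample code, and the actual mapping in the modified sample may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (assumption): round-robin mapping of a node-local
// rank to a CUDA device index, e.g. the value one might pass to
// cudaSetDevice when there are more ranks than GPUs.
int device_for_rank(int local_rank, int num_gpus) {
    assert(local_rank >= 0 && num_gpus > 0);
    return local_rank % num_gpus;
}

// Hypothetical helper (assumption): how many ranks end up on each GPU
// under the round-robin mapping above.
std::vector<int> ranks_per_gpu(int num_ranks, int num_gpus) {
    std::vector<int> counts(num_gpus, 0);
    for (int r = 0; r < num_ranks; ++r) {
        ++counts[device_for_rank(r, num_gpus)];
    }
    return counts;
}
```

Under this assumed mapping, 5 ranks on 4 GPUs put two processes on GPU 0 and one on each of the others, while 8 ranks put two processes on every GPU. Whether that asymmetry has anything to do with the hang is not established here.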

We only use NVSHMEM on a single node at a time, over PCIe / NVLink.

I am looking for pointers to solve what seems to be a race condition or wrong execution order of a collective call.
