Unable to run NVSHMEM example with Slurm

I’m getting started with NVSHMEM and wanted to begin with a simple example, but without much success.

#include <nvshmem.h>
#include <stdio.h>
  
int main(int argc, char *argv[])
{
  // Initialize the NVSHMEM library
  nvshmem_init();
  
  int mype = nvshmem_my_pe();
  int npes = nvshmem_n_pes();
  
  fprintf(stdout, "PE %d of %d has started ...\n", mype, npes);
  
  // Finalize the NVSHMEM library
  nvshmem_finalize();
  
  return 0;
}

It is compiled and run with the following sbatch script:

#!/bin/bash -l
#SBATCH --nodes=2                          # number of nodes
#SBATCH --ntasks=8                         # number of tasks
#SBATCH --ntasks-per-node=4                # number of tasks per node
#SBATCH --gpus-per-task=1                  # number of GPUs per task
#SBATCH --cpus-per-task=1                  # number of cores per task
#SBATCH --time=00:15:00                    # time (HH:MM:SS)
#SBATCH --partition=gpu                    # partition
#SBATCH --account=p200301                  # project account
#SBATCH --qos=default                      # SLURM qos
  
module load NCCL OpenMPI CUDA NVSHMEM && \
nvcc -rdc=true -ccbin g++ -I $NVSHMEM_HOME/include test.cu -o test \
    -L $NVSHMEM_HOME/lib -lnvshmem_host -lnvshmem_device -lucs -lucp && \
srun -n 8 ./test

The expected output would be something like:

PE 0 of 8 has started ...
PE 1 of 8 has started ...
PE 2 of 8 has started ...
.....

Instead, the output I get is:

PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...

I think I am missing something crucial but simple. Can somebody enlighten me?

Can you provide the list of process-management interfaces that your Slurm was compiled with, using this command: srun --mpi=list ?

Most likely you are observing 8 singleton instances rather than one gang-scheduled job: the default NVSHMEM bootstrap used by nvshmem_init is PMI-1, while your Slurm srun is providing PMI2 or PMIx, and the two are not compatible.

You can launch your program with any of the supported modalities, as long as the matching environment and build variables are specified. For example, if srun --mpi=list reports PMIx or PMI2, you can set NVSHMEM_BOOTSTRAP_PMI=PMIX or PMI2 to override the default.

https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/nvshmem-install-proc.html#launching-nvshmem-programs
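
As a concrete illustration, a minimal launch sketch, assuming srun --mpi=list on your cluster reports pmix (adjust to the PMI2 variant otherwise); the binary name is taken from the post above:

# Sketch only: assumes the Slurm build lists pmix in `srun --mpi=list`
export NVSHMEM_BOOTSTRAP=PMI        # PMI-based bootstrap (the default)
export NVSHMEM_BOOTSTRAP_PMI=PMIX   # match the PMI flavour srun will provide
srun --mpi=pmix -n 8 ./test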

Hi Arnavg,

I am having the same problem as Andrew. I installed NVSHMEM on Perlmutter, which is a Cray system that uses Slurm.

I built NVSHMEM with the following related configurations:

export NVSHMEM_PMIX_SUPPORT=0
export NVSHMEM_DEFAULT_PMI2=1
export NVSHMEM_BOOTSTRAP=MPI

Running the same code with srun --mpi=pmi2 -n 4 ./test gives four lines of PE 0 of 1 has started ... as in Andrew’s output.

What am I missing? Is this problem related to the first entry here?

Thanks!

It looks like you’re setting NVSHMEM to use MPI as the bootstrap while also telling Slurm that the job will use PMI2. Valid options are (sketched below):

  1. NVSHMEM_BOOTSTRAP=PMI2 NVSHMEM_BOOTSTRAP_PMI=PMI2 and srun --mpi=pmi2
  2. NVSHMEM_BOOTSTRAP=MPI and srun without any --mpi option
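
For concreteness, the two options above might look like this in the job script (a sketch only, not verified on Perlmutter; the env values are the ones listed above, and option 2 assumes NVSHMEM was built with MPI support):

# Option 1: PMI2 bootstrap, driven by Slurm's PMI2 support
export NVSHMEM_BOOTSTRAP=PMI2
export NVSHMEM_BOOTSTRAP_PMI=PMI2
srun --mpi=pmi2 -n 4 ./test

# Option 2: MPI bootstrap; note there is no --mpi option on srun
export NVSHMEM_BOOTSTRAP=MPI
srun -n 4 ./test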

Thanks, Jim, for your help on a weekend. I went for the second option and it worked!