NVSHMEM setup

Hello,

I am currently working on a cluster where each node has four GPUs, four Mellanox NICs (mlx5_0 through mlx5_3), and two 64-core CPUs, each organised into four NUMA domains.
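
For reference, I gathered the topology with the usual tools, e.g.:

# Show the GPU <-> NIC <-> CPU affinity matrix, including the NUMA node of each GPU
nvidia-smi topo -m

# Show the NUMA domains and which CPU cores and memory belong to each
numactl --hardware

# List the InfiniBand devices present on the node
ibv_devinfo -l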

I am launching each MPI rank through a binding script like this:

#!/bin/bash

EXE=$1
shift
ARGS="$@"   # take all remaining arguments, not just the second one
APP="$EXE $ARGS"

# This is the list of GPUs we have
GPUS=(0 1 2 3)

# This is the list of NICs we should use for each GPU
# e.g., associate GPUs 0,1 with mlx5_0 and GPUs 2,3 with mlx5_1
NICS=(mlx5_0:1 mlx5_0:1 mlx5_1:1 mlx5_1:1)

# This is the list of CPU cores we should use for each GPU
# On the Ampere nodes we have 2x64 core CPUs, each organised into 4 NUMA domains
# We will use only a subset of the available NUMA domains, i.e. 1 NUMA domain per GPU
# The NUMA domain closest to each GPU can be extracted from nvidia-smi
CPUS=(48-63 16-31 112-127 80-95)

# This is the list of memory domains we should use for each GPU
MEMS=(3 1 7 5)

# Number of physical CPU cores per GPU (optional)
export OMP_NUM_THREADS=16

lrank=$OMPI_COMM_WORLD_LOCAL_RANK

export CUDA_VISIBLE_DEVICES=${GPUS[${lrank}]}
export UCX_NET_DEVICES=${NICS[${lrank}]}
numactl --physcpubind=${CPUS[${lrank}]} --membind=${MEMS[${lrank}]} $APP
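
To sanity-check the bindings I added a debug line to the script just before the numactl call, along these lines:

# Print what each local rank is about to bind to (purely for debugging)
echo "lrank=${lrank} host=$(hostname) GPU=${CUDA_VISIBLE_DEVICES}" \
     "NIC=${UCX_NET_DEVICES} cores=${CPUS[${lrank}]} mem=${MEMS[${lrank}]}"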

I am trying to use MPI ranks to bootstrap NVSHMEM PEs as in the documentation (Using NVSHMEM — NVSHMEM 3.0.6 documentation).
The goal is that each GPU, together with its closest CPU cores, forms one MPI rank and one NVSHMEM PE. How do I verify that this is working correctly? I have already used nsys to confirm that each MPI rank runs on a separate GPU, but how do I know that each rank is using the correct NIC? Also, have the symmetric heaps already been allocated at this point (judging from the line gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] allocated 16777216 bytes, ptr: 0x28260000000)?
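
One idea I had for checking the NIC assignment is to snapshot the InfiniBand port counters before and after a run: the NICs that carried traffic should show a large jump in port_xmit_data. A rough sketch (bind.sh and my_nvshmem_app are placeholders for my actual script and binary, and the counters are node-wide, so the node should be otherwise idle):

#!/bin/bash
declare -A before
# Snapshot the transmit counter of every NIC (port_xmit_data counts in units of 4 bytes)
for dev in mlx5_0 mlx5_1 mlx5_2 mlx5_3; do
    before[$dev]=$(cat /sys/class/infiniband/$dev/ports/1/counters/port_xmit_data)
done

mpirun -np 4 ./bind.sh ./my_nvshmem_app

# Report how much each NIC transmitted during the run
for dev in mlx5_0 mlx5_1 mlx5_2 mlx5_3; do
    after=$(cat /sys/class/infiniband/$dev/ports/1/counters/port_xmit_data)
    echo "$dev: $(( (after - before[$dev]) * 4 )) bytes transmitted"
done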

Can someone run me through what the NVSHMEM start-up debug output below means?
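
For reference, the output was produced by turning on NVSHMEM's debug logging via environment variables; if I read the environment-variable documentation correctly, the relevant knobs are:

export NVSHMEM_DEBUG=INFO                # VERSION, WARN, INFO, ABORT or TRACE
export NVSHMEM_DEBUG_SUBSYS=ALL          # or a subset such as INIT,TRANSPORT to cut noise
export NVSHMEM_DEBUG_FILE=nvdebug.%h.%p  # optional: one log file per hostname/PID
mpirun -np 4 ./bind.sh ./my_nvshmem_app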

NVSHMEM configuration:
  CUDA API                     11040
  CUDA Runtime                 11040
  CUDA Driver                  12040
  Build Timestamp              Sep 10 2024 11:39:26
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO PE distribution has been identified as NVSHMEMI_PE_DIST_BLOCK
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO PE 3 (process) affinity to 16 CPUs:
    80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 
  Build Variables             
	NVSHMEM_DEBUG=OFF NVSHMEM_DEVEL=OFF NVSHMEM_DEFAULT_PMI2=OFF
	NVSHMEM_DEFAULT_PMIX=OFF NVSHMEM_DEFAULT_UCX=OFF NVSHMEM_DISABLE_COLL_POLL=ON
	NVSHMEM_ENABLE_ALL_DEVICE_INLINING=OFF NVSHMEM_GPU_COLL_USE_LDST=OFF
	NVSHMEM_IBGDA_SUPPORT=OFF NVSHMEM_IBGDA_SUPPORT_GPUMEM_ONLY=OFF
	NVSHMEM_IBDEVX_SUPPORT=OFF NVSHMEM_IBRC_SUPPORT=ON
	NVSHMEM_MPI_SUPPORT=1 NVSHMEM_NVTX=ON NVSHMEM_PMIX_SUPPORT=OFF
	NVSHMEM_SHMEM_SUPPORT=OFF NVSHMEM_TEST_STATIC_LIB=OFF
	NVSHMEM_TIMEOUT_DEVICE_POLLING=OFF NVSHMEM_TRACE=OFF
	NCCL_HOME=/usr/local/nccl
	NVSHMEM_PREFIX=/home/co-morg1/rds/hpc-work/nvshmem_sep_10
	UCX_HOME=/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/ucx-1.11.1-lktqyl4gjbz36wqifl2e2wonn65xtrsr

gpu-q-74:4164305:4164305 [0] NVSHMEM INFO PE 0 (process) affinity to 16 CPUs:
    48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO PE distribution has been identified as NVSHMEMI_PE_DIST_BLOCK
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO PE 1 (process) affinity to 16 CPUs:
    16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO PE distribution has been identified as NVSHMEMI_PE_DIST_BLOCK
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO PE 2 (process) affinity to 16 CPUs:
    112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO cudaDriverVersion 12040
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO cudaDriverVersion 12040
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO NVSHMEM symmetric heap kind = DEVICE selected
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO cudaDriverVersion 12040
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO NVSHMEM symmetric heap kind = DEVICE selected
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] nvshmemi_get_cucontext->cuCtxSynchronize->CUDA_SUCCESS) my_stream (nil)
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO in get_cucontext, queried and saved context for device: 0 context: 0x36b83d0
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] nvshmemi_get_cucontext->cuCtxSynchronize->CUDA_SUCCESS) my_stream (nil)
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO in get_cucontext, queried and saved context for device: 0 context: 0x1d418e0
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO cudaDriverVersion 12040
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO NVSHMEM symmetric heap kind = DEVICE selected
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] nvshmemi_get_cucontext->cuCtxSynchronize->CUDA_SUCCESS) my_stream (nil)
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO in get_cucontext, queried and saved context for device: 0 context: 0x31e9980
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO NVSHMEM symmetric heap kind = DEVICE selected
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] nvshmemi_get_cucontext->cuCtxSynchronize->CUDA_SUCCESS) my_stream (nil)
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO in get_cucontext, queried and saved context for device: 0 context: 0x206ff70
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] nvshmemi_get_cucontext->cuCtxGetDevice->0(CUDA_ERROR_INVALID_CONTEXT 201) cuStreamCreateWithPriority my_stream 0x479b230
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] nvshmemi_get_cucontext->cuCtxGetDevice->0(CUDA_ERROR_INVALID_CONTEXT 201) cuStreamCreateWithPriority my_stream 0x4ab0730
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] nvshmemi_get_cucontext->cuCtxGetDevice->0(CUDA_ERROR_INVALID_CONTEXT 201) cuStreamCreateWithPriority my_stream 0x60d9480
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] nvshmemi_get_cucontext->cuCtxGetDevice->0(CUDA_ERROR_INVALID_CONTEXT 201) cuStreamCreateWithPriority my_stream 0x5c33e30
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO nvshmemi_setup_local_heap, heapextra = 285225000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO nvshmemi_setup_local_heap, heapextra = 285225000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO nvshmemi_setup_local_heap, heapextra = 285225000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO nvshmemi_setup_local_heap, heapextra = 285225000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO NVML library found. libnvidia-ml.so
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO NVML library found. libnvidia-ml.so
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO NVML library found. libnvidia-ml.so
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO NVML library found. libnvidia-ml.so
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/common/transport_gdr_common.cpp 73 GDR driver version: (2, 4)
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1635 Begin - Enumerating IB devices in the system ([<dev_id, device_name, num_ports>]) - 
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/common/transport_gdr_common.cpp 73 GDR driver version: (2, 4)
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1635 Begin - Enumerating IB devices in the system ([<dev_id, device_name, num_ports>]) - 
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/common/transport_gdr_common.cpp 73 GDR driver version: (2, 4)
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1635 Begin - Enumerating IB devices in the system ([<dev_id, device_name, num_ports>]) - 
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/common/transport_gdr_common.cpp 73 GDR driver version: (2, 4)
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1635 Begin - Enumerating IB devices in the system ([<dev_id, device_name, num_ports>]) - 
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=0 (of 4), name=mlx5_0, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=1 (of 4), name=mlx5_1, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=0 (of 4), name=mlx5_0, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=0 (of 4), name=mlx5_0, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=0 (of 4), name=mlx5_0, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=2 (of 4), name=mlx5_2, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=3 (of 4), name=mlx5_3, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1737 End - Enumerating IB devices in the system
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1742 Begin - Ordered list of devices for assignment (after processing user provdied env vars (if any))  - 
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1746 Ordered list of devices for assignment - idx=0 (of 2), device id=0, port_num=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1746 Ordered list of devices for assignment - idx=1 (of 2), device id=1, port_num=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1750 End - Ordered list of devices for assignment (after processing user provdied env vars (if any))
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 212 /home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp:1790 Ib Alloc Size 2097152 pointer 0x5cb8000
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=1 (of 4), name=mlx5_1, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=1 (of 4), name=mlx5_1, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=1 (of 4), name=mlx5_1, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=2 (of 4), name=mlx5_2, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=2 (of 4), name=mlx5_2, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=2 (of 4), name=mlx5_2, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=3 (of 4), name=mlx5_3, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=3 (of 4), name=mlx5_3, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1656 Enumerated IB devices in the system - device id=3 (of 4), name=mlx5_3, num_ports=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1737 End - Enumerating IB devices in the system
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1742 Begin - Ordered list of devices for assignment (after processing user provdied env vars (if any))  - 
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1746 Ordered list of devices for assignment - idx=0 (of 2), device id=0, port_num=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1746 Ordered list of devices for assignment - idx=1 (of 2), device id=1, port_num=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1750 End - Ordered list of devices for assignment (after processing user provdied env vars (if any))
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1737 End - Enumerating IB devices in the system
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1742 Begin - Ordered list of devices for assignment (after processing user provdied env vars (if any))  - 
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1746 Ordered list of devices for assignment - idx=0 (of 2), device id=0, port_num=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1746 Ordered list of devices for assignment - idx=1 (of 2), device id=1, port_num=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1750 End - Ordered list of devices for assignment (after processing user provdied env vars (if any))
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1737 End - Enumerating IB devices in the system
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1742 Begin - Ordered list of devices for assignment (after processing user provdied env vars (if any))  - 
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1746 Ordered list of devices for assignment - idx=0 (of 2), device id=0, port_num=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1746 Ordered list of devices for assignment - idx=1 (of 2), device id=1, port_num=1
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 1750 End - Ordered list of devices for assignment (after processing user provdied env vars (if any))
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 212 /home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp:1790 Ib Alloc Size 2097152 pointer 0x4819000
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 212 /home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp:1790 Ib Alloc Size 2097152 pointer 0x6157000
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 212 /home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp:1790 Ib Alloc Size 2097152 pointer 0x4b2f000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO NVSHMEM_ENABLE_NIC_PE_MAPPING = 0, device 0 setting dev_id = 1
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO NVSHMEM_ENABLE_NIC_PE_MAPPING = 0, device 0 setting dev_id = 1
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO NVSHMEM_ENABLE_NIC_PE_MAPPING = 0, device 0 setting dev_id = 0
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO NVSHMEM_ENABLE_NIC_PE_MAPPING = 0, device 0 setting dev_id = 0
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] status 0 cudaErrorInvalidValue 1 cudaErrorInvalidSymbol 13 cudaErrorInvalidMemcpyDirection 21 cudaErrorNoKernelImageForDevice 209
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] status 0 cudaErrorInvalidValue 1 cudaErrorInvalidSymbol 13 cudaErrorInvalidMemcpyDirection 21 cudaErrorNoKernelImageForDevice 209
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] status 0 cudaErrorInvalidValue 1 cudaErrorInvalidSymbol 13 cudaErrorInvalidMemcpyDirection 21 cudaErrorNoKernelImageForDevice 209
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] status 0 cudaErrorInvalidValue 1 cudaErrorInvalidSymbol 13 cudaErrorInvalidMemcpyDirection 21 cudaErrorNoKernelImageForDevice 209
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO calling get_mem_handle for transport: 0 buf: 0x28260000000 size: 536870912
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO calling get_mem_handle for transport: 0 buf: 0x14a400000000 size: 536870912
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO calling get_mem_handle for transport: 0 buf: 0x2b0e0000000 size: 536870912
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO calling get_mem_handle for transport: 0 buf: 0x14d5a0000000 size: 536870912
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] get_mem_handle transport 0 handles 0x7ffd2713d950
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO calling get_mem_handle for transport: 1 buf: 0x28260000000 size: 536870912
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] get_mem_handle transport 0 handles 0x7fffbc063200
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO calling get_mem_handle for transport: 1 buf: 0x2b0e0000000 size: 536870912
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] get_mem_handle transport 0 handles 0x7fff9a32ff80
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO calling get_mem_handle for transport: 1 buf: 0x14a400000000 size: 536870912
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] get_mem_handle transport 0 handles 0x7fff9968c4a0
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO calling get_mem_handle for transport: 1 buf: 0x14d5a0000000 size: 536870912
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/common/transport_ib_common.cpp 96 ibv_reg_mr handle 0x7ffd2713db50 handle->mr (nil)
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/common/transport_ib_common.cpp 96 ibv_reg_mr handle 0x7fffbc063400 handle->mr (nil)
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 212 /home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp:559 Ib Alloc Size 8 pointer 0x5492000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] get_mem_handle transport 1 handles 0x7ffd2713db50
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/common/transport_ib_common.cpp 96 ibv_reg_mr handle 0x7fff9a330180 handle->mr (nil)
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 212 /home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp:559 Ib Alloc Size 8 pointer 0x5c3c000
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/common/transport_ib_common.cpp 96 ibv_reg_mr handle 0x7fff9968c6a0 handle->mr (nil)
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] get_mem_handle transport 1 handles 0x7fffbc063400
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 212 /home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp:559 Ib Alloc Size 8 pointer 0x6dd0000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] get_mem_handle transport 1 handles 0x7fff9a330180
/home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp 212 /home/co-morg1/rds/hpc-work/nvshmem_src_2.11.0-5/src/modules/transport/ibrc/ibrc.cpp:559 Ib Alloc Size 8 pointer 0x4d4c000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] get_mem_handle transport 1 handles 0x7fff9968c6a0
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] cuIpcOpenMemHandle fromhandle 0x70000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] cuIpcOpenMemHandle tobuf 0x2a260000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] cuIpcOpenMemHandle fromhandle 0x72000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] cuIpcOpenMemHandle tobuf 0x14f5a0000000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] cuIpcOpenMemHandle fromhandle 0x72000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] cuIpcOpenMemHandle tobuf 0x2c260000000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] cuIpcOpenMemHandle fromhandle 0x71000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] cuIpcOpenMemHandle tobuf 0x14c400000000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] cuIpcOpenMemHandle fromhandle 0x71000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] cuIpcOpenMemHandle tobuf 0x2e260000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] cuIpcOpenMemHandle fromhandle 0x70000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] cuIpcOpenMemHandle tobuf 0x1515a0000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] cuIpcOpenMemHandle fromhandle 0x71000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] cuIpcOpenMemHandle tobuf 0x1535a0000000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] cuIpcOpenMemHandle fromhandle 0x71000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] cuIpcOpenMemHandle tobuf 0x2d0e0000000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] cuIpcOpenMemHandle fromhandle 0x70000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] cuIpcOpenMemHandle tobuf 0x14e400000000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] cuIpcOpenMemHandle fromhandle 0x72000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] cuIpcOpenMemHandle tobuf 0x2f0e0000000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] cuIpcOpenMemHandle fromhandle 0x72000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] cuIpcOpenMemHandle tobuf 0x150400000000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] cuIpcOpenMemHandle fromhandle 0x70000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] cuIpcOpenMemHandle tobuf 0x310e0000000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] allocated 16777216 bytes, ptr: 0x28260000000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] allocated 16777216 bytes, ptr: 0x14a400000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] allocated 16777216 bytes, ptr: 0x14d5a0000000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] allocated 16777216 bytes, ptr: 0x2b0e0000000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] allocated 268435456 bytes, ptr: 0x28261000000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] allocated 268435456 bytes, ptr: 0x14a401000000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] allocated 268435456 bytes, ptr: 0x2b0e1000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] allocated 268435456 bytes, ptr: 0x14d5a1000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO NVSHMEM_TEAM_SHARED: start=3, stride=1, size=1
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO NVSHMEM_TEAM_SHARED: start=2, stride=1, size=1
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO NVSHMEM_TEAM_SHARED: start=1, stride=1, size=1
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO NVSHMEM_TEAM_SHARED: start=0, stride=1, size=1
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO NVSHMEMX_TEAM_NODE: start=0, stride=1, size=4
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO NVSHMEM_TEAM_SHARED: start=0, stride=4, size=1
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO NVSHMEMX_TEAM_NODE: start=0, stride=1, size=4
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO NVSHMEM_TEAM_SHARED: start=2, stride=4, size=1
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO NVSHMEMI_TEAM_SAME_GPU: start=2, stride=1, size=1
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO NVSHMEMX_TEAM_NODE: start=0, stride=1, size=4
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO NVSHMEMI_TEAM_SAME_GPU: start=0, stride=1, size=1
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO NVSHMEMX_TEAM_NODE: start=0, stride=1, size=4
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO NVSHMEM_TEAM_SHARED: start=3, stride=4, size=1
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO NVSHMEMI_TEAM_SAME_GPU: start=3, stride=1, size=1
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO NVSHMEM_TEAM_SHARED: start=1, stride=4, size=1
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO NVSHMEMI_TEAM_SAME_GPU: start=1, stride=1, size=1
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO NVSHMEMI_TEAM_GPU_LEADERS: start=0, stride=1, size=4
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO NVSHMEMI_TEAM_GPU_LEADERS: start=0, stride=1, size=4
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO NVSHMEMI_TEAM_GPU_LEADERS: start=0, stride=1, size=4
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO NVSHMEMI_TEAM_GPU_LEADERS: start=0, stride=1, size=4
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] allocated 128450560 bytes, ptr: 0x2b0f1000000
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] allocated 128450560 bytes, ptr: 0x14d5b1000000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] allocated 128450560 bytes, ptr: 0x28271000000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] allocated 128450560 bytes, ptr: 0x14a411000000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] allocated 512 bytes, ptr: 0x2b0f8a80000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] allocated 32 bytes, ptr: 0x2b0f8a80200
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] allocated 512 bytes, ptr: 0x28278a80000
gpu-q-74:4164303:4164303 [0] NVSHMEM INFO [2] allocated 8 bytes, ptr: 0x2b0f8a80400
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] allocated 512 bytes, ptr: 0x14a418a80000
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] allocated 32 bytes, ptr: 0x28278a80200
gpu-q-74:4164305:4164305 [0] NVSHMEM INFO [0] allocated 8 bytes, ptr: 0x28278a80400
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] allocated 32 bytes, ptr: 0x14a418a80200
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] allocated 512 bytes, ptr: 0x14d5b8a80000
gpu-q-74:4164298:4164298 [0] NVSHMEM INFO [1] allocated 8 bytes, ptr: 0x14a418a80400
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] allocated 32 bytes, ptr: 0x14d5b8a80200
gpu-q-74:4164299:4164299 [0] NVSHMEM INFO [3] allocated 8 bytes, ptr: 0x14d5b8a80400
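
Because the four processes interleave their output, I found the log easier to read after splitting it per process ID (run.log is a placeholder for the captured output):

# Split the combined log into one file per PE; the PID is the second ':'-separated field
for pid in 4164298 4164299 4164303 4164305; do
    grep ":${pid}:" run.log > "pe_${pid}.log"
done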

For additional context, the reason I am doubting the NVSHMEM setup is an error in my code: all NVSHMEM PEs call nvshmem_double_put_nbi, but the run dies with segmentation faults once PEs 1 and 3 start their puts (over multiple runs it is always PEs 1 and 3).
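
To localize the crash, my plan is to rerun with full debug output under compute-sanitizer, roughly like this (my_nvshmem_app is again a placeholder):

export NVSHMEM_DEBUG=TRACE
# memcheck flags the first invalid device memory access in each process
mpirun -np 4 ./bind.sh compute-sanitizer --tool memcheck ./my_nvshmem_app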

I know this is a long post, so the TL;DR is: “How do I use NVSHMEM_DEBUG?”