NVHPC Code with Multiple GPUs inside Singularity Container gives UCX Error

Hi Developers,

I'm trying to run my CUDA+MPI code, developed within a Singularity container, using the --nv flag. The issue I am facing is that the code runs successfully with NPROCS=1, but when I try to run it with NPROCS=2 it gives the following error:

[node3:2233858:0:2233858] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f7e9fef81b0)
[node3:2233857:0:2233857] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7fd463ef81b0)
==== backtrace (tid:2233858) ====
 0 0x0000000000014420 __funlockfile()  ???:0
 1 0x000000000018b8f5 __nss_database_lookup()  ???:0
 2 0x000000000004bf19 ucp_dt_pack()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/dt/dt.c:118
 3 0x000000000007e48c ucp_tag_pack_eager_common()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/tag/eager_snd.c:31
 4 0x000000000001a793 uct_mm_ep_am_common_send()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/uct/sm/mm/base/mm_ep.c:326
 5 0x000000000001a793 uct_mm_ep_am_bcopy()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/uct/sm/mm/base/mm_ep.c:416
 6 0x00000000000800ef uct_ep_am_bcopy()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/uct/api/uct.h:3020
 7 0x00000000000800ef ucp_tag_eager_bcopy_single()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/tag/eager_snd.c:132
 8 0x0000000000087f68 ucp_request_try_send()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/core/ucp_request.inl:334
 9 0x0000000000087f68 ucp_request_send()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/core/ucp_request.inl:357
10 0x0000000000087f68 ucp_tag_send_req()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/tag/tag_send.c:116
11 0x0000000000087f68 ucp_tag_send_nbx()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/tag/tag_send.c:298
12 0x00000000000047b6 mca_pml_ucx_send_nbr()  /var/jenkins/workspace/rel_nv_lib_hpcx_x86_64/rebuild_ompi/ompi/build/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx.c:904
13 0x00000000000047b6 mca_pml_ucx_send()  /var/jenkins/workspace/rel_nv_lib_hpcx_x86_64/rebuild_ompi/ompi/build/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx.c:944
14 0x0000000000072ac5 PMPI_Sendrecv()  /var/jenkins/workspace/rel_nv_lib_hpcx_x86_64/rebuild_ompi/ompi/build/ompi/mpi/c/profile/psendrecv.c:91
15 0x0000000000405465 main()  /source/KKS_FD_CUDA_MPI/./microsim_kks_fd_cuda_mpi.c:443
16 0x0000000000024083 __libc_start_main()  ???:0
17 0x000000000040366e _start()  ???:0
=================================
[node3:2233858] *** Process received signal ***
[node3:2233858] Signal: Segmentation fault (11)
[node3:2233858] Signal code:  (-6)
[node3:2233858] Failing at address: 0x40200221602
[node3:2233858] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f8045f42420]
[node3:2233858] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x18b8f5)[0x7f804584a8f5]
[node3:2233858] [ 2] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libucp.so.0(ucp_dt_pack+0x99)[0x7f80400fff19]
[node3:2233858] [ 3] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libucp.so.0(+0x7e48c)[0x7f804013248c]
[node3:2233858] [ 4] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libuct.so.0(uct_mm_ep_am_bcopy+0x133)[0x7f8040095793]
[node3:2233858] [ 5] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libucp.so.0(+0x800ef)[0x7f80401340ef]
[node3:2233858] [ 6] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libucp.so.0(ucp_tag_send_nbx+0x7d8)[0x7f804013bf68]
[node3:2233858] [ 7] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/hpcx-2.13/ompi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xf6)[0x7f80179c87b6]
[node3:2233858] [ 8] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/hpcx-2.13/ompi/lib/libmpi.so.40(MPI_Sendrecv+0x95)[0x7f80470b4ac5]
[node3:2233858] [ 9] ./microsim_kks_fd_cuda_mpi[0x405465]
[node3:2233858] [10] /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f80456e3083]
[node3:2233858] [11] ./microsim_kks_fd_cuda_mpi[0x40366e]
[node3:2233858] *** End of error message ***
==== backtrace (tid:2233857) ====
 0 0x0000000000014420 __funlockfile()  ???:0
 1 0x000000000018b8f5 __nss_database_lookup()  ???:0
 2 0x000000000004bf19 ucp_dt_pack()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/dt/dt.c:118
 3 0x000000000007e48c ucp_tag_pack_eager_common()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/tag/eager_snd.c:31
 4 0x000000000001a793 uct_mm_ep_am_common_send()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/uct/sm/mm/base/mm_ep.c:326
 5 0x000000000001a793 uct_mm_ep_am_bcopy()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/uct/sm/mm/base/mm_ep.c:416
 6 0x00000000000800ef uct_ep_am_bcopy()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/uct/api/uct.h:3020
 7 0x00000000000800ef ucp_tag_eager_bcopy_single()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/tag/eager_snd.c:132
 8 0x0000000000087f68 ucp_request_try_send()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/core/ucp_request.inl:334
 9 0x0000000000087f68 ucp_request_send()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/core/ucp_request.inl:357
10 0x0000000000087f68 ucp_tag_send_req()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/tag/tag_send.c:116
11 0x0000000000087f68 ucp_tag_send_nbx()  /build-result/src/hpcx-v2.13-gcc-MLNX_OFED_LINUX-5-redhat7-cuda11-gdrcopy2-nccl2.12-x86_64/ucx-c5a185a7aeac67894abe96240f2cc52ff8df0187/src/ucp/tag/tag_send.c:298
12 0x00000000000047b6 mca_pml_ucx_send_nbr()  /var/jenkins/workspace/rel_nv_lib_hpcx_x86_64/rebuild_ompi/ompi/build/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx.c:904
13 0x00000000000047b6 mca_pml_ucx_send()  /var/jenkins/workspace/rel_nv_lib_hpcx_x86_64/rebuild_ompi/ompi/build/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx.c:944
14 0x0000000000072ac5 PMPI_Sendrecv()  /var/jenkins/workspace/rel_nv_lib_hpcx_x86_64/rebuild_ompi/ompi/build/ompi/mpi/c/profile/psendrecv.c:91
15 0x0000000000405465 main()  /source/KKS_FD_CUDA_MPI/./microsim_kks_fd_cuda_mpi.c:443
16 0x0000000000024083 __libc_start_main()  ???:0
17 0x000000000040366e _start()  ???:0
=================================
[node3:2233857] *** Process received signal ***
[node3:2233857] Signal: Segmentation fault (11)
[node3:2233857] Signal code:  (-6)
[node3:2233857] Failing at address: 0x40200221601
[node3:2233857] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fd609836420]
[node3:2233857] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x18b8f5)[0x7fd60913e8f5]
[node3:2233857] [ 2] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libucp.so.0(ucp_dt_pack+0x99)[0x7fd5f0097f19]
[node3:2233857] [ 3] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libucp.so.0(+0x7e48c)[0x7fd5f00ca48c]
[node3:2233857] [ 4] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libuct.so.0(uct_mm_ep_am_bcopy+0x133)[0x7fd604059793]
[node3:2233857] [ 5] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libucp.so.0(+0x800ef)[0x7fd5f00cc0ef]
[node3:2233857] [ 6] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/latest/ucx/mt/lib/libucp.so.0(ucp_tag_send_nbx+0x7d8)[0x7fd5f00d3f68]
[node3:2233857] [ 7] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/hpcx-2.13/ompi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xf6)[0x7fd5d75ba7b6]
[node3:2233857] [ 8] /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/comm_libs/hpcx/hpcx-2.13/ompi/lib/libmpi.so.40(MPI_Sendrecv+0x95)[0x7fd60a9a8ac5]
[node3:2233857] [ 9] ./microsim_kks_fd_cuda_mpi[0x405465]
[node3:2233857] [10] /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fd608fd7083]
[node3:2233857] [11] ./microsim_kks_fd_cuda_mpi[0x40366e]
[node3:2233857] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node node3 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
make: *** [Makefile:171: run] Error 139
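For reference, here is a minimal stand-alone sketch of the kind of communication pattern involved: an MPI_Sendrecv exchange on cudaMalloc'd device buffers via CUDA-aware MPI. This is a simplified illustration, not my actual source; the buffer names, sizes, and the launch command in the comment are hypothetical.

/* Minimal sketch (hypothetical, not the actual MicroSim source) of a
 * halo exchange via MPI_Sendrecv on device buffers, as in the
 * PMPI_Sendrecv frame of the backtrace above.
 *
 * Launched roughly as (image name hypothetical):
 *   singularity exec --nv microsim.sif mpirun -np $NPROCS ./repro
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One GPU per rank */
    cudaSetDevice(rank);

    const int n = 1 << 20;
    double *d_send, *d_recv;
    cudaMalloc(&d_send, (size_t)n * sizeof(double));
    cudaMalloc(&d_recv, (size_t)n * sizeof(double));
    cudaMemset(d_send, 0, (size_t)n * sizeof(double));

    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    /* With NPROCS=1 this degenerates to a self-exchange and works;
     * with NPROCS=2 inside the container the equivalent call in my
     * code segfaults in ucp_dt_pack(), as shown in the backtrace. */
    MPI_Sendrecv(d_send, n, MPI_DOUBLE, next, 0,
                 d_recv, n, MPI_DOUBLE, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d: Sendrecv completed\n", rank);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}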

I have 4 A100 GPUs which I intend to use, rather than just 1, so it would be very helpful if you could give me some pointers regarding this issue.

Many thanks,
Pushkar