Dear staff,
Is it possible to use the NCCL transport layer available from UCC in HPCX-MPI?
To inspect my installation of nvhpc/25.3, I load hpcx-mpi as follows:
module load /leonardo/prod/opt/compilers/nvhpc/25.3/binary/modulefiles/nvhpc-hpcx-cuda12/25.3
source /leonardo/prod/opt/compilers/nvhpc/25.3/binary/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/hpcx-init.sh
hpcx_load.sh
UCC is built with these TLs:
$ ucc_info -b | grep "nccl"
#define UCC_CONFIGURE_FLAGS "--with-ucx=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucx --with-sharp=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/sharp --with-rdmacm --with-tlcp=alltoall_block --with-cuda=/hpc/local/oss/cuda12.6.3/redhat8 --with-nccl --with-tls=cuda,nccl,self,sharp,shm,ucp,mlx5 --prefix=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucc"
but NCCL does not appear among the default TL scores:
[lbellen1@login01 benchmark_mpi]$ ucc_info -s
Default CLs scores: basic=10 hier=50
Default TLs scores: mlx5=1 self=50 sharp=30 shm=100 ucp=10
Might this depend on the version, or is the NCCL TL simply not available via hpcx-mpi?
Thank you,
Laura
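The check against the `ucc_info -s` output above can be scripted. This is only a minimal sketch: the helper name `has_tl` is made up for illustration, and the sample line is the one quoted in this post. On a real system one would pipe `ucc_info -s` into it instead, from a node where the GPU libraries are visible.

```shell
#!/usr/bin/env bash
# has_tl NAME: read `ucc_info -s` style output on stdin and succeed
# if NAME has an entry in the "Default TLs scores" line.
# (Hypothetical helper, not part of UCC or HPC-X.)
has_tl() {
    grep '^Default TLs scores:' | grep -Eq "[[:space:]]$1=[0-9]+"
}

# Sample line copied from the output quoted in this thread.
sample='Default TLs scores: mlx5=1 self=50 sharp=30 shm=100 ucp=10'

for tl in ucp nccl; do
    if printf '%s\n' "$sample" | has_tl "$tl"; then
        echo "$tl: listed"
    else
        echo "$tl: not listed"
    fi
done

# Real use would look like:  ucc_info -s | has_tl nccl
```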
My mistake: I was on a login node without GPUs, and the CUDA and NCCL layers are visible only on compute nodes :)
However, I am still not able to use the NCCL backend, despite setting these environment variables:
export OMPI_MCA_coll_ucc_enable=1
export OMPI_MCA_coll_ucc_priority=100
export UCC_TL_NCCL_TUNE=allgatherv:cuda:inf#bcast:cuda:inf#allreduce:cuda:inf
The UCP layer is still used for CUDA memory, e.g. in allreduce:
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203 UCC INFO Allreduce:
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203 UCC INFO Host: {0..4095}:TL_SHM:10 {4K..8K}:TL_SHM:10 {8193..inf}:TL_UCP:10
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203 UCC INFO Cuda: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203 UCC INFO CudaManaged: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
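To compare runs quickly, the score map above can be reduced to the TLs selected for one collective and one memory type. The following is a rough sketch that assumes the log keeps exactly the layout shown above (a `Name:` header line followed by `MemType: {range}:TL:score ...` lines); the function name `score_tls` is hypothetical.

```shell
#!/usr/bin/env bash
# score_tls COLLECTIVE MEMTYPE: read UCC score-map log lines on stdin and
# print each TL selected for that collective and memory type, once.
# (Hypothetical helper; assumes the log layout quoted in this thread.)
score_tls() {
    awk -v coll="$1" -v mem="$2" '
        {
            sub(/^.*UCC +INFO +/, "")   # drop timestamp, host and source prefix
            sub(/[ \t\r]+$/, "")        # drop trailing whitespace
        }
        # A bare "Name:" line starts the section of a new collective.
        /^[A-Za-z_]+:$/ { current = substr($0, 1, length($0) - 1); next }
        current == coll && $1 == (mem ":") {
            for (i = 2; i <= NF; i++) {
                n = split($i, f, ":")   # {range}:TL_NAME:score
                if (n >= 2 && !seen[f[2]]++) print f[2]
            }
        }
    '
}

# Sample lines copied from the log quoted in this thread.
log=$(cat <<'EOF'
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203 UCC INFO Allreduce:
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203 UCC INFO Host: {0..4095}:TL_SHM:10 {4K..8K}:TL_SHM:10 {8193..inf}:TL_UCP:10
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203 UCC INFO Cuda: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203 UCC INFO CudaManaged: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
EOF
)

printf '%s\n' "$log" | score_tls Allreduce Cuda
```

If `TL_NCCL` never shows up in the `Cuda` row for a collective, the tuning string has not taken effect for it.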
I am not sure whether this is a question for the UCC forum or the NVIDIA developer one.
Thank you,
Laura
Hi Laura,
This is out of my area so I’m not sure how helpful I’ll be.
I’ve been looking through some documentation, and the best I can determine is that NCCL may be used for collectives when UCC is used (i.e. OMPI_MCA_coll=ucc), but I don’t see a way to force it to use NCCL.
Maybe there’s a way, but I’m not able to find it.
-Mat