Availability of the NCCL TL from UCC in hpcx-mpi/2.22.1 from nvhpc/25.3

Dear staff,

Is it possible to use the NCCL transport layer (TL) available from UCC in HPCX-MPI?

To inspect my installation of nvhpc/25.3, I load hpcx-mpi as follows:

module load /leonardo/prod/opt/compilers/nvhpc/25.3/binary/modulefiles/nvhpc-hpcx-cuda12/25.3
source /leonardo/prod/opt/compilers/nvhpc/25.3/binary/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/hpcx-init.sh
hpcx_load

UCC is built with these TLs, including NCCL:

$ ucc_info -b | grep "nccl"
#define UCC_CONFIGURE_FLAGS       "--with-ucx=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucx --with-sharp=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/sharp --with-rdmacm --with-tlcp=alltoall_block --with-cuda=/hpc/local/oss/cuda12.6.3/redhat8 --with-nccl --with-tls=cuda,nccl,self,sharp,shm,ucp,mlx5 --prefix=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucc"

but NCCL does not appear among the default TL scores:

[lbellen1@login01 benchmark_mpi]$ ucc_info -s
Default CLs scores: basic=10 hier=50
Default TLs scores: mlx5=1 self=50 sharp=30 shm=100 ucp=10
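For anyone who wants to script this check, here is a minimal sketch. It assumes the `ucc_info -s` output has the same shape as the two lines above; `has_tl` is a hypothetical helper name, not part of UCC or HPC-X.

```shell
# has_tl NAME: read `ucc_info -s` style output on stdin and succeed
# if NAME is listed on the "Default TLs scores:" line.
# (has_tl is a hypothetical helper, not a UCC tool.)
has_tl() {
    awk -v tl="$1" '
        /^Default TLs scores:/ {
            # Fields 1-3 are "Default", "TLs", "scores:"; the rest are name=score.
            for (i = 4; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == tl) found = 1
            }
        }
        END { exit !found }
    '
}

# Sample copied from the login-node output above.
sample='Default CLs scores: basic=10 hier=50
Default TLs scores: mlx5=1 self=50 sharp=30 shm=100 ucp=10'

if printf '%s\n' "$sample" | has_tl nccl; then
    echo "nccl listed"
else
    echo "nccl not listed"
fi
```

On a real system the sample would be replaced by `ucc_info -s | has_tl nccl`, run on the node of interest.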

Might this depend on the version, or is the NCCL TL simply not available via hpcx-mpi?

Thank you,

Laura

My mistake: I was on a login node without GPUs, and the CUDA and NCCL TLs are visible only on compute nodes :)

However, I am still not able to use the NCCL backend, despite setting these environment variables:

export OMPI_MCA_coll_ucc_enable=1
export OMPI_MCA_coll_ucc_priority=100
export UCC_TL_NCCL_TUNE=allgatherv:cuda:inf#bcast:cuda:inf#allreduce:cuda:inf

The UCP TL is still selected for CUDA memory, e.g. in allreduce:

[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO  Allreduce:
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO    Host: {0..4095}:TL_SHM:10 {4K..8K}:TL_SHM:10 {8193..inf}:TL_UCP:10
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO    Cuda: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO    CudaManaged: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
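To compare runs quickly, the TLs chosen for one memory type can be pulled out of such a log. This is only a sketch that assumes the line layout shown above; `tls_for_mem` is a hypothetical helper name.

```shell
# tls_for_mem MEM: read UCC score-map log lines on stdin and print the
# distinct TL names listed for memory type MEM (e.g. Host, Cuda).
# (tls_for_mem is a hypothetical helper, not a UCC tool.)
tls_for_mem() {
    # "INFO +MEM: " matches "Cuda: " but not "CudaManaged: ".
    grep -E "INFO +$1: " | grep -oE 'TL_[A-Z0-9]+' | sort -u
}

# Sample lines copied from the log above.
log='[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO    Host: {0..4095}:TL_SHM:10 {4K..8K}:TL_SHM:10 {8193..inf}:TL_UCP:10
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO    Cuda: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10'

printf '%s\n' "$log" | tls_for_mem Cuda   # prints TL_UCP
```

If NCCL were picked up for CUDA buffers, one would expect a `TL_NCCL` entry to show up here instead of, or alongside, `TL_UCP`.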

I am not sure whether this is a question for the UCC forum or the NVIDIA developer forum.

Thank you,

Laura

Hi Laura,

This is out of my area so I’m not sure how helpful I’ll be.

I’ve been looking through some documentation, and the best I can determine is that NCCL may be used for collectives when UCC is used (i.e. OMPI_MCA_coll=ucc), but I don’t see a way to force it to use NCCL.

Maybe there’s a way, but I’m not able to find it.

-Mat