Availability of NCCL TLS from UCC in hpcx-mpi/2.22.1 from nvhpc/25.3

l.bellentani · December 17, 2025, 10:20am

Dear staff,

is it possible to use the NCCL transport layer available from UCC in HPCX-MPI?

To inspect my installation of nvhpc/25.3, I load hpcx-mpi as follows:

module load /leonardo/prod/opt/compilers/nvhpc/25.3/binary/modulefiles/nvhpc-hpcx-cuda12/25.3
source /leonardo/prod/opt/compilers/nvhpc/25.3/binary/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/hpcx-init.sh
hpcx_load.sh

UCC is built with these TLs:

$ ucc_info -b | grep "nccl"
#define UCC_CONFIGURE_FLAGS       "--with-ucx=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucx --with-sharp=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/sharp --with-rdmacm --with-tlcp=alltoall_block --with-cuda=/hpc/local/oss/cuda12.6.3/redhat8 --with-nccl --with-tls=cuda,nccl,self,sharp,shm,ucp,mlx5 --prefix=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucc"

but NCCL is not listed among the default TL scores:

[lbellen1@login01 benchmark_mpi]$ ucc_info -s
Default CLs scores: basic=10 hier=50
Default TLs scores: mlx5=1 self=50 sharp=30 shm=100 ucp=10
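
A quick way to check for one TL in that output is sketched below. It only parses the text format shown above; `has_tl` is a hypothetical helper name, not part of UCC or HPC-X.

```shell
# has_tl NAME: read `ucc_info -s` output on stdin and succeed if NAME
# has an entry on the "Default TLs scores" line (hypothetical helper).
has_tl() {
  grep '^Default TLs scores:' | tr ' ' '\n' | grep "^$1=" >/dev/null
}

# Usage, on a node where ucc_info is on PATH:
#   ucc_info -s | has_tl nccl && echo "nccl listed" || echo "nccl not listed"
```

Running the same check on a login node and on a GPU compute node should show whether the list depends on the hardware visible to UCC.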

Might this depend on the version, or is the NCCL TL simply not available via hpcx-mpi?

Thank you,

Laura

l.bellentani · December 17, 2025, 3:05pm

My mistake: I was on a login node without GPUs, and the CUDA and NCCL layers are visible only on compute nodes :)

However, I am still not able to use the NCCL backend, despite setting these environment variables:

export OMPI_MCA_coll_ucc_enable=1
export OMPI_MCA_coll_ucc_priority=100
export UCC_TL_NCCL_TUNE=allgatherv:cuda:inf#bcast:cuda:inf#allreduce:cuda:inf
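
The tune string above is a `#`-separated list of `collective:memtype:score` tokens. A minimal sketch for building it from a list of collectives follows; `nccl_tune` is a hypothetical helper, and setting `UCC_LOG_LEVEL=info` is assumed here to be what makes UCC print its score map.

```shell
# nccl_tune COLL...: print one "coll:cuda:inf" token per argument,
# joined with '#' (hypothetical helper, not part of UCC).
nccl_tune() {
  out=""
  for coll in "$@"; do
    if [ -n "$out" ]; then out="${out}#"; fi
    out="${out}${coll}:cuda:inf"
  done
  printf '%s\n' "$out"
}

export UCC_TL_NCCL_TUNE="$(nccl_tune allgatherv bcast allreduce)"
# Assumption: info-level logging prints the per-collective score map.
export UCC_LOG_LEVEL=info
```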

The UCP layer is used for CUDA memory, e.g. in allreduce:

[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO  Allreduce:
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO    Host: {0..4095}:TL_SHM:10 {4K..8K}:TL_SHM:10 {8193..inf}:TL_UCP:10
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO    Cuda: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
[1765983400.875479] [lrdn1679:3103444:0] ucc_coll_score_map.c:203  UCC  INFO    CudaManaged: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
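
To pull the selected TLs for CUDA buffers out of a longer log, a small filter like the one below may help. It assumes the log keeps the line format shown above; `cuda_tls` is a hypothetical helper name.

```shell
# cuda_tls: read UCC INFO log lines on stdin and print the distinct TLs
# scored on "Cuda:" rows; "Host:" and "CudaManaged:" rows are skipped.
cuda_tls() {
  grep ' Cuda: ' | grep -o 'TL_[A-Z0-9]*' | sort -u
}

# Usage (./my_app is a placeholder for the real MPI program):
#   mpirun -np 2 ./my_app 2>&1 | cuda_tls
```

If NCCL were picked up for any message range, `TL_NCCL` would appear in the output next to `TL_UCP`.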

I am not sure whether this is a question for the UCC forum or the NVIDIA developer one.

Thank you,

Laura

MatColgrove · December 17, 2025, 8:26pm

Hi Laura,

This is out of my area so I’m not sure how helpful I’ll be.

I’ve been looking through some documentation, and the best I can determine is that NCCL may be used for collectives when UCC is used (i.e. OMPI_MCA_coll=ucc), but I don’t see a way to force it to use NCCL.

Maybe there’s a way, but I’m not able to find it.

-Mat
