CUDA NCCL Error "operation not supported" Multi-GPUs

Hi Forums,

Setup:

  • GPU: two Quadro M4000 GPUs
  • CUDA Version: cuda_12.4.r12.4/compiler.34097967_0
  • NCCL Version: libnccl-dev 2.27.3-1+cuda12.4

The GPUs are connected independently via PCIe on my motherboard; there is no NVLink between them.

I tried to train a PyTorch model on both GPUs using nn.DataParallel().
However, I ran into the error 'unhandled cuda error (run with NCCL_DEBUG=INFO for details)'.

Running nccl-tests (./build/all_reduce_perf) with NCCL_DEBUG=INFO, I got this error:

Authorization required, but no authorization protocol specified
# nThread 1 nGpus 1 minBytes 8 maxBytes 536870912 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid 204118 on hom device  0 [0000:15:00] Quadro M4000
home:204118:204118 [0] NCCL INFO Bootstrap: Using eno1:10.39.120.16<0>
home:204118:204118 [0] NCCL INFO cudaDriverVersion 12040
home:204118:204118 [0] NCCL INFO NCCL version 2.27.3+cuda12.4
home:204118:204136 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. 
home:204118:204136 [0] NCCL INFO NET/IB : No device found.
home:204118:204136 [0] NCCL INFO NET/IB : Using [RO]; OOB eno1:10.39.120.16<0>
home:204118:204136 [0] NCCL INFO NET/Socket : Using [0]eno1:10.39.120.16<0>
home:204118:204136 [0] NCCL INFO Initialized NET plugin Socket
home:204118:204136 [0] NCCL INFO Assigned NET plugin Socket to comm
home:204118:204136 [0] NCCL INFO Using network Socket

home:204118:204136 [0] init.cc:426 NCCL WARN Cuda failure 'operation not supported'

I see "NET/IB : No device found." Does it mean NCCL can’t find my 2 GPUs? nvidia-smi finds both GPUs with no problem.
Thanks!

I suspect the "NET/IB : No device found" message is referring to an InfiniBand network interface, which you presumably don’t have fitted, hence the “INFO” status. It is not about your GPUs.
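
To pick the real failure out of a long NCCL_DEBUG=INFO dump, it can help to split the lines by severity. Here is a minimal Python sketch, assuming the "NCCL INFO" / "NCCL WARN" markers seen in the log in the question; the helper name split_nccl_log is made up for illustration.

```python
import re

# Matches the severity tag NCCL prints in each debug line,
# e.g. "... NCCL INFO NET/IB : No device found."
_SEVERITY = re.compile(r"NCCL (INFO|WARN)\s+(.*)")


def split_nccl_log(text):
    """Group NCCL debug messages by severity (hypothetical helper)."""
    groups = {"INFO": [], "WARN": []}
    for line in text.splitlines():
        match = _SEVERITY.search(line)
        if match:
            groups[match.group(1)].append(match.group(2).strip())
    return groups


# Two lines copied from the log in the question.
sample = """\
home:204118:204136 [0] NCCL INFO NET/IB : No device found.
home:204118:204136 [0] init.cc:426 NCCL WARN Cuda failure 'operation not supported'
"""

result = split_nccl_log(sample)
print(result["WARN"])
```

Only the WARN group contains the CUDA failure; the NET/IB line lands in the INFO group.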

A wild guess: looking at common.mk in nccl-tests, the minimum hardware version supported is Pascal (sm_60), and this is perhaps causing the “operation not supported” error.

You are on Maxwell (sm_52), so try adding an entry for -gencode=arch=compute_52,code=sm_52 to common.mk and rebuilding.
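
As a rough sketch of how that entry is formed for a given card, the Python helper below builds the -gencode flag from a compute capability. The names gencode_flag and local_gencode_flags are hypothetical. torch.cuda.get_device_capability is PyTorch's call for reading the capability, and it is only used if PyTorch and a GPU are actually available.

```python
def gencode_flag(major, minor):
    """Build an nvcc -gencode entry for one compute capability (hypothetical helper)."""
    arch = f"{major}{minor}"
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


def local_gencode_flags():
    """Return one flag per distinct GPU architecture PyTorch can see, or [] if unavailable."""
    try:
        import torch  # optional; assumed installed because the question uses PyTorch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    caps = {
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    }
    return [gencode_flag(major, minor) for major, minor in sorted(caps)]


# Maxwell (Quadro M4000) is compute capability 5.2.
print(gencode_flag(5, 2))
print(local_gencode_flags())
```

If your copy of common.mk allows NVCC_GENCODE to be overridden (I believe it does, but check the file), the flag can be passed on the make command line instead of editing the makefile.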