NCCL version mismatch causes multi-GPU training freeze

Hi, I’m using CUDA 11.3 in my Docker container. When I run multi-GPU training it freezes, so I thought it might be solved if I could change the version that torch.cuda.nccl.version() reports…

I’d really like to know where NCCL 2.10.3 is located and how I can remove it.

Also, is there any way to find NCCL 2.10.3 in my environment? apt search nccl doesn’t show the 2.10.3 version that torch.cuda.nccl.version() reports. I wonder whether, if I removed 2.10.3, torch would fall back to the 2.9.9 that apt has installed.
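
From what I can tell, the 1.10.2+cu113 pip wheel seems to ship its own NCCL build, which would explain why torch reports 2.10.3 while apt only knows about 2.9.9/2.11.4. Here is a small sketch I used to check that; the assumption that any bundled copy would sit under the torch package’s lib directory is mine, and if nothing is found there, NCCL is most likely statically linked into torch’s CUDA libraries:

# Sketch: figure out where the NCCL that torch reports actually comes from.
# Assumption (mine): a bundled copy, if any, lives under <torch package>/lib;
# finding nothing there suggests NCCL is statically linked into the wheel.
import glob
import os

import torch

print("torch:", torch.__version__)
print("NCCL version torch reports:", torch.cuda.nccl.version())

torch_lib = os.path.join(os.path.dirname(torch.__file__), "lib")
bundled = glob.glob(os.path.join(torch_lib, "*nccl*"))
print("torch lib dir:", torch_lib)
print("bundled NCCL files:", bundled or "none (likely statically linked)")

Either way it doesn’t look like something apt manages, which is why I’m unsure how to remove it. For reference, here are nvcc --version, the Python/torch info, and the apt search nccl output from inside the container: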

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
Python 3.8.8 (default, Apr 13 2021, 19:58:26) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.10.2+cu113'
>>> torch.cuda.nccl.version()
(2, 10, 3)
libhttpasyncclient-java/focal 4.1.4-1 all
  HTTP/1.1 compliant asynchronous HTTP agent implementation

libnccl-dev/unknown 2.11.4-1+cuda11.6 amd64 [upgradable from: 2.9.9-1+cuda11.3]
  NVIDIA Collective Communication Library (NCCL) Development Files

libnccl2/unknown 2.11.4-1+cuda11.6 amd64 [upgradable from: 2.9.9-1+cuda11.3]
  NVIDIA Collective Communication Library (NCCL) Runtime

libpuppetlabs-http-client-clojure/focal 0.9.0-1 all
  Clojure wrapper around libhttpasyncclient-java

libvncclient1/focal-updates,focal-security 0.9.12+dfsg-9ubuntu0.3 amd64
  API to write one's own VNC server - client library

python-ncclient-doc/focal 0.6.0-2.1 all
  Documentation for python-ncclient (Python library for NETCONF clients)

python3-ncclient/focal 0.6.0-2.1 all
  Python library for NETCONF clients (Python 3)

Thanks!