Hi, I'm using CUDA 11.3 in my Docker container. When I run on multiple GPUs, the job freezes, so I thought it might be solved by changing the NCCL version reported by torch.cuda.nccl.version().
I'd really like to know where NCCL 2.10.3 is located and how I can remove it.
Also, is there any way to find NCCL 2.10.3 in my environment? apt search nccl doesn't list any 2.10.3 package, yet that is the version torch.cuda.nccl.version() reports. I wonder whether torch would fall back to 2.9.9 (the version apt shows as installed) if I removed 2.10.3.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
Python 3.8.8 (default, Apr 13 2021, 19:58:26)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.10.2+cu113'
>>> torch.cuda.nccl.version()
(2, 10, 3)
libhttpasyncclient-java/focal 4.1.4-1 all
  HTTP/1.1 compliant asynchronous HTTP agent implementation
libnccl-dev/unknown 2.11.4-1+cuda11.6 amd64 [upgradable from: 2.9.9-1+cuda11.3]
  NVIDIA Collective Communication Library (NCCL) Development Files
libnccl2/unknown 2.11.4-1+cuda11.6 amd64 [upgradable from: 2.9.9-1+cuda11.3]
  NVIDIA Collective Communication Library (NCCL) Runtime
libpuppetlabs-http-client-clojure/focal 0.9.0-1 all
  Clojure wrapper around libhttpasyncclient-java
libvncclient1/focal-updates,focal-security 0.9.12+dfsg-9ubuntu0.3 amd64
  API to write one's own VNC server - client library
python-ncclient-doc/focal 0.6.0-2.1 all
  Documentation for python-ncclient (Python library for NETCONF clients)
python3-ncclient/focal 0.6.0-2.1 all
  Python library for NETCONF clients (Python 3)
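
One way to check where the 2.10.3 reported by torch might come from is to look inside the PyTorch installation itself rather than at the apt packages. The sketch below is only an illustration: the helper names find_nccl_files and torch_lib_dir are hypothetical, and it assumes the bundled libraries sit in a lib directory next to the torch package. As far as I understand, the prebuilt cu113 wheels ship their own NCCL, possibly linked statically into a larger library, so an empty result would not by itself prove that torch uses the system libnccl2.

```python
import importlib.util
from pathlib import Path


def find_nccl_files(root):
    """Return sorted paths of files under root whose name contains 'nccl'."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and "nccl" in p.name.lower()
    )


def torch_lib_dir():
    """Return <torch package>/lib without importing torch, or None if torch is absent."""
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None
    # Assumption: bundled shared libraries live in a 'lib' folder beside torch/__init__.py.
    return Path(spec.origin).parent / "lib"


lib_dir = torch_lib_dir()
if lib_dir is None:
    print("torch is not installed in this environment")
else:
    matches = find_nccl_files(lib_dir)
    if matches:
        print("NCCL-related files shipped with torch:")
        for path in matches:
            print("  " + path)
    else:
        # No separate file: NCCL may be statically linked into another library.
        print("no file with 'nccl' in its name under", lib_dir)
```

The same find_nccl_files helper can be pointed at a system directory such as /usr/lib/x86_64-linux-gnu to see which files the apt packages libnccl2 and libnccl-dev installed, which makes it easier to compare the two copies.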