Hi, I’m trying to use my GPU (RTX 3060) with CUDA 11.7, which asks for an NVIDIA driver of version 470. It seems I have conflicting installations, and backends like NCCL won’t work with CUDA until I resolve this.
On one end, my Ubuntu 20 updater interface tells me under “Additional Drivers” that I am
“Using NVIDIA Server Driver metapackage from nvidia-driver-470-server”
(I’m using a normal workstation, not a server, by the way, so its defaulting to this is perplexing on its own.)
On the other hand, the bash command
$(uname -r)/kernel/drivers/video/nvidia.ko | grep ^version
returns
bash: 5.14.0-1057-oem/kernel/drivers/video/nvidia.ko: No such file or directory
and, separately, the command nvidia-smi
returns
Failed to initialize NVML: Driver/library version mismatch
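For reference, a minimal sketch of that version check in bash. The path used above likely fails for two reasons: it is run as a command itself rather than passed to modinfo, and on Ubuntu the DKMS-built module usually does not live under kernel/drivers/video, so looking the module up by name is safer. The "Driver/library version mismatch" message typically means the kernel module currently loaded and the user-space libraries are at different versions, often after a driver package update without a reboot. The helper name first_version is hypothetical, made up for this example; it assumes grep, head and modinfo are available.

```shell
#!/usr/bin/env bash
# Sketch: compare the NVIDIA kernel module that is loaded with the one on disk.
# first_version is a hypothetical helper, not part of any NVIDIA tool.

first_version() {
    # Print the first dotted number (e.g. 470.161.03) found on stdin.
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Version of the module currently loaded in the kernel (empty if none is loaded).
loaded=""
if [ -r /proc/driver/nvidia/version ]; then
    loaded=$(first_version < /proc/driver/nvidia/version)
fi

# Version of the module installed on disk for the running kernel,
# looked up by module name instead of a hard-coded path.
ondisk=""
if command -v modinfo >/dev/null 2>&1; then
    ondisk=$(modinfo nvidia 2>/dev/null | grep '^version' | first_version)
fi

echo "loaded kernel module: ${loaded:-none}"
echo "installed on disk:    ${ondisk:-not found}"
```

If the two lines differ, or the loaded one is "none", that would be consistent with the mismatch error; it does not by itself say which package to install.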
Finally, the cherry on top is that the NVIDIA site points only to the newest driver version, 525, whereas CUDA 11.7 (the latest version supported by torch.distributed) only requires driver version 470: Linux x64 (AMD64/EM64T) Display Driver | 525.85.05 | Linux 64-bit | NVIDIA.
So if anyone has an idea how to resolve this, I’d appreciate the help. I just need a driver that works on this OS with CUDA 11.7 and NCCL support.
In case we make progress here, you can find more details in my other post: NCCL declaring Nvidia GPU missing using Pytorch distributed