PyTorch container 'pytorch:19.10-py3' fails to load CUDA: "This container was built for NVIDIA Driver Release 418.87 or later, but version 410.66 was detected and compatibility mode is UNAVAILABLE"

I have just pulled the latest PyTorch container ‘pytorch:19.10-py3’. However, when I connect to the container I get this error:

== PyTorch ==

NVIDIA Release 19.10 (build 8472689)
PyTorch Version 1.3.0a0+24ae9b5

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.

Copyright (c) 2014-2019 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

ERROR: This container was built for NVIDIA Driver Release 418.87 or later, but
       version 410.66 was detected and compatibility mode is UNAVAILABLE.

       [[CUDA Driver UNAVAILABLE (cuInit(0) returned 804)]]

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.
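The startup check that prints this error is, at its core, a numeric comparison of the detected host driver version against the minimum the container was built for (410.66 vs. 418.87 here). A minimal sketch of that idea in Python (a hypothetical helper, not the container's actual entrypoint code):

```python
def meets_minimum(detected: str, required: str) -> bool:
    """Compare dotted driver versions numerically, e.g. '410.66' vs '418.87'.

    String comparison would be wrong ('9' > '10'), so each version is
    split into integer components and compared as a tuple.
    """
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(detected) >= to_tuple(required)

print(meets_minimum("410.66", "418.87"))  # False: the host driver fails the check
print(meets_minimum("430.26", "418.87"))  # True: a newer driver passes
```

This is why the error appears even though `nvcc` inside the container reports CUDA 10.1: the CUDA *toolkit* ships with the image, but the *driver* comes from the host.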

nvcc --version shows this:

root@b7275b4dd0fd:/workspace# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

and nvidia-smi shows this:

root@b7275b4dd0fd:/workspace# nvidia-smi
Wed Oct 30 20:50:18 2019
| NVIDIA-SMI 410.66       Driver Version: 410.66       CUDA Version: 10.0     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 108...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 25%   45C    P8    18W / 250W |      0MiB / 11177MiB |      0%      Default |
|   1  GeForce GTX 108...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 23%   29C    P8    10W / 250W |      0MiB / 11178MiB |      0%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |

Is it possible that the container was built with the wrong driver version?

I have tried restarting the container, and even the host, but the error remains.


Apologies: after posting this I upgraded the host's driver, which appears to fix the problem. It seems the container uses the host's driver rather than having its own, which makes sense.

For anyone else stumbling across this, just follow these steps to upgrade host drivers:
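On an Ubuntu host, the upgrade can look roughly like the following (a sketch, assuming an apt-based system with the graphics-drivers PPA; package names and branch numbers vary by distro, and any driver branch at or above 418.87 satisfies this container):

```shell
# Assumption: Ubuntu host. Add the graphics-drivers PPA for newer driver packages.
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update

# Install a driver branch new enough for the container (>= 418.87 here).
sudo apt-get install nvidia-driver-418

# Reboot so the new kernel module loads.
sudo reboot

# After reboot, verify the host driver the container will see:
nvidia-smi
```

The `Driver Version` field in the `nvidia-smi` header should now read 418.87 or later, and the container's startup check will pass.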