CUDA 10.0 - no CUDA-capable device is detected, nvidia-smi does not work.

Hello,
I am trying to use CUDA on my Linux server, but everything fails to run.
OS: RHEL 7
Graphics card: Nvidia Tesla V100
CUDA version: 10.0
GCC version: 4.8.5
Libcudnn: 7.5.0
I have tried CUDA 9.0, 9.1 and 10.0 (currently installed).
My Python code using the tensorflow-gpu library does not see any GPU devices:

from tensorflow.python.client import device_lib
from tensorflow import __version__
print(__version__)
print(device_lib.list_local_devices())

1.13.1
[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 1272378539820921425, name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 10994030388666212038
 physical_device_desc: "device: XLA_CPU device"]

When I try to run the CUDA samples, they all fail.

$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
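
Since deviceQuery reports that no device is detected, one thing that can be checked from Python is whether the driver is visible to the running system at all. The following is only a minimal sketch, assuming a Linux host where loaded modules are listed in /proc/modules and the driver creates /dev/nvidia* nodes; the helper names are made up for illustration and are not part of any NVIDIA tool.

```python
from pathlib import Path


def loaded_nvidia_modules(proc_modules_text):
    """Return names of loaded kernel modules whose name starts with 'nvidia'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


def nvidia_device_nodes(dev_dir="/dev"):
    """Return sorted nvidia* entries in dev_dir, or an empty list if none exist."""
    dev = Path(dev_dir)
    if not dev.is_dir():
        return []
    return sorted(p.name for p in dev.glob("nvidia*"))


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    text = modules_file.read_text() if modules_file.exists() else ""
    print("loaded nvidia modules:", loaded_nvidia_modules(text))
    print("device nodes:", nvidia_device_nodes())
```

If both lists come back empty, that would suggest the kernel module is not loaded, which would be consistent with the nvidia-smi message about not being able to communicate with the driver.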

Here is some hardware/software information:

$ lspci | grep -i nvidia
3b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe] (rev a1)
d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe] (rev a1)
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ yum list --installroot /opt installed | grep nvidia
dkms-nvidia.x86_64               3:418.40.04-1.el7           
nvidia-diag-driver-local-repo-rhel7-410.104.x86_64
nvidia-driver.x86_64             3:418.40.04-4.el7          
nvidia-driver-NVML.x86_64        3:418.40.04-4.el7          
nvidia-driver-NvFBCOpenGL.x86_64 3:418.40.04-4.el7          
nvidia-driver-cuda.x86_64        3:418.40.04-4.el7          
nvidia-driver-cuda-libs.x86_64   3:418.40.04-4.el7          
nvidia-driver-devel.x86_64       3:418.40.04-4.el7          
nvidia-driver-libs.x86_64        3:418.40.04-4.el7          
nvidia-libXNVCtrl.x86_64         3:418.40.04-1.el7          
nvidia-libXNVCtrl-devel.x86_64   3:418.40.04-1.el7          
nvidia-modprobe.x86_64           3:418.40.04-1.el7          
nvidia-persistenced.x86_64       3:418.40.04-1.el7          
nvidia-settings.x86_64           3:418.40.04-1.el7          
nvidia-xconfig.x86_64            3:418.40.04-1.el7          
pcp-pmda-nvidia-gpu.x86_64       3.12.2-5.el7
$ yum list --installroot /opt installed | grep cuda
cuda-10-0.x86_64                 10.0.130-1                 
cuda-command-line-tools-10-0.x86_64
cuda-compiler-10-0.x86_64        10.0.130-1                 
cuda-cublas-10-0.x86_64          10.0.130-1                 
cuda-cublas-dev-10-0.x86_64      10.0.130-1                 
cuda-cudart-10-0.x86_64          10.0.130-1                 
cuda-cudart-dev-10-0.x86_64      10.0.130-1                 
cuda-cufft-10-0.x86_64           10.0.130-1                 
cuda-cufft-dev-10-0.x86_64       10.0.130-1                 
cuda-cuobjdump-10-0.x86_64       10.0.130-1                 
cuda-cupti-10-0.x86_64           10.0.130-1                 
cuda-curand-10-0.x86_64          10.0.130-1                 
cuda-curand-dev-10-0.x86_64      10.0.130-1                 
cuda-cusolver-10-0.x86_64        10.0.130-1                 
cuda-cusolver-dev-10-0.x86_64    10.0.130-1                 
cuda-cusparse-10-0.x86_64        10.0.130-1                 
cuda-cusparse-dev-10-0.x86_64    10.0.130-1                 
cuda-demo-suite-10-0.x86_64      10.0.130-1                 
cuda-documentation-10-0.x86_64   10.0.130-1                 
cuda-driver-dev-10-0.x86_64      10.0.130-1                 
cuda-drivers.x86_64              418.40.04-1                
cuda-gdb-10-0.x86_64             10.0.130-1                 
cuda-gpu-library-advisor-10-0.x86_64
cuda-libraries-10-0.x86_64       10.0.130-1                 
cuda-libraries-dev-10-0.x86_64   10.0.130-1                 
cuda-license-10-0.x86_64         10.0.130-1                 
cuda-memcheck-10-0.x86_64        10.0.130-1                 
cuda-misc-headers-10-0.x86_64    10.0.130-1                 
cuda-npp-10-0.x86_64             10.0.130-1                 
cuda-npp-dev-10-0.x86_64         10.0.130-1                 
cuda-nsight-10-0.x86_64          10.0.130-1                 
cuda-nsight-compute-10-0.x86_64  10.0.130-1                 
cuda-nvcc-10-0.x86_64            10.0.130-1                 
cuda-nvdisasm-10-0.x86_64        10.0.130-1                 
cuda-nvgraph-10-0.x86_64         10.0.130-1                 
cuda-nvgraph-dev-10-0.x86_64     10.0.130-1                 
cuda-nvjpeg-10-0.x86_64          10.0.130-1                 
cuda-nvjpeg-dev-10-0.x86_64      10.0.130-1                 
cuda-nvml-dev-10-0.x86_64        10.0.130-1                 
cuda-nvprof-10-0.x86_64          10.0.130-1                 
cuda-nvprune-10-0.x86_64         10.0.130-1                 
cuda-nvrtc-10-0.x86_64           10.0.130-1                 
cuda-nvrtc-dev-10-0.x86_64       10.0.130-1                 
cuda-nvtx-10-0.x86_64            10.0.130-1                 
cuda-nvvp-10-0.x86_64            10.0.130-1                 
cuda-runtime-10-0.x86_64         10.0.130-1                 
cuda-samples-10-0.x86_64         10.0.130-1                 
cuda-toolkit-10-0.x86_64         10.0.130-1                 
cuda-tools-10-0.x86_64           10.0.130-1                 
cuda-visual-tools-10-0.x86_64    10.0.130-1                 
libcudnn7.x86_64                 7.5.0.56-1.cuda10.0         
libcudnn7-devel.x86_64           7.5.0.56-1.cuda10.0         
nvidia-driver-cuda.x86_64        3:418.40.04-4.el7          
nvidia-driver-cuda-libs.x86_64   3:418.40.04-4.el7
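
To rule out a plain version mismatch, the driver version from the yum listing above can be compared with the minimum driver that CUDA 10.0 needs. This is a rough sketch: the 410.48 minimum is taken from NVIDIA's CUDA 10.0 release notes as I understand them and should be treated as an assumption, and the function names are hypothetical.

```python
import re

# Assumption: minimum Linux driver for CUDA 10.0 (10.0.130) per NVIDIA's release notes.
MIN_DRIVER_FOR_CUDA_10_0 = (410, 48)


def parse_rpm_version(field):
    """Turn an RPM version field such as '3:418.40.04-4.el7' into (418, 40, 4)."""
    version = field.split(":", 1)[-1]   # drop the epoch ("3:"), if present
    version = version.split("-", 1)[0]  # drop the release ("-4.el7"), if present
    return tuple(int(part) for part in re.findall(r"\d+", version))


def driver_supports(driver_field, minimum=MIN_DRIVER_FOR_CUDA_10_0):
    """True when the parsed driver version is at least the minimum."""
    return parse_rpm_version(driver_field) >= minimum


if __name__ == "__main__":
    print(parse_rpm_version("3:418.40.04-4.el7"))
    print(driver_supports("3:418.40.04-4.el7"))
```

Under that assumption, the installed 418.40.04 packages are new enough for CUDA 10.0, so the version numbers alone do not seem to explain the failure.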

I installed (and reinstalled) CUDA using:

$ sudo yum install --installroot /opt cuda-10-0

I then added /opt/usr/local/cuda-10.0 to PATH and LD_LIBRARY_PATH and rebooted after the installation.

(I removed the previous versions with yum remove --installroot /opt)
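
Because everything was installed with --installroot /opt, it may be worth checking where the driver's kernel module files actually ended up. The sketch below is a guess at a useful diagnostic, not a confirmed cause: it assumes the modules are named nvidia*.ko (possibly compressed) and that an alternate install root would place them under /opt/lib/modules instead of /lib/modules. The helper name is made up for illustration.

```python
import os
from pathlib import Path


def find_nvidia_modules(modules_root):
    """Return paths of nvidia*.ko* files found anywhere under modules_root."""
    root = Path(modules_root)
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.rglob("nvidia*.ko*"))


if __name__ == "__main__":
    kernel = os.uname().release
    # "/" is where the running kernel looks; "/opt" is the --installroot used above.
    for prefix in ("/", "/opt"):
        tree = Path(prefix) / "lib" / "modules" / kernel
        print(tree, "->", find_nvidia_modules(tree) or "no nvidia modules found")
```

If the module files only show up under the /opt tree, or not at all, the running kernel would have nothing to load, which would match the symptoms above.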

Unfortunately, a fresh reinstallation of the system is not an option for me.
Any help would be appreciated. Thanks in advance!
