Hello,
I am trying to use CUDA on my Linux server, and everything fails to run.
OS: RHEL 7
Graphics card: Nvidia Tesla V100
Cuda version: 10.0
GCC version: 4.8.5
Libcudnn: 7.5.0
I have tried CUDA 9.0, 9.1 and 10.0 (currently installed).
My Python code using the tensorflow-gpu library does not see any GPU devices:
from tensorflow.python.client import device_lib
from tensorflow import __version__
print(__version__)
device_lib.list_local_devices()
1.13.1
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1272378539820921425, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 10994030388666212038
physical_device_desc: "device: XLA_CPU device"]
When I try to run the CUDA samples, they all fail.
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
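To separate "the driver library cannot be found" from "the driver library is found but cannot initialize", a minimal Python sketch along these lines could be run independently of TensorFlow and the samples. It assumes the driver library is named libcuda.so.1 and is reachable through the dynamic loader's search path; the helper name try_cuinit is hypothetical. cuInit is the CUDA driver API initialization call, and a return value of 0 means CUDA_SUCCESS.

```python
import ctypes


def try_cuinit(libname="libcuda.so.1"):
    """Load the CUDA driver library and call cuInit(0).

    Returns (result_code, error_message); result_code is None when the
    library itself could not be loaded.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError as exc:
        # The loader could not find or open the driver library at all.
        return None, str(exc)
    # cuInit takes an unsigned int flags argument (must be 0) and returns a CUresult.
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    return lib.cuInit(0), None


if __name__ == "__main__":
    code, err = try_cuinit()
    if code is None:
        print("Could not load libcuda.so.1:", err)
    else:
        print("cuInit returned", code, "(0 means CUDA_SUCCESS)")
```

A load failure would point at the library search path, while a nonzero return code would point at the driver itself.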
Here is some hardware/software information:
$ lspci | grep -i nvidia
3b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe] (rev a1)
d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe] (rev a1)
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
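Since nvidia-smi reports that it cannot talk to the driver, one thing worth checking is whether any NVIDIA kernel module is loaded in the running system. The following is only a sketch: it reads /proc/modules (the same data lsmod shows) and lists module names starting with "nvidia". The function name loaded_nvidia_modules is hypothetical.

```python
from pathlib import Path


def loaded_nvidia_modules(proc_modules_text):
    """Return names of loaded kernel modules whose name starts with 'nvidia'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        found = loaded_nvidia_modules(proc_modules.read_text())
        print("Loaded NVIDIA modules:", found or "none")
    else:
        print("/proc/modules not found (not a Linux system?)")
```

An empty result would suggest the kernel module was never built or loaded for the running kernel, rather than a problem in the CUDA toolkit itself.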
$ yum list --installroot /opt installed | grep nvidia
dkms-nvidia.x86_64 3:418.40.04-1.el7
nvidia-diag-driver-local-repo-rhel7-410.104.x86_64
nvidia-driver.x86_64 3:418.40.04-4.el7
nvidia-driver-NVML.x86_64 3:418.40.04-4.el7
nvidia-driver-NvFBCOpenGL.x86_64 3:418.40.04-4.el7
nvidia-driver-cuda.x86_64 3:418.40.04-4.el7
nvidia-driver-cuda-libs.x86_64 3:418.40.04-4.el7
nvidia-driver-devel.x86_64 3:418.40.04-4.el7
nvidia-driver-libs.x86_64 3:418.40.04-4.el7
nvidia-libXNVCtrl.x86_64 3:418.40.04-1.el7
nvidia-libXNVCtrl-devel.x86_64 3:418.40.04-1.el7
nvidia-modprobe.x86_64 3:418.40.04-1.el7
nvidia-persistenced.x86_64 3:418.40.04-1.el7
nvidia-settings.x86_64 3:418.40.04-1.el7
nvidia-xconfig.x86_64 3:418.40.04-1.el7
pcp-pmda-nvidia-gpu.x86_64 3.12.2-5.el7
$ yum list --installroot /opt installed | grep cuda
cuda-10-0.x86_64 10.0.130-1
cuda-command-line-tools-10-0.x86_64
cuda-compiler-10-0.x86_64 10.0.130-1
cuda-cublas-10-0.x86_64 10.0.130-1
cuda-cublas-dev-10-0.x86_64 10.0.130-1
cuda-cudart-10-0.x86_64 10.0.130-1
cuda-cudart-dev-10-0.x86_64 10.0.130-1
cuda-cufft-10-0.x86_64 10.0.130-1
cuda-cufft-dev-10-0.x86_64 10.0.130-1
cuda-cuobjdump-10-0.x86_64 10.0.130-1
cuda-cupti-10-0.x86_64 10.0.130-1
cuda-curand-10-0.x86_64 10.0.130-1
cuda-curand-dev-10-0.x86_64 10.0.130-1
cuda-cusolver-10-0.x86_64 10.0.130-1
cuda-cusolver-dev-10-0.x86_64 10.0.130-1
cuda-cusparse-10-0.x86_64 10.0.130-1
cuda-cusparse-dev-10-0.x86_64 10.0.130-1
cuda-demo-suite-10-0.x86_64 10.0.130-1
cuda-documentation-10-0.x86_64 10.0.130-1
cuda-driver-dev-10-0.x86_64 10.0.130-1
cuda-drivers.x86_64 418.40.04-1
cuda-gdb-10-0.x86_64 10.0.130-1
cuda-gpu-library-advisor-10-0.x86_64
cuda-libraries-10-0.x86_64 10.0.130-1
cuda-libraries-dev-10-0.x86_64 10.0.130-1
cuda-license-10-0.x86_64 10.0.130-1
cuda-memcheck-10-0.x86_64 10.0.130-1
cuda-misc-headers-10-0.x86_64 10.0.130-1
cuda-npp-10-0.x86_64 10.0.130-1
cuda-npp-dev-10-0.x86_64 10.0.130-1
cuda-nsight-10-0.x86_64 10.0.130-1
cuda-nsight-compute-10-0.x86_64 10.0.130-1
cuda-nvcc-10-0.x86_64 10.0.130-1
cuda-nvdisasm-10-0.x86_64 10.0.130-1
cuda-nvgraph-10-0.x86_64 10.0.130-1
cuda-nvgraph-dev-10-0.x86_64 10.0.130-1
cuda-nvjpeg-10-0.x86_64 10.0.130-1
cuda-nvjpeg-dev-10-0.x86_64 10.0.130-1
cuda-nvml-dev-10-0.x86_64 10.0.130-1
cuda-nvprof-10-0.x86_64 10.0.130-1
cuda-nvprune-10-0.x86_64 10.0.130-1
cuda-nvrtc-10-0.x86_64 10.0.130-1
cuda-nvrtc-dev-10-0.x86_64 10.0.130-1
cuda-nvtx-10-0.x86_64 10.0.130-1
cuda-nvvp-10-0.x86_64 10.0.130-1
cuda-runtime-10-0.x86_64 10.0.130-1
cuda-samples-10-0.x86_64 10.0.130-1
cuda-toolkit-10-0.x86_64 10.0.130-1
cuda-tools-10-0.x86_64 10.0.130-1
cuda-visual-tools-10-0.x86_64 10.0.130-1
libcudnn7.x86_64 7.5.0.56-1.cuda10.0
libcudnn7-devel.x86_64 7.5.0.56-1.cuda10.0
nvidia-driver-cuda.x86_64 3:418.40.04-4.el7
nvidia-driver-cuda-libs.x86_64 3:418.40.04-4.el7
I installed (reinstalled) CUDA using:
$ sudo yum install --installroot /opt cuda-10-0
Then I added /opt/usr/local/cuda-10.0 to PATH and LD_LIBRARY_PATH and rebooted after the installation.
(I removed the previous versions with yum remove --installroot /opt )
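To double-check the environment setup described above, a small Python sketch like the one below could confirm that the expected directories really are present in PATH and LD_LIBRARY_PATH for the current shell. The prefix is the one from this post; the bin and lib64 subdirectories are an assumption based on the usual toolkit layout, and the helper name missing_entries is hypothetical.

```python
import os

# Prefix taken from the post; the bin/ and lib64/ subdirectories are an
# assumption based on the usual toolkit layout.
CUDA_PREFIX = "/opt/usr/local/cuda-10.0"


def missing_entries(var_value, expected_dirs):
    """Return the expected directories absent from a colon-separated variable."""
    # Normalize trailing slashes so "/x/bin/" and "/x/bin" compare equal.
    entries = {e.rstrip("/") for e in var_value.split(":") if e}
    return [d for d in expected_dirs if d.rstrip("/") not in entries]


if __name__ == "__main__":
    checks = {
        "PATH": [CUDA_PREFIX + "/bin"],
        "LD_LIBRARY_PATH": [CUDA_PREFIX + "/lib64"],
    }
    for var, expected in checks.items():
        missing = missing_entries(os.environ.get(var, ""), expected)
        print(var, "is missing:", missing if missing else "nothing")
```

This only verifies the variables as seen by the process that runs the script, so it should be run from the same shell or service environment as the failing program.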
Unfortunately, a fresh reinstallation of the system is not an option for me.
Any help would be appreciated. Thanks in advance!