I have tried to run neural network models on the 3090Ti GPUs but both PyTorch and TensorFlow cannot find my Cuda devices even though the devices and corresponding drivers are correctly installed. My system setup is Ubuntu 22.04.1 LTS and my ubuntu kernel version is 5.15.0-56-generic. Below are the outputs from PyTorch and TensorFlow. I have tried the PyTorch version from 1.6 to 1.13 and the python version from 3.7 to 3.10. Thanks a lot for your help!
For Pytorch Environment Test, I type the command python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35
Python version: 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 11.7.99
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090 Ti
GPU 1: NVIDIA GeForce RTX 3090 Ti
Nvidia driver version: 525.60.11
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] torch==1.13.1
[pip3] torchaudio==0.13.1
[pip3] torchvision==0.14.1
[conda] blas 1.0 mkl
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py310h7f8727e_0
[conda] mkl_fft 1.3.1 py310hd6ae3a3_0
[conda] mkl_random 1.2.2 py310h00e6091_0
[conda] numpy 1.23.4 py310hd5efca6_0
[conda] numpy-base 1.23.4 py310h8e6c178_0
[conda] pytorch 1.13.1 py3.10_cuda11.7_cudnn8.5.0_0 pytorch
[conda] pytorch-cuda 11.7 h67b0de4_1 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.13.1 py310_cu117 pytorch
[conda] torchvision 0.14.1 py310_cu117 pytorch
For Tensorflow CUDA check, I type the command python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2022-12-27 12:12:02.205401: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-27 12:12:02.270790: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-12-27 12:12:02.869572: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-12-27 12:12:02.869596: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: sijiawang-MS-7D30
2022-12-27 12:12:02.869599: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: sijiawang-MS-7D30
2022-12-27 12:12:02.869648: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: NOT_FOUND: was unable to find libcuda.so DSO loaded into this program
2022-12-27 12:12:02.869664: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 525.60.11