I have a GPU server running Arch Linux:
Linux 5.14.16-arch1-1 #1 SMP PREEMPT Tue, 02 Nov 2021 22:22:59 +0000 x86_64 GNU/Linux
with two NVIDIA GeForce RTX 3080 GPUs installed:
$ lspci -k | grep -A 2 -E "(VGA|3D)"
lspci: Unable to load libkmod resources: error -2
00:02.0 VGA compatible controller: Intel Corporation CometLake-S GT2 [UHD Graphics 630] (rev 05)
DeviceName: Onboard - Video
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7c79
--
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] RTX 3080 10GB GAMING X TRIO
Kernel driver in use: nvidia
--
03:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] RTX 3080 10GB GAMING X TRIO
Kernel driver in use: nvidia
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
However, it seems that they are not functioning at all:
$ nvidia-smi
Sun Dec 12 22:29:29 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44 Driver Version: 495.44 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 30% 46C P0 96W / 340W | 0MiB / 10018MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:03:00.0 Off | N/A |
| 30% 37C P0 N/A / 340W | 0MiB / 10018MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
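To rule out a driver problem, I can also query the cards from Python through NVML; a rough sketch (assuming the nvidia-ml-py package, which provides the pynvml module, is installed):

import pynvml

# List the GPUs the NVIDIA driver exposes via NVML
# (assumes the nvidia-ml-py package is installed).
pynvml.nvmlInit()
count = pynvml.nvmlDeviceGetCount()
print(f"Driver sees {count} GPU(s)")
for i in range(count):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    print(f"  GPU {i}: {pynvml.nvmlDeviceGetName(handle)}")
pynvml.nvmlShutdown()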
Consequently, when I try to use them from TensorFlow 2 with the following code,
import tensorflow as tf

# Check GPU availability
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    print(f"GPU: {gpu_devices}")
    details = tf.config.experimental.get_device_details(gpu_devices[0])
    print(f"GPU details: {details.get('device_name', 'Unknown GPU')}")
else:
    print("No GPU found")
It outputs: “No GPU found”.
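In case it helps narrow things down, this is how I would compare the CUDA/cuDNN versions this TensorFlow binary was built against with the CUDA 11.5 driver and toolkit shown above (a minimal sketch, assuming TensorFlow 2.3+ where tf.sysconfig.get_build_info() is available):

import tensorflow as tf

# Report whether this TensorFlow build has CUDA support and which
# CUDA/cuDNN versions it was compiled against.
print("Built with CUDA:", tf.test.is_built_with_cuda())
build = tf.sysconfig.get_build_info()
print("CUDA version:", build.get("cuda_version"))
print("cuDNN version:", build.get("cudnn_version"))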
What’s going wrong?