CUDA 12.4, driver version: 565.57.01

The system is Rocky Linux 8.
(ai-simulation-root) [root@localhost cuda-12.4]# systemctl status nvidia-fabricmanager
● nvidia-fabricmanager.service - NVIDIA fabric manager service
Loaded: loaded (/usr/lib/systemd/system/nvidia-fabricmanager.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2024-11-18 22:03:31 EST; 2h 19min ago
Main PID: 70976 (nv-fabricmanage)
Tasks: 18 (limit: 3355442)
Memory: 11.7M
CGroup: /system.slice/nvidia-fabricmanager.service
└─70976 /usr/bin/nv-fabricmanager -c /usr/share/nvidia/nvswitch/fabricmanager.cfg

Nov 18 22:03:30 localhost.localdomain systemd[1]: Starting NVIDIA fabric manager service...
Nov 18 22:03:31 localhost.localdomain nv-fabricmanager[70976]: Connected to 1 node.
Nov 18 22:03:31 localhost.localdomain nv-fabricmanager[70976]: Successfully configured all the available NVSwitches to route GPU NVLink traffic. NVLink P>
Nov 18 22:03:31 localhost.localdomain systemd[1]: Started NVIDIA fabric manager service.
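In case it helps anyone hitting the same thing, the service state and the driver version can also be checked from a script. This is only a rough sketch, assuming `systemctl` and `nvidia-smi` are on the PATH; `run` is a throwaway helper name I made up, not part of any NVIDIA tooling.

```python
import subprocess


def run(cmd):
    """Run a command and return its stripped stdout, or None if it cannot be run.

    Hypothetical helper for illustration only.
    """
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # Command missing, not executable, or hung.
        return None
    return out.stdout.strip()


if __name__ == "__main__":
    # systemctl prints "active" when the unit is running.
    print("fabric manager:", run(["systemctl", "is-active", "nvidia-fabricmanager"]))
    # Prints one driver version line per GPU.
    print("driver:", run(["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]))
```

A `None` result just means the command could not be started on that machine; it says nothing about the GPU state itself.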

(ai-simulation-root) [root@localhost cuda-12.4]# nvidia-smi
Tue Nov 19 00:24:38 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H20                     On  |   00000000:18:00.0 Off |                    0 |
| N/A   35C    P0             74W /  500W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H20                     On  |   00000000:38:00.0 Off |                    0 |
| N/A   32C    P0             73W /  500W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H20                     On  |   00000000:49:00.0 Off |                    0 |
| N/A   36C    P0             74W /  500W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H20                     On  |   00000000:59:00.0 Off |                    0 |
| N/A   31C    P0             72W /  500W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H20                     On  |   00000000:9B:00.0 Off |                    0 |
| N/A   32C    P0             75W /  500W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H20                     On  |   00000000:BB:00.0 Off |                    0 |
| N/A   35C    P0             75W /  500W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H20                     On  |   00000000:CA:00.0 Off |                    0 |
| N/A   34C    P0             74W /  500W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H20                     On  |   00000000:DA:00.0 Off |                    0 |
| N/A   36C    P0             76W /  500W |       1MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
(ai-simulation-root) [root@localhost cuda-12.4]# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

(ai-simulation-root) [root@localhost cuda-12.4]# pip list
Package            Version
------------------ ---------
Brotli             1.0.9
certifi            2024.8.30
charset-normalizer 3.3.2
filelock           3.13.1
gmpy2              2.1.2
idna               3.7
Jinja2             3.1.4
MarkupSafe         2.1.3
mkl_fft            1.3.11
mkl_random         1.2.8
mkl-service        2.4.0
mpmath             1.3.0
networkx           3.3
numpy              1.26.4
pillow             11.0.0
pip                24.2
PySocks            1.7.1
PyYAML             6.0.2
requests           2.32.3
setuptools         75.1.0
sympy              1.13.2
torch              2.5.1
torchaudio         2.5.1
torchvision        0.20.1
triton             3.1.0
typing_extensions  4.11.0
urllib3            2.2.3
wheel              0.44.0
The error is:
(ai-simulation-root) [root@localhost cuda-12.4]# python
Python 3.10.15 (main, Oct 3 2024, 07:27:34) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> torch.cuda.is_available()
/root/.conda/envs/ai-simulation-root/lib/python3.10/site-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at /opt/conda/conda-bld/pytorch_1729647382455/work/c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
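To collect what PyTorch itself sees in one go, something like the following could be run inside the same conda environment. It is only a sketch: `cuda_report` is a made-up helper name, and it only records what `torch` reports plus the text of any warning it raises, so it helps narrow things down rather than fix anything. For reference, error 802 corresponds to `cudaErrorSystemNotReady` in the CUDA runtime.

```python
import warnings


def cuda_report():
    """Collect what PyTorch reports about CUDA (hypothetical helper)."""
    report = {
        "torch": None,
        "built_with_cuda": None,
        "is_available": False,
        "device_count": 0,
        "warnings": [],
    }
    try:
        import torch
    except ImportError:
        report["warnings"].append("torch is not installed in this environment")
        return report

    report["torch"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds

    # Record any warning raised during the checks, since the UserWarning
    # carries the underlying CUDA error text.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        report["is_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()
    report["warnings"].extend(str(w.message) for w in caught)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines alongside the `nvidia-smi` output makes it easier to compare the CUDA version the wheel was built with against what the driver supports.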

Hi, I ran into the same issue. Have you solved the problem yet?