Hi All,
I recently had an RTX 4000 Ada installed in a Rocky Linux 9.5 server.
The system sees the GPU and the driver installation seems fine:
[root@jupyter2 ~]# nvidia-smi
Tue Oct 21 12:19:09 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 4000 SFF Ada ...    On  |   00000000:41:00.0 Off |                  Off |
| 30%   36C    P8              6W /   70W |       1MiB /  20475MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
But I’m having issues getting the GPU recognized when running simple tests.
Running the CUDA sample binaries fails:
./deviceQuery
./deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 3
-> initialization error
Result = FAIL
PyTorch also fails to recognize the GPU. This is torch installed with pip in a conda environment:
python -c "import torch; print(torch.version.cuda); print(torch.cuda.is_available())"
13.0
/home/henryz/miniconda3/envs/torch/lib/python3.11/site-packages/torch/cuda/__init__.py:182: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:119.)
return torch._C._cuda_getDeviceCount() > 0
False
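Since both the runtime sample and PyTorch fail at initialization, one way to narrow this down might be to skip both and call cuInit from the driver library directly. Below is a minimal sketch, assuming libcuda.so.1 is on the loader path; the helper name probe_cuda_driver is made up for illustration and is not part of any NVIDIA or PyTorch tooling.

```python
import ctypes


def probe_cuda_driver(libname="libcuda.so.1"):
    """Call cuInit(0) through the driver API and return (code, message).

    Returns (None, reason) if the driver library cannot be loaded.
    The default library name is an assumption about this system.
    """
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError as exc:
        return None, f"could not load {libname}: {exc}"

    # CUresult cuInit(unsigned int Flags); 0 means CUDA_SUCCESS.
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    code = libcuda.cuInit(0)

    # CUresult cuGetErrorString(CUresult error, const char **pStr)
    libcuda.cuGetErrorString.argtypes = [
        ctypes.c_int,
        ctypes.POINTER(ctypes.c_char_p),
    ]
    libcuda.cuGetErrorString.restype = ctypes.c_int
    msg = ctypes.c_char_p()
    if libcuda.cuGetErrorString(code, ctypes.byref(msg)) == 0 and msg.value:
        text = msg.value.decode()
    else:
        text = "unknown error"
    return code, text


if __name__ == "__main__":
    print(probe_cuda_driver())
```

If this also returns a non-zero code, that would suggest the problem sits in the driver stack itself rather than in the CUDA samples or the PyTorch wheel, though that is an inference and not something the output above confirms.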
Any ideas on how to proceed?