PyTorch is not detecting the GPU

Hi All,

I am trying to use PyTorch with a GPU in a VM running Ubuntu 18.04.

The GPU is displayed by nvidia-smi:

Thu Aug 4 23:12:19 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID RTX8000P-8Q    Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   N/A    P8    N/A /  N/A |    550MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I have also installed the following packages:

conda list | grep -i cuda
cudatoolkit      11.3.1    h2bc3f7f_2
cudnn            8.2.1     cuda11.3_0
pytorch          1.12.0    py3.7_cuda11.3_cudnn8.3.2_0    pytorch
pytorch-mutex    1.0       cuda                           pytorch

I get the following error when running my script:

UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352464346/work/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    torch.cuda.get_device_name(0)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 329, in get_device_name
    return get_device_properties(device).name
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 359, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
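
For anyone trying to narrow this down, a small diagnostic along the following lines may help separate an environment problem from a PyTorch build problem. It is only a sketch, not the test.py from the traceback (which is not shown here): the function name collect_cuda_diagnostics is hypothetical, and the script only reports what PyTorch sees rather than fixing anything.

```python
import os


def collect_cuda_diagnostics():
    """Gather basic facts about the CUDA setup as PyTorch sees it."""
    info = {
        # The error text mentions this variable, so record its current value.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_version": None,
        "torch_built_with_cuda": None,
        "cuda_available": None,
        "device_count": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this environment; return what we have.
        return info

    info["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    info["torch_built_with_cuda"] = torch.version.cuda
    # is_available() returns False (with a warning) when CUDA fails to initialize.
    info["cuda_available"] = torch.cuda.is_available()
    info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in collect_cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If torch_built_with_cuda shows a CUDA version but cuda_available is False, the PyTorch build itself is probably fine and the problem is more likely in the driver or VM setup.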

I have also run the nvidia-bug-report.sh script, but I could not interpret much from its output.
I have attached the output of the script here.

nvidia-bug-report.log (277.8 KB)

Could anyone please help me debug this?