ERROR: The NVIDIA Driver is present, but CUDA failed to initialize

Description

PyTorch
NVIDIA Release 23.12 (build 76438008)
PyTorch Version 2.2.0a0+81ea7a4

ERROR: The NVIDIA Driver is present, but CUDA failed to initialize. GPU functionality will not be available.
[[ No CUDA-capable device is detected (error 100) ]]

Failed to detect NVIDIA driver version.

Environment

I am using an Azure VM.
Image: Nvidia-GPU optimized VMI with Vgpu driver - v22.08.00 -x64 gen 2
Size: Standard_NC24ads_A100_v4 (24 vCPUs, 220 GiB memory)
GPU Type: A100
Nvidia Driver Version: unknown (nvidia-smi is not working for some reason)

NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.73.08 Wed May 18 20:34:14 UTC 2022
GCC version: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
CUDA Version: Cuda compilation tools, release 10.1, V10.1.243
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
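
For anyone trying to reproduce this, the fields above can be gathered in one pass with a small script. This is only a rough sketch, not an official diagnostic tool: it assumes the kernel module exposes its version at /proc/driver/nvidia/version (which is where the NVRM line comes from) and that nvidia-smi and nvcc are on PATH if they are installed. The helper names read_kernel_module_version, run_tool and collect_report are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def read_kernel_module_version(path="/proc/driver/nvidia/version"):
    """Return the NVIDIA kernel module version text, or None if the file is unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # File is missing or unreadable: the kernel module is probably not loaded.
        return None


def run_tool(cmd, timeout=30):
    """Run a command; return (exit_code, output). exit_code is None if it could not run."""
    if shutil.which(cmd[0]) is None:
        return None, f"{cmd[0]} not found on PATH"
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError) as exc:
        return None, f"{cmd[0]} failed to run: {exc}"
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def collect_report():
    """Collect the driver-related facts that are useful to paste into a bug report."""
    return {
        "kernel_module": read_kernel_module_version(),
        "nvidia-smi": run_tool(["nvidia-smi"]),
        "nvcc": run_tool(["nvcc", "--version"]),
    }


if __name__ == "__main__":
    for name, value in collect_report().items():
        print(f"== {name} ==")
        print(value)
```

A kernel module version that reads fine while nvidia-smi exits with an error would match what is described here: the driver is loaded, but user space cannot talk to the GPU.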

My intention is to run NVIDIA TensorRT-LLM.


Hi @shivam.mehta,
Can you try a reboot and check?

I have already tried that a couple of times. I also recreated the VM just to be sure, but there is still no progress.