Unable to install nvidia-smi on A100 GPU on Ubuntu 22.04

We are not able to install the NVIDIA display drivers for an A100 GPU on Ubuntu 22.04. We followed the steps described in “https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-22-04”. However, the A100 GPUs that are visible in nvidia-smi are not visible to PyTorch.

Hello and welcome to the NVIDIA developer forums @pkmkicha.

Can you share a few more details, please?

For one thing, how did you determine that PyTorch does not work?

You need to follow the PyTorch instructions on which package to install and which CUDA version you need to make things work. It is not enough to only install the NVIDIA driver.
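To make the question above concrete, here is a minimal sketch of how one might check what PyTorch actually sees. It only uses standard PyTorch calls (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()`); the helper name `pytorch_cuda_report` is made up for this post and is not part of any library.

```python
import importlib.util


def pytorch_cuda_report():
    """Describe what PyTorch can see; does not raise if torch is missing.

    Hypothetical helper for illustration only.
    """
    report = {
        "torch_installed": False,
        "torch_version": None,
        "built_with_cuda": None,   # CUDA version the wheel was built against
        "cuda_available": False,
        "device_names": [],
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_names"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in pytorch_cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` comes back as `None`, the installed PyTorch package is probably a CPU-only build and will not see the GPU regardless of the driver. If it shows a CUDA version but `cuda_available` is `False`, the driver side is the more likely culprit.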

Also, I would recommend looking for the official Ubuntu instructions regarding driver installation, or studying the very detailed README by NVIDIA themselves.

Thanks!

Thanks for the reply. Below are the details of issues that we are facing.

  1. What GPU models / quantity are installed, and what is the HPE server model?
  • NVIDIA GA100 GPU
  • Dell Server - PowerEdge R740xd
  2. Are you running Ubuntu on bare metal, or are you using virtualization / a hypervisor?
  • VMware with Ubuntu 22.04 OS
  3. What is the use case – video analytics, AI training, inferencing, etc.?
  • AI model training (medical images – 3D data, DICOM volumes) using PyTorch and the MONAI framework
  • AI model inference
  4. Technical issues – could you please be a bit more specific?
  • The system is able to detect the A100 GPUs, but nvidia-smi does not work and the GPUs are not visible to PyTorch.

It says: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

Thank you for the details!

The topic of GPU virtualization is not always completely clear. You might need to use our vGPU offerings to support compute GPUs like the A100, depending on your setup.

But just recently there was another developer who managed to get things up and running with Hyper-V and A100.

In general, the message you quote

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

indicates that the GPU was NOT correctly recognized or that the driver was not correctly installed.
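To narrow down which of the two it is, something along these lines could help. This is only a rough sketch, assuming a Linux guest with `lspci` and `nvidia-smi` on the PATH; the function names (`run`, `module_listed`, `diagnose`) are made up for this post. It checks three things in order: whether an NVIDIA device shows up on the PCI bus inside the VM, whether the `nvidia` kernel module is listed in `/proc/modules`, and whether `nvidia-smi` exits successfully.

```python
import shutil
import subprocess
from pathlib import Path


def run(cmd):
    """Run cmd; return (exit_code, output), or (None, reason) if it could not run."""
    if shutil.which(cmd[0]) is None:
        return None, f"{cmd[0]} not found on PATH"
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return None, str(exc)
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def module_listed(modules_text, name="nvidia"):
    """True if `name` is the first field of a line in /proc/modules-style text."""
    return any(
        line.split()[0] == name
        for line in modules_text.splitlines()
        if line.strip()
    )


def diagnose():
    # 1. Is an NVIDIA device visible on the PCI bus (e.g. passed through to the VM)?
    code, out = run(["lspci"])
    pci = [l for l in out.splitlines() if "nvidia" in l.lower()] if code == 0 else []

    # 2. Is the nvidia kernel module loaded?
    modules = Path("/proc/modules")
    loaded = module_listed(modules.read_text()) if modules.exists() else False

    # 3. Can nvidia-smi talk to the driver? Exit code 0 is treated as success here.
    smi_code, smi_out = run(["nvidia-smi"])

    return {
        "pci_devices": pci,
        "kernel_module_loaded": loaded,
        "nvidia_smi_ok": smi_code == 0,
        "nvidia_smi_output": smi_out,
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

An empty `pci_devices` list would point at the GPU not being exposed to the guest at all, which is a virtualization setup question. A device that is listed while `kernel_module_loaded` is `False` would point at the driver installation instead.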