Hey,
I am currently trying to set up a workstation running Ubuntu 22.04 LTS with an A100 GPU for machine learning tasks.
The workstation is not intended to run headless, i.e., we want a screen attached, with the Intel onboard GPU driving the display.
After trying various combinations of drivers and guides, I have not managed to get a working system. Specifically, I can't get past the following error:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
This happens even though the driver installation itself completes without errors. I installed driver version 535 from the deb package.
$ sudo dmesg
[ 224.448228] NVRM: None of the NVIDIA devices were initialized.
[ 224.448522] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[ 224.799882] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 224.800833] nvidia 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 224.803676] NVRM: The NVIDIA GPU 0000:02:00.0
NVRM: (PCI ID: 10de:20f1) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
[ 224.803718] nvidia: probe of 0000:02:00.0 failed with error -1
[ 224.803734] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 224.803735] NVRM: None of the NVIDIA devices were initialized.
[ 224.803830] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[ 225.135010] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
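In case it helps to narrow things down, here is a minimal sketch that pulls the failing device's PCI address and vendor:device ID out of the log lines above. The function name `summarize_failures` and the two regular expressions are my own illustration, not part of any NVIDIA tool, and the sample text is copied from the dmesg output in this post.

```python
import re

# Sample lines copied from the dmesg output quoted in this post.
DMESG_SAMPLE = """\
[ 224.800833] nvidia 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 224.803676] NVRM: The NVIDIA GPU 0000:02:00.0
NVRM: (PCI ID: 10de:20f1) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
[ 224.803718] nvidia: probe of 0000:02:00.0 failed with error -1
"""

# Assumed patterns: domain:bus:device.function, and the "PCI ID: vvvv:dddd" form NVRM prints.
PCI_ADDR = re.compile(r"\b[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\b")
PCI_ID = re.compile(r"PCI ID:\s*([0-9a-f]{4}):([0-9a-f]{4})")


def summarize_failures(log_text):
    """Return the unique PCI addresses and vendor:device IDs found in log_text."""
    addresses = sorted(set(PCI_ADDR.findall(log_text)))
    ids = sorted({f"{vendor}:{device}" for vendor, device in PCI_ID.findall(log_text)})
    return {"addresses": addresses, "ids": ids}


if __name__ == "__main__":
    print(summarize_failures(DMESG_SAMPLE))
```

The extracted address can then be passed to `lspci -s 0000:02:00.0 -vv` (or the ID to `lspci -d 10de:20f1`) to check whether the card is still visible on the PCI bus at all.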
Other information about the system:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
$ uname -r
6.2.0-34-generic
Any help is very much appreciated.
Thanks,
Michael