Hi,
We recently got a Supermicro 5014A-TT SuperWorkstation without a GPU installed. I installed two 3080 Ti cards in it and installed Ubuntu 20.04.3 LTS, both Server and Desktop (kernel 5.4.0-90-generic). I tried all the driver installation methods, and every one of them resulted in a black screen and an unusable system. I periodically get this message:
Message from syslogd@jtc-threadripper01 at Nov 18 09:43:58 ...
kernel:[ 998.281214] watchdog: BUG: soft lockup - CPU#45 stuck for 22s! [irq/240-nvidia:2013]
I disabled Secure Boot in the BIOS, of course.
I am not able to run nvidia-smi or nvidia-bug-report.sh.
First of all, please blacklist nouveau.
Furthermore, what are you trying to use the NVIDIA GPUs for? Running them in Xorg in Mosaic mode? You might have to disable the IOMMU for that, depending on the BIOS.
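For reference, blacklisting nouveau on Ubuntu is commonly done with a modprobe configuration file followed by an initramfs rebuild. The following is only a minimal sketch: the function name write_nouveau_blacklist and its optional directory argument are hypothetical helpers for illustration, and the file name blacklist-nouveau.conf is a convention, not a requirement.

```shell
# Sketch: write a modprobe blacklist file for the nouveau driver.
# The optional first argument (a hypothetical convenience) selects the target
# directory; it defaults to /etc/modprobe.d, which requires root to write.
write_nouveau_blacklist() {
    conf_dir="${1:-/etc/modprobe.d}"
    # "blacklist" stops automatic loading; modeset=0 disables kernel mode setting
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' \
        > "$conf_dir/blacklist-nouveau.conf"
}

# On the real system, as root, one would then run:
#   write_nouveau_blacklist
#   update-initramfs -u    # rebuild the initramfs so the blacklist applies at boot
#   reboot
```

After the reboot, lsmod | grep nouveau should print nothing if the blacklist took effect.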
Nouveau is now blacklisted. I get the same errors again when calling nvidia-smi.
We want to use the NVIDIA GPUs to run computational chemistry programs with CUDA. For our purposes we don’t need high double-precision performance, so we opted for the “cheaper” gaming-style cards.
Since you’re running headless, nvidia-persistenced has to be running and has to be enabled to start on boot. This shouldn’t affect the GPUs in this way, though. Please check anyway, run
sudo systemctl start nvidia-persistenced
then see if nvidia-smi works.
Otherwise, I guess the only option left is to remove the GPUs and test them one by one for defective hardware.
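The start command above only launches the daemon for the current session. A rough sketch of also enabling it at boot and then checking the result might look like this, assuming the driver package installed a systemd unit named nvidia-persistenced; the wrapper function enable_persistenced is a hypothetical name used only for illustration.

```shell
# Sketch: enable nvidia-persistenced at boot, start it now, then query the GPUs.
# Assumes a systemd unit called nvidia-persistenced exists on this system.
enable_persistenced() {
    # "enable --now" both enables the unit at boot and starts it immediately
    sudo systemctl enable --now nvidia-persistenced || return 1
    # is-active exits non-zero if the daemon is not running
    systemctl is-active nvidia-persistenced || return 1
    # If the driver is healthy, this lists both cards
    nvidia-smi
}

# Run it with:
#   enable_persistenced
```

If nvidia-smi still hangs or errors with the daemon active, that points away from persistence mode and toward the driver or the hardware.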