We recently got this Supermicro system (5014A-TT | SuperWorkstation | A+ Servers | Super Micro Computer, Inc.) without a GPU installed. I installed two 3080 Tis in it, along with Ubuntu 20.04.3 LTS Server and desktop (kernel 5.4.0-90-generic). I tried all the driver installation methods, and every one of them resulted in a black screen and an unusable system. I periodically get this message:
Message from syslogd@jtc-threadripper01 at Nov 18 09:43:58 ...
kernel:[ 998.281214] watchdog: BUG: soft lockup - CPU#45 stuck for 22s! [irq/240-nvidia:2013]
I disabled Secure Boot in the BIOS, of course.
I am not able to run nvidia-smi or nvidia-bug-report.sh.
Would be happy to get some help.
Please provide a dmesg output from right after boot.
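For anyone following along, a minimal sketch of how such a log could be saved and narrowed down to the relevant lines. The helper name filter_gpu_log and the search pattern are illustrative assumptions, not an NVIDIA tool.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines of a saved kernel log that
# mention soft lockups, the NVIDIA kernel module (NVRM, Xid) or nouveau.
filter_gpu_log() {
    grep -iE 'soft lockup|NVRM|nouveau|Xid' "$1"
}

# Typical use right after boot (needs root to read the kernel ring buffer):
#   sudo dmesg > dmesg.txt
#   filter_gpu_log dmesg.txt
```

The full dmesg.txt is still the file to attach; the filter only helps with a quick first look.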
Here is the output directly after boot, after I ran the Nvidia*.run script.
dmesg.txt (109.6 KB)
First of all, please blacklist nouveau.
Furthermore, what are you trying to use the NVIDIA GPUs for? Running them in Xorg in Mosaic mode? You might have to disable the IOMMU for that, depending on the BIOS.
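A common way to blacklist nouveau on Ubuntu, sketched here as an assumption about the setup: the file name blacklist-nouveau.conf and the helper nouveau_blacklist_conf are illustrative, and any file under /etc/modprobe.d/ with these two lines should have the same effect.

```shell
#!/bin/sh
# Hypothetical helper: print the modprobe configuration that keeps the
# nouveau kernel module from loading.
nouveau_blacklist_conf() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Install it, rebuild the initramfs so the change applies at early boot,
# then reboot:
#   nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
#   sudo reboot
```

After the reboot, lsmod | grep nouveau should print nothing if the blacklist took effect.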
Nouveau is now blacklisted. I get the same errors again when calling
We want to use the NVIDIA GPUs to run computational chemistry programs using CUDA. For our purposes we don’t need high double-precision performance, so we opted for the “cheaper” gaming-style cards.
Odd, there wasn’t any error visible in the dmesg, does
Yes, that worked. Here is the bug report; thanks for your help so far.
nvidia-bug-report.log.gz (71.7 KB)
Since you’re running headless, nvidia-persistenced has to be enabled to start on boot and needs to be running. This shouldn’t affect the gpus in this way, though. Please check anyway, run
sudo systemctl start nvidia-persistenced
then see if nvidia-smi works.
Otherwise, you can only remove the gpus and test them one by one for defective hardware, I guess.
Well, after blacklisting nouveau, installing the drivers, and rebooting, I entered the command and got
Failed to start nvidia-persistenced.service: Unit nvidia-persistenced.service not found.
Looks like a faulty PCIe slot: both GPUs work in PCIe slot 1. Thanks for your time.
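For readers who suspect a slot problem but can still run nvidia-smi, a rough sketch of one possible check: compare each GPU's current PCIe link width with its maximum. The helper flag_narrow_links is an illustrative assumption, and a bad slot will not always show up this way (in this thread the symptom was a soft lockup instead), so swapping cards between slots remains the more reliable test.

```shell
#!/bin/sh
# Hypothetical helper: read "index, bus id, current width, max width" CSV
# lines on stdin and report GPUs whose link runs narrower than its maximum.
flag_narrow_links() {
    awk -F', *' '$3 + 0 < $4 + 0 { print "GPU " $1 " at " $2 ": x" $3 " of x" $4 }'
}

# Feed it from nvidia-smi:
#   nvidia-smi --query-gpu=index,pci.bus_id,pcie.link.width.current,pcie.link.width.max \
#       --format=csv,noheader | flag_narrow_links
# To confirm that both cards are enumerated at all (10de is NVIDIA's PCI vendor ID):
#   lspci -d 10de:
```

No output from flag_narrow_links means every listed GPU is running at its full link width.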