I am experiencing intermittent system hangs on my Ubuntu 24.04 desktop. The machine has a single RTX A5500 installed and no other peripherals or cards. The system was built by Orbital Computers; I didn't throw the parts together myself, so I'm optimistic that it was assembled carefully and tested before shipping.
I cannot find any pattern in the system logs that suggests a root cause for this behavior. The hangs are most frequent when interacting with the GPU, particularly through the NVIDIA Container Toolkit. Last week I carefully upgraded to the 550 drivers, with no change in behavior. When I upgraded the container toolkit, however, the issue seemed to go away. Today the issue is back, which seems to invalidate my suspicion that the container toolkit was the culprit. I have also run a RAM integrity check from the BIOS, and it reported no issues.
The logs from nvidia-bug-report.sh are attached here.
nvidia-bug-report.log.gz (517.1 KB)
Any help in troubleshooting would be most appreciated.