I’ve been having problems with my Asus GX10 since the day I bought it, 3 months ago.
I’ve applied all updates as they were released, lowered the clock to 2138 MHz, confirmed that the BIOS firmware is correct on both PD FW, set the GPU to persistent mode to try to keep it alive, tried a full power cycle, … but without any success.
The GPU keeps dropping out at unpredictable intervals, without any load. Sometimes it crashes one hour after rebooting, other times after 3 or 4 days, but always without load (low power consumption and low temperatures).
I’m running it as a server for vLLM (with gemma 4 26b), Open WebUI and Docling, all in Docker with CUDA support.
This equipment is very unstable and has been a disappointment.
Can anyone help with this issue? I’ve attached log files.
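Since the crashes happen at idle and at irregular intervals, a timestamped probe log can help line up the moment the GPU disappears with the kernel journal. The following is only a minimal sketch: `probe_once` is a hypothetical helper name, the log path and 60-second interval are arbitrary choices, and it assumes `nvidia-smi` and coreutils `timeout` are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command once and print one
# timestamped status line (UTC), "OK" if it exited 0, "FAIL" otherwise.
probe_once() {
    # "$@" is the probe command; its own output is discarded.
    if "$@" >/dev/null 2>&1; then
        status=OK
    else
        status=FAIL
    fi
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status"
}

# Usage (assumption: run on the host, not inside a container):
# while true; do
#     probe_once timeout 30 nvidia-smi -L >> "$HOME/gpu_probe.log"
#     sleep 60
# done
```

Wrapping the probe in `timeout` matters because `nvidia-smi` may hang rather than fail once the GPU is unreachable; the first FAIL line then gives an approximate time to look for in the journal.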
If you have AI setting things up for you, it loves to pin services to LAN IPs when they should be on local loopback, and weird slop like that. The first thing I would check is whether it’s pinning LAN IPs and causing loopbacks on your network. Most modern routers will disable the port for 5 minutes or so after that’s detected. Also check your drivers, etc. That’s where I would start.
Reviewed both logs. The journal confirms a Class 4 failure — DOE mailbox stuck on 000f:01:00.0 with PCIe link collapsed to x0 from the first second of boot. The GPU was non-functional before any workload started. One hour later the NVIDIA driver’s work queue thread (nv_queue) locked up trying to access registers over the dead link — that’s what you’re seeing as the “crash without load.”
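The link state described above can be checked directly on a live system. This is a rough sketch rather than a definitive procedure: `link_width` is a hypothetical helper that pulls the negotiated width out of the `LnkSta:` line of `lspci -vv` output, and the device address is the one named in the journal.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -vv` output on stdin and print the
# negotiated PCIe link width from the LnkSta line (for example "x16").
# Prints nothing if no LnkSta line is present.
link_width() {
    sed -n 's/.*LnkSta:.*Width \(x[0-9][0-9]*\).*/\1/p'
}

# Usage (root is usually needed to read the link capability registers):
# sudo lspci -vv -s 000f:01:00.0 | link_width
```

A result of `x0` right after boot would be consistent with the collapsed link seen in the journal; comparing it with the `LnkCap:` line of the same output shows what width the device is capable of.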
The current logs are too incomplete to continue the diagnosis. The sudo_dmesg.txt file was filtered with grep and captured nothing because the GPU never initialized on this boot.
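One way to avoid ending up with an empty file is to save the complete logs first and filter a copy afterwards. The sketch below assumes a systemd-based host; the output file names and the search pattern in the hypothetical `gpu_lines` helper are illustrative choices, not an official list of relevant messages.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a saved log file that mention
# the NVIDIA driver, Xid events, or PCIe devices (case-insensitive).
gpu_lines() {
    grep -Ei 'nvrm|xid|nvidia|pcie' "$1"
}

# Usage: capture everything unfiltered, then filter a copy.
# sudo dmesg -T > dmesg_full.txt                               # current boot
# sudo journalctl -k -b -1 --no-pager > journal_prev_boot.txt  # previous boot
# gpu_lines dmesg_full.txt > dmesg_gpu.txt
```

Attaching the unfiltered files keeps the boot-time context around the failure, which a grep for GPU keywords alone would discard.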
I had:
docker info | grep -i cgroup
Cgroup Driver: systemd
Cgroup Version: 2
cgroupns
It has now been stable for 24 h with:
docker info | grep -i cgroup
Cgroup Driver: cgroupfs
Cgroup Version: 2
cgroupns
I’ll keep it running continuously and test it under load with a few OpenWebUI users over the week. If everything remains stable, I’ll confirm this as the solution.
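For anyone comparing their own setup, the active driver can be read from the same `docker info` output. The thread does not say how the switch to cgroupfs was made, so the `daemon.json` entry in the comments is an assumption: it is the documented Docker option for selecting a cgroup driver, shown here only as a sketch. `cgroup_driver` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `docker info` text on stdin and print the
# value of the "Cgroup Driver" field (for example "systemd" or "cgroupfs").
cgroup_driver() {
    awk -F': *' '/Cgroup Driver/ { print $2 }'
}

# Usage:
# docker info 2>/dev/null | cgroup_driver
#
# Assumption (not stated in the thread): the driver can be selected with
# this key in /etc/docker/daemon.json, merged into any existing content:
#   { "exec-opts": ["native.cgroupdriver=cgroupfs"] }
# followed by a daemon restart:
#   sudo systemctl restart docker
```

If `daemon.json` already contains entries, for instance a runtime section added by the NVIDIA Container Toolkit, the key should be added alongside them rather than replacing the file.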
Thank you for your feedback.
The LAN IPs are all OK, with no loopbacks; the GX10 has never had connection problems.
The only issue is the GPU, which crashes frequently.
I hope that it is solved with cgroupfs.