2080 Ti "fallen off the bus" on Ubuntu 18.04

We have two 2080 Ti cards on the same motherboard (without NVLink) for deep learning applications. The issue happens when both GPUs are training: one GPU becomes unresponsive about 10 minutes into training. This consistently happens to one of the GPUs, even after we swap the two GPUs into each other’s PCI-e slots.

A screenshot of the error and the log are here: shared - Google Drive

Xid 79, “fallen off the bus”, is mostly caused by either an insufficient power supply or overheating. Please monitor the GPU temperatures, check for sufficient airflow, and check the PSU.
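
If it helps, here is a minimal Python sketch for logging temperature and power draw on both cards while training runs, so you can see what the failing GPU was doing just before it dropped. It shells out to nvidia-smi with its --query-gpu option. The function names, the 2-second interval, and the 85 C warning threshold are my own assumptions for illustration, not NVIDIA recommendations, so adjust them to your setup.

```python
import subprocess
import time

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY_FIELDS = "index,temperature.gpu,power.draw,power.limit"


def parse_gpu_line(line):
    """Parse one CSV row of nvidia-smi output into a dict.

    Values that are not numeric (for example "[N/A]") become None.
    """
    def to_float(text):
        try:
            return float(text)
        except ValueError:
            return None

    idx, temp, draw, limit = [field.strip() for field in line.split(",")]
    return {
        "index": int(idx),
        "temp_c": to_float(temp),
        "power_w": to_float(draw),
        "limit_w": to_float(limit),
    }


def read_gpus():
    """Run nvidia-smi once and return one dict per GPU it reports."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


def monitor(interval_s=2.0, temp_warn_c=85.0):
    """Print readings forever; temp_warn_c is an assumed threshold."""
    while True:
        stamp = time.strftime("%H:%M:%S")
        try:
            gpus = read_gpus()
        except subprocess.CalledProcessError as err:
            # nvidia-smi exited with an error, which can happen once a GPU
            # has stopped responding.
            print(f"{stamp} nvidia-smi failed: {err.stderr.strip()}")
            gpus = []
        for gpu in gpus:
            hot = gpu["temp_c"] is not None and gpu["temp_c"] >= temp_warn_c
            flag = "  <-- hot" if hot else ""
            print(f"{stamp} GPU{gpu['index']}: {gpu['temp_c']} C, "
                  f"{gpu['power_w']} W of {gpu['limit_w']} W{flag}")
        time.sleep(interval_s)
```

Call monitor() in a second terminal, redirect the output to a file, and start training. If the last readings before the failure show normal temperatures on both cards, that points more toward the power supply than toward cooling.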