Hi, I'm running Ubuntu 16.04.4 LTS on a machine with an Intel(R) Xeon(R) CPU, 256 GB of memory, and 4 TITAN X GPUs.
However, whenever I run a TensorFlow program on this machine, a GPU always gets lost. The kernel log shows:
[27897.533246] IPv6: ADDRCONF(NETDEV_CHANGE): vethGYAM8U: link becomes ready
[27897.533339] lxcbr0: port 7(vethGYAM8U) entered forwarding state
[27897.533366] lxcbr0: port 7(vethGYAM8U) entered forwarding state
[27912.560567] lxcbr0: port 7(vethGYAM8U) entered forwarding state
[65787.534692] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 2.915 msecs
[69700.018347] INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.470 msecs
[72059.315140] NVRM: GPU at PCI:0000:82:00: GPU-39272298-b5c7-7577-fd95-cc831ef4bbcb
[72059.315150] NVRM: GPU Board Serial Number: 0324917145140
[72059.315156] NVRM: Xid (PCI:0000:82:00): 79, GPU has fallen off the bus.
[72059.315212] NVRM: GPU at 0000:82:00.0 has fallen off the bus.
[72059.315215] NVRM: GPU is on Board 0324917145140.
[72059.315226] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[98482.675077] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 2.547 msecs
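
For anyone who wants to pull the Xid events out of a long kernel log like the one above, here is a minimal Python sketch. It assumes the log lines follow the same "[timestamp] NVRM: Xid (PCI:...): code, message" layout seen in this post; the names XID_PATTERN, find_xid_events and SAMPLE are my own and not part of any NVIDIA tool, and other driver versions may format these lines differently.

```python
import re

# Assumed line layout, taken from the log in this post:
# [72059.315156] NVRM: Xid (PCI:0000:82:00): 79, GPU has fallen off the bus.
XID_PATTERN = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): "
    r"(?P<xid>\d+), (?P<msg>.*)$"
)


def find_xid_events(log_text):
    """Return one dict per NVRM Xid line found in the given log text."""
    events = []
    for line in log_text.splitlines():
        match = XID_PATTERN.match(line.strip())
        if match:
            events.append({
                "timestamp": float(match["ts"]),  # seconds since boot
                "pci": match["pci"],              # PCI address of the GPU
                "xid": int(match["xid"]),         # Xid error code
                "message": match["msg"],
            })
    return events


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[69700.018347] INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.470 msecs
[72059.315140] NVRM: GPU at PCI:0000:82:00: GPU-39272298-b5c7-7577-fd95-cc831ef4bbcb
[72059.315156] NVRM: Xid (PCI:0000:82:00): 79, GPU has fallen off the bus.
[72059.315212] NVRM: GPU at 0000:82:00.0 has fallen off the bus.
"""

if __name__ == "__main__":
    for event in find_xid_events(SAMPLE):
        print(event)
```

Only lines that contain an explicit "Xid (...)" entry are matched, so the other NVRM lines for the same incident are skipped. On a real machine the input text could come from saved kernel log output instead of SAMPLE.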
nvidia-bug-report.log.gz (209 KB)
nvidia-bug-report.log (596 KB)