980 Ti crash

Hi guys,

I recently set up a machine with an ASUS X99-E WS board and three 980 Ti cards. After installing CUDA, running my program makes the machine reboot itself.
Before the reboot, there are some NVIDIA errors in /var/log/syslog:

May 12 12:23:39 galois kernel: [ 40.145039] NVRM: The NVIDIA probe routine failed for 3 device(s).
May 12 12:23:39 galois kernel: [ 40.145040] NVRM: None of the NVIDIA graphics adapters were initialized!
May 12 13:18:02 galois kernel: [ 8.433346] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 352.93 Tue Apr 5 18:18:24 PDT 2016
May 12 13:43:03 galois kernel: [ 8.253377] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 352.93 Tue Apr 5 18:18:24 PDT 2016
May 12 13:51:33 galois kernel: [ 7.957049] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 352.93 Tue Apr 5 18:18:24 PDT 2016
May 12 14:33:14 galois kernel: [ 2513.525187] NVRM: GPU at 0000:09:00.0 has fallen off the bus.
May 12 14:33:14 galois kernel: [ 2513.525197] NVRM: A GPU crash dump has been created. If possible, please run
May 12 14:33:14 galois kernel: [ 2513.525197] NVRM: nvidia-bug-report.sh as root to collect this data before
May 12 14:33:14 galois kernel: [ 2513.525197] NVRM: the NVIDIA kernel module is unloaded.
May 12 14:33:14 galois kernel: [ 2513.525235] NVRM: GPU at 0000:0b:00.0 has fallen off the bus.
May 12 14:33:15 galois kernel: [ 2513.978056] NVRM: GPU at 0000:05:00.0 has fallen off the bus.
May 12 14:33:15 galois kernel: [ 2513.978063] NVRM: A GPU crash dump has been created. If possible, please run
May 12 14:33:15 galois kernel: [ 2513.978063] NVRM: nvidia-bug-report.sh as root to collect this data before
May 12 14:33:15 galois kernel: [ 2513.978063] NVRM: the NVIDIA kernel module is unloaded.
May 12 14:40:02 galois kernel: [ 2921.405260] NVRM: request_irq() failed (-22)
May 12 14:40:02 galois kernel: [ 2921.405268] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:02 galois kernel: [ 2921.405335] NVRM: request_irq() failed (-22)
May 12 14:40:02 galois kernel: [ 2921.405339] NVRM: nvidia_frontend_open: minor 1, module->open() failed, error -22
May 12 14:40:02 galois kernel: [ 2921.405399] NVRM: request_irq() failed (-22)
May 12 14:40:02 galois kernel: [ 2921.405402] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -22
May 12 14:40:02 galois kernel: [ 2921.405534] NVRM: request_irq() failed (-22)
May 12 14:40:02 galois kernel: [ 2921.405538] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:23 galois kernel: [ 2942.208883] NVRM: request_irq() failed (-22)
May 12 14:40:23 galois kernel: [ 2942.208891] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:23 galois kernel: [ 2942.208963] NVRM: request_irq() failed (-22)
May 12 14:40:23 galois kernel: [ 2942.208967] NVRM: nvidia_frontend_open: minor 1, module->open() failed, error -22
May 12 14:40:23 galois kernel: [ 2942.209030] NVRM: request_irq() failed (-22)
May 12 14:40:23 galois kernel: [ 2942.209033] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -22
May 12 14:40:23 galois kernel: [ 2942.209174] NVRM: request_irq() failed (-22)
May 12 14:40:23 galois kernel: [ 2942.209178] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.053328] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.053334] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.053395] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.053399] NVRM: nvidia_frontend_open: minor 1, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.053448] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.053450] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.053557] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.053560] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.055124] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.055129] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.055180] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.055183] NVRM: nvidia_frontend_open: minor 1, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.055233] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.055235] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.055332] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.055334] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.072093] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.072099] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.072197] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.072200] NVRM: nvidia_frontend_open: minor 1, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.072249] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.072252] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.072386] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.072389] NVRM: nvidia_frontend_open: minor 2, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.072442] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.072445] NVRM: nvidia_frontend_open: minor 1, module->open() failed, error -22
May 12 14:40:49 galois kernel: [ 2969.072519] NVRM: request_irq() failed (-22)
May 12 14:40:49 galois kernel: [ 2969.072525] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -22

https://drive.google.com/file/d/0B6gla9gH0qkpUktJakVoNUoyUWs/view?usp=sharing

I'd appreciate any ideas!

Best~

Maybe you don’t have the necessary auxiliary power plugged into the GPUs.

Or maybe the system BIOS is not handling the 3 GPUs correctly. Make sure you have the latest BIOS flashed onto your motherboard.

Or maybe you haven’t installed CUDA correctly, for example by failing to properly remove the nouveau driver.
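The nouveau suggestion can be checked quickly. Here is a minimal sketch, assuming a Linux system where /proc/modules lists the loaded kernel modules (one per line, module name first). The function names are my own and are not part of any NVIDIA tool.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def nouveau_loaded(proc_modules_text):
    """True if the open-source nouveau driver appears among loaded modules."""
    return "nouveau" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        # Not on Linux, or /proc is not readable: nothing to report.
        text = ""
    if nouveau_loaded(text):
        print("nouveau is loaded; it may conflict with the NVIDIA driver")
    else:
        print("nouveau does not appear to be loaded")
```

If nouveau shows up as loaded, the NVIDIA driver README describes how to disable it before reinstalling the driver.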

I downgraded the kernel to version 3.13 and it works now.
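For anyone comparing logs before and after a change like this, here is a rough sketch that condenses NVRM messages of the kind quoted in the first post. It only recognizes the two message shapes shown there ("has fallen off the bus" and "request_irq() failed"). The function name and the default log path are my own assumptions.

```python
import re
from collections import Counter

# Patterns match the two NVRM message shapes quoted in the original post.
FELL_OFF = re.compile(r"NVRM: GPU at ([0-9a-fA-F:.]+) has fallen off the bus")
IRQ_FAIL = re.compile(r"NVRM: request_irq\(\) failed \((-?\d+)\)")


def summarize_nvrm(lines):
    """Return (PCI addresses that fell off the bus, Counter of IRQ error codes)."""
    dropped = []
    irq_errors = Counter()
    for line in lines:
        match = FELL_OFF.search(line)
        if match:
            dropped.append(match.group(1))
            continue
        match = IRQ_FAIL.search(line)
        if match:
            irq_errors[int(match.group(1))] += 1
    return dropped, irq_errors


if __name__ == "__main__":
    try:
        # Assumed log location, as mentioned in the original post.
        with open("/var/log/syslog", errors="replace") as f:
            dropped, irq_errors = summarize_nvrm(f)
    except OSError:
        dropped, irq_errors = [], Counter()
    print("GPUs that fell off the bus:", dropped)
    print("request_irq() failures by error code:", dict(irq_errors))
```

If the fix held, a syslog collected after the downgrade should give an empty list and an empty count.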