I have already searched this forum and other sites for a solution to this problem, but found none. There are a lot of posts about it, but none with a helpful answer. I know that there are various possible causes for this error, but I hope that an expert can give me a hint. I'm out of ideas now.
Problem: After some time (approx. 30 minutes to 2 hours) of mining, one GPU gets lost. Part of the bug report is included below:
Apr 21 15:59:48 user kernel: NVRM: GPU at PCI:0000:01:00: GPU-1d2e0e8c-69d8-2596-20cd-454daa1bf595
Apr 21 15:59:48 user kernel: NVRM: GPU Board Serial Number:
Apr 21 15:59:48 user kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
Apr 21 15:59:48 user kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Apr 21 15:59:48 user kernel: NVRM: GPU is on Board .
Apr 21 15:59:48 user kernel: NVRM: A GPU crash dump has been created. If possible, please run NVRM: nvidia-bug-report.sh as root to collect this data before NVRM: the NVIDIA kernel module is unloaded.
Bug report: https://pastebin.com/4TL7BkBL
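For anyone who wants to check their own logs for the same event, here is a minimal Python sketch that pulls the Xid lines out of a kernel log. The function name `find_xid_events` and the regular expression are my own assumptions, based only on the line format shown in the excerpt above; other driver versions may format these lines differently.

```python
import re

# Assumed line format, taken from the excerpt above:
#   "... kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus."
XID_PATTERN = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<code>\d+), (?P<message>.*)"
)


def find_xid_events(log_text):
    """Return a list of (pci_address, xid_code, message) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            events.append(
                (
                    match.group("pci"),
                    int(match.group("code")),
                    match.group("message").strip(),
                )
            )
    return events


# Two lines copied from the log excerpt; only the second is an Xid event.
sample = (
    "Apr 21 15:59:48 user kernel: NVRM: GPU at PCI:0000:01:00: "
    "GPU-1d2e0e8c-69d8-2596-20cd-454daa1bf595\n"
    "Apr 21 15:59:48 user kernel: NVRM: Xid (PCI:0000:01:00): 79, "
    "GPU has fallen off the bus.\n"
)

for pci, code, message in find_xid_events(sample):
    print(pci, code, message)
```

Running this over a longer log would show whether it is always the same PCI address that drops out, which might help narrow the problem down to one card or riser.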
System:
- 6x Palit GeForce GTX 1070 Ti Dual 8GB (currently 5 of 6 cards connected), connected via risers
- PSU: 1x be quiet! Dark Power Pro 1000W, 1x Seasonic Prime Platinum 1200W
- Mainboard: MSI Z270-A PRO
- CPU: Intel Core i5-7500, 3.4 GHz
- RAM: 16GB
- SSD: 64GB
- OS: Ubuntu 17.10 (GNU/Linux 4.13.0-38-generic x86_64)
- Driver: 390.48
- CUDA: 9.1.85
- ethminer 0.15.0dev4 with CUDA options via SSH, headless server
- Overclocking: persistence mode on, power limit at 100W, fan speed at 70%, nothing else
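For reference, the persistence mode and power limit settings could be applied with something like the sketch below. This is an assumption about the tooling, not a record of the exact commands used: the helper name `build_gpu_commands` is hypothetical, and the sketch only builds the `nvidia-smi` command lines (using its `-i`, `-pm` and `-pl` options) without running them. The fan speed is not covered here, since `nvidia-smi` does not set it.

```python
def build_gpu_commands(gpu_index, power_limit_watts):
    """Build (but do not run) the nvidia-smi command lines for one GPU."""
    return [
        # Enable persistence mode for this GPU.
        ["nvidia-smi", "-i", str(gpu_index), "-pm", "1"],
        # Set the power limit in watts for this GPU.
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(power_limit_watts)],
    ]


# Five cards are currently connected; 100 W is the limit listed above.
for index in range(5):
    for command in build_gpu_commands(index, 100):
        print(" ".join(command))
```

Each printed line could then be run as root, or passed to `subprocess.run` instead of being printed.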
I already tried:
- Reseating the GPU
- Setting the fan speed so that the temperature stays at a maximum of 56°C
- Removing the driver and installing it fresh
- Disabling audio and setting the bus speed to 96 in the BIOS options
Can anyone help, please? I have been trying to find a solution for weeks, without success.