Thanks for the comment.
I ran gpu-burn on each of my four RTX 2080 Ti cards, one at a time.
I also attached the nvidia-bug-report.log.gz files to this thread.
(The first and second logs were recorded with two GPUs installed, and the third was recorded after
booting with only the No.4 GPU.)
I hope this information helps in investigating the cause of the GPU problem.
For 2080Ti No.1
(Errors started once the temperature reached 70 C.)
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-e63b2eba-1c12-2cc8-8203-cccb8c25ee44)
10.8% proc’d: 9030 (12497 Gflop/s) errors: 0 temps: 56 C
Summary at: Fri Jan 25 18:10:37 JST 2019
21.7% proc’d: 18662 (12408 Gflop/s) errors: 0 temps: 60 C
Summary at: Fri Jan 25 18:10:50 JST 2019
32.5% proc’d: 27692 (12402 Gflop/s) errors: 0 temps: 62 C
Summary at: Fri Jan 25 18:11:03 JST 2019
43.3% proc’d: 37324 (12322 Gflop/s) errors: 0 temps: 65 C
Summary at: Fri Jan 25 18:11:16 JST 2019
53.3% proc’d: 45752 (12282 Gflop/s) errors: 0 temps: 66 C
Summary at: Fri Jan 25 18:11:28 JST 2019
64.2% proc’d: 55384 (12275 Gflop/s) errors: 0 temps: 68 C
Summary at: Fri Jan 25 18:11:41 JST 2019
75.0% proc’d: 64414 (12258 Gflop/s) errors: 0 temps: 69 C
Summary at: Fri Jan 25 18:11:54 JST 2019
85.8% proc’d: 73444 (12187 Gflop/s) errors: 36 (WARNING!) temps: 70 C
Summary at: Fri Jan 25 18:12:07 JST 2019
96.7% proc’d: 83076 (12180 Gflop/s) errors: 22 (WARNING!) temps: 71 C
Summary at: Fri Jan 25 18:12:20 JST 2019
100.0% proc’d: 86086 (12165 Gflop/s) errors: 41 (WARNING!) temps: 71 C
Killing processes… done
Tested 1 GPUs:
GPU 0: FAULTY
For 2080Ti No.2
(Errors started once the temperature reached 60 C.)
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-650a6f70-3c29-0fa8-6837-e5adcad6a0b9)
10.8% proc’d: 9030 (12450 Gflop/s) errors: 0 temps: 45 C
Summary at: Fri Jan 25 18:42:02 JST 2019
21.7% proc’d: 18662 (12360 Gflop/s) errors: 0 temps: 51 C
Summary at: Fri Jan 25 18:42:15 JST 2019
32.5% proc’d: 27692 (12346 Gflop/s) errors: 0 temps: 54 C
Summary at: Fri Jan 25 18:42:28 JST 2019
43.3% proc’d: 37324 (12287 Gflop/s) errors: 0 temps: 57 C
Summary at: Fri Jan 25 18:42:41 JST 2019
53.3% proc’d: 45752 (13642 Gflop/s) errors: 80488701 (WARNING!) temps: 60 C
Summary at: Fri Jan 25 18:42:53 JST 2019
64.2% proc’d: 56588 (13650 Gflop/s) errors: 117371138 (WARNING!) temps: 60 C
Summary at: Fri Jan 25 18:43:06 JST 2019
75.0% proc’d: 66822 (13648 Gflop/s) errors: 2910964 (WARNING!) temps: 60 C
Summary at: Fri Jan 25 18:43:19 JST 2019
85.8% proc’d: 77056 (13641 Gflop/s) errors: 5792098 (WARNING!) temps: 72 C
Summary at: Fri Jan 25 18:43:32 JST 2019
96.7% proc’d: 87290 (13629 Gflop/s) errors: 37369313 (WARNING!) temps: 72 C
Summary at: Fri Jan 25 18:43:45 JST 2019
100.0% proc’d: 91504 (13629 Gflop/s) errors: 17788159 (WARNING!) temps: 72 C
For 2080Ti No.3
(After the temperature reached 66 C, errors started and the reported temperature suddenly dropped to 57 C. Possibly a sensor error?)
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-a44a70ed-c3c6-a2d1-a727-bfe251278934)
10.8% proc’d: 9030 (12423 Gflop/s) errors: 0 temps: 49 C
Summary at: Fri Jan 25 21:02:25 JST 2019
21.7% proc’d: 18060 (12320 Gflop/s) errors: 0 temps: 54 C
Summary at: Fri Jan 25 21:02:38 JST 2019
32.5% proc’d: 27692 (12272 Gflop/s) errors: 0 temps: 57 C
Summary at: Fri Jan 25 21:02:51 JST 2019
43.3% proc’d: 36722 (12208 Gflop/s) errors: 0 temps: 60 C
Summary at: Fri Jan 25 21:03:04 JST 2019
53.3% proc’d: 45150 (12165 Gflop/s) errors: 0 temps: 62 C
Summary at: Fri Jan 25 21:03:16 JST 2019
64.2% proc’d: 54782 (12121 Gflop/s) errors: 0 temps: 65 C
Summary at: Fri Jan 25 21:03:29 JST 2019
75.0% proc’d: 63210 (12131 Gflop/s) errors: 0 temps: 66 C
Summary at: Fri Jan 25 21:03:42 JST 2019
100.0% proc’d: 70434 (12600 Gflop/s) errors: 56450387 (WARNING!) temps: 57 C
Summary at: Fri Jan 25 21:04:21 JST 2019
For 2080Ti No.4
(CUDA did not recognize this GPU. It seems to be completely broken.)
No devices found.
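In case it helps anyone compare logs like the ones above, here is a rough Python sketch that pulls out the first progress line where gpu-burn reports errors, along with the temperature at that point. It assumes the progress-line layout shown in my logs; the names `Sample`, `parse_samples`, and `first_error_sample` are my own and are not part of gpu-burn.

```python
import re
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    progress: float  # percent of the run completed
    gflops: int      # reported throughput in Gflop/s
    errors: int      # errors reported for this interval
    temp_c: int      # reported GPU temperature in Celsius


# Assumed line layout (copied from the logs in this post):
#   85.8% proc'd: 73444 (12187 Gflop/s) errors: 36 (WARNING!) temps: 70 C
# "proc.d" accepts either a straight or a curly apostrophe.
LINE_RE = re.compile(
    r"(?P<progress>[\d.]+)%\s+proc.d:\s+\d+\s+\((?P<gflops>\d+)\s+Gflop/s\)"
    r"\s+errors:\s+(?P<errors>\d+).*?temps:\s+(?P<temp>\d+)\s*C"
)


def parse_samples(log_text: str) -> List[Sample]:
    """Return one Sample per progress line; other lines are skipped."""
    samples = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            samples.append(
                Sample(
                    float(m["progress"]),
                    int(m["gflops"]),
                    int(m["errors"]),
                    int(m["temp"]),
                )
            )
    return samples


def first_error_sample(samples: List[Sample]) -> Optional[Sample]:
    """Return the first sample with a non-zero error count, or None."""
    for s in samples:
        if s.errors > 0:
            return s
    return None


if __name__ == "__main__":
    # Three lines taken from the No.1 log above.
    example = """\
75.0% proc'd: 64414 (12258 Gflop/s) errors: 0 temps: 69 C
85.8% proc'd: 73444 (12187 Gflop/s) errors: 36 (WARNING!) temps: 70 C
96.7% proc'd: 83076 (12180 Gflop/s) errors: 22 (WARNING!) temps: 71 C
"""
    hit = first_error_sample(parse_samples(example))
    if hit is None:
        print("no errors reported")
    else:
        print(f"first errors at {hit.temp_c} C ({hit.progress}% progress, {hit.errors} errors)")
```

This only reads single-GPU output like mine; a multi-GPU run prints several values per line and would need a different pattern.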
nvidia-bug-report.log.gz (484 KB)
nvidia-bug-report.log.old.gz (1.56 MB)
nvidia-bug-report.log.no4.gz (458 KB)