X server crashes - GeForce GTX 660 - Driver 418.56 - archlinux

jf.peyridieu · February 5, 2019, 10:37pm

Hi,

(sorry for my broken english)
I recently (2 months ago) installed Arch Linux on my computer (it was
previously on Debian unstable). Since then, I have had 4 crashes that seem
related to the nvidia driver.
They didn’t happen in the same circumstances, but the symptoms were
the same (strange blocks of pixels on the screen, X hangs, and in 3 of the 4 crashes the whole system
hangs).

Can you please help me solve this problem, or at least find a
workaround?

Regards,
JF
nvidia-bug-report.log.gz (1.02 MB)
nvidia-bug-report.log.old.gz (1.02 MB)

generix · February 6, 2019, 2:34pm

Which DE are you using?
Do you get a stable system if you revert to the 390 or 340 legacy drivers?

jf.peyridieu · February 6, 2019, 2:54pm

I’m using cinnamon.

I haven’t tried reverting to an older version, because the crash is not easy to reproduce. There were 3 weeks between the last 2 crashes.

generix · February 6, 2019, 3:13pm

It’s hard to say anything definitive; the logs show two crashes that are similar but not identical.

jf.peyridieu · February 6, 2019, 5:49pm

Here are some more journalctl logs that do not seem to be in the report.
journactl_20190205.gz (40.8 KB)

jf.peyridieu · February 8, 2019, 5:48pm

One more crash, just now.
Xorg.0.log.old.gz (5.58 KB)
nvidia-bug-report.log.gz (1.02 MB)

jf.peyridieu · February 22, 2019, 3:16pm

Another crash, just now.
Hope this helps…
crash_0222_Xorg.log (33.1 KB)
nvidia-bug-report.log.gz (1.03 MB)

generix · February 23, 2019, 2:59pm

The errors you ran into according to the logs:
XID 8
XID 31+8
XID 13+8
XID 31+56
Really hard to say, always a bit different. Maybe check for faulty video memory using cuda-memtest and gpu-burn.
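
To see which Xid codes show up and how often, the kernel log can be tallied with a few lines of Python. This is only a minimal sketch: it assumes the driver writes its reports as `NVRM: Xid (PCI:...): <code>, ...`, the function name `count_xids` is hypothetical, and the sample lines are made up for illustration (they are not taken from the attached logs). The meaning of each code still has to be looked up in NVIDIA's Xid documentation.

```python
import re
from collections import Counter

# Assumed log format: "NVRM: Xid (PCI:0000:01:00): 31, ..."
XID_RE = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def count_xids(lines):
    """Return a Counter mapping each Xid code to its number of occurrences."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "kernel: NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, ...",
    "kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000010",
    "kernel: usb 1-1: new high-speed USB device",
    "kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000018",
]

for code, n in sorted(count_xids(sample).items()):
    print(f"Xid {code}: {n} occurrence(s)")
```

The same function could be fed the lines of a saved `journalctl -k` dump or of the kernel section of nvidia-bug-report.log, which makes it easier to compare the codes from one crash to the next.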

jf.peyridieu · February 23, 2019, 7:25pm

Unfortunately cuda-memtest doesn’t seem to work:

$ ocl_memtest
hostname is aragorn
CL_PLATFORM_NAME: NVIDIA CUDA
CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 10.1.113
Device 0 is CL_DEVICE_TYPE_GPU, “GeForce GTX 660”
allocated 1725 Mbytes from device 0
[02/23/2019 20:16:11][aragorn][0]:Test0 [Walking 1 bit]
[02/23/2019 20:16:11][aragorn][0]:Test0: global walk test
ERROR: opencl call failed with rc(-5), line 39, file ocl_tests.cpp
Error: Out of resources

Do I need something more to be able to run the test?

$ pacman -Qs nvidia
local/cuda_memtest 1.2.3-3
A GPU memory test utility for NVIDIA and AMD GPUs. OpenCL version.
local/lib32-libvdpau 1.1.1-3
Nvidia VDPAU library
local/lib32-nvidia-utils 418.43-1
NVIDIA drivers utilities (32-bit)
local/libvdpau 1.1.1+3+ga21bf7a-1
Nvidia VDPAU library
local/libxnvctrl 418.43-1
NVIDIA NV-CONTROL X extension
local/nvidia-dkms 418.43-2
NVIDIA driver sources for linux
local/nvidia-settings 418.43-1
Tool for configuring the NVIDIA graphics driver
local/nvidia-utils 418.43-1
NVIDIA drivers utilities
local/opencl-nvidia 418.43-1
OpenCL implemention for NVIDIA

Regards,
JF

jf.peyridieu · February 23, 2019, 7:46pm

Just started some tests with gpu-burn:

$ ./gpu_burn -d 300
GPU 0: GeForce GTX 660 (UUID: GPU-362d83a9-dfec-ae62-fe7d-da8df851203f)
Initialized device 0 with 1994 MB of memory (1703 MB available, using 1533 MB of it), using DOUBLES
11.0% proc’d: 135 (71 Gflop/s) errors: 0 temps: 46 C
Summary at: sam. févr. 23 20:40:58 CET 2019

21.7% proc’d: 225 (71 Gflop/s) errors: 0 temps: 50 C
Summary at: sam. févr. 23 20:41:30 CET 2019

32.7% proc’d: 405 (71 Gflop/s) errors: 0 temps: 53 C
Summary at: sam. févr. 23 20:42:03 CET 2019

43.3% proc’d: 495 (71 Gflop/s) errors: 0 temps: 56 C
Summary at: sam. févr. 23 20:42:35 CET 2019

53.3% proc’d: 630 (71 Gflop/s) errors: 0 temps: 57 C
Summary at: sam. févr. 23 20:43:05 CET 2019

65.0% proc’d: 765 (71 Gflop/s) errors: 0 temps: 58 C
Summary at: sam. févr. 23 20:43:40 CET 2019

76.0% proc’d: 945 (71 Gflop/s) errors: 0 temps: 59 C
Summary at: sam. févr. 23 20:44:13 CET 2019

86.7% proc’d: 1035 (71 Gflop/s) errors: 0 temps: 59 C
Summary at: sam. févr. 23 20:44:45 CET 2019

98.0% proc’d: 1215 (71 Gflop/s) errors: 0 temps: 60 C
Summary at: sam. févr. 23 20:45:19 CET 2019

100.0% proc’d: 1260 (71 Gflop/s) errors: 0 temps: 60 C
Killing processes… done

Tested 1 GPUs:
GPU 0: OK
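
For longer runs, the progress lines can be summarized automatically instead of read by eye. The sketch below is a rough illustration that assumes the single-GPU line layout shown above (`errors: N temps: T C`); the function name `summarize` is hypothetical, and the sample lines are the first and last progress lines of this run.

```python
import re

# Assumed single-GPU progress format: "... errors: 0 temps: 46 C"
PROGRESS_RE = re.compile(r"errors:\s*(\d+)\s+temps:\s*(\d+)\s*C")


def summarize(lines):
    """Return (highest error count, highest temperature in C), or None if nothing matched."""
    errors = []
    temps = []
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            errors.append(int(match.group(1)))
            temps.append(int(match.group(2)))
    if not errors:
        return None
    return max(errors), max(temps)


# Progress lines copied from the run above; the summary line is ignored.
sample = [
    "11.0% proc’d: 135 (71 Gflop/s) errors: 0 temps: 46 C",
    "Summary at: sam. févr. 23 20:40:58 CET 2019",
    "100.0% proc’d: 1260 (71 Gflop/s) errors: 0 temps: 60 C",
]

print(summarize(sample))
```

A non-zero first value, or a temperature that keeps climbing, would be the thing to look for in a longer run.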

generix · February 24, 2019, 2:03pm

Looks like Arch only provides the OCL version of cuda-memtest, and it’s broken:
https://aur.archlinux.org/packages/cuda_memtest/
So you would have to manually install cuda and cuda-memtest and use cuda_memtest instead of ocl_memtest.
The results of gpu-burn look good though, so I don’t know whether cuda_memtest would bring up any new info.

jf.peyridieu · March 6, 2019, 8:11pm

One more crash.
This time, it took a few seconds before the system froze. I saw the CPU usage curve climb to 100%. I’m not sure this crash is related to the driver…
Here are the files collected:
nvidia-bug-report.log.gz (1.09 MB)
crash_0306_Xorg.log (31.7 KB)

generix · March 6, 2019, 10:14pm

The gpu/driver wasn’t involved in this crash.
Taking into account that in the previous crashes the GPU was involved, but always differently, and that the gpu-burn test ran fine, I’d suspect a hardware issue, but not the GPU. Maybe some subtle system memory fault, a failing PSU, or even the hard drive. IDK, very hard to say. It’ll probably get worse until the faulty part breaks completely, so you’ll know by then.

jf.peyridieu · March 7, 2019, 6:44pm

Once again… and this time it was related to the GPU. The screen froze with strange pixels.

crash_0307_Xorg.log (30.7 KB)
nvidia-bug-report.log.gz (1.09 MB)

amrits · March 8, 2019, 10:13am

Hello,

I have gone through the bug report attached in comment #14 and observed that you are getting Xid error code 62.
I would like to reproduce the issue internally and hence need detailed steps to reproduce it.
Moreover, please provide the dmidecode output as well.

jf.peyridieu · March 8, 2019, 5:45pm

Hello,

I have not identified any particular way to reproduce this issue.
Anyway, here is the dmidecode output:
dmidecode_output.txt (22.7 KB)

jf.peyridieu · March 21, 2019, 6:56pm

Hi again,

15 days since my last crash…

Here are the logs and a picture of the screen:
nvidia-bug-report.log.gz (1.1 MB)

jf.peyridieu · April 1, 2019, 8:23pm

Hi,

Once more. Exactly the same as the last crash.

nvidia-bug-report.log.gz (1.1 MB)

jf.peyridieu · April 17, 2019, 6:34pm

Another crash today.
(I’ve updated the title of the post with my current version of the nvidia driver.)
nvidia-bug-report.log.gz (1.1 MB)

jf.peyridieu · May 14, 2019, 9:32pm

Hello,

Almost one month since the last crash, but the problem is still there.
nvidia-bug-report.log.gz (1.1 MB)
