Debian sid Nvidia-304.135-137 6200 LE Xserver restarts and NVRM crashes

Hello,

I have a severe problem getting the Nvidia binary drivers to work normally.

I am using vanilla kernel 3.14.1 and Debian sid with xserver-xorg 1:7.7+19 and lxdm. I have been waiting patiently since Nvidia driver 304.135, hoping that the problem would be solved, but even with the new driver release I am still experiencing the same issue.
The driver builds and installs normally. After loading the nvidia module:

NVRM: loading NVIDIA UNIX x86 Kernel Module 304.137 Thu Sep 14 12:49:20 PDT 2017
[11418.567676] [drm] Module unloaded
[11462.925877] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[11462.926307] [drm] Initialized nvidia-drm 0.0.0 20150116 for 0000:03:00.0 on minor 0
[11462.926317] NVRM: loading NVIDIA UNIX x86 Kernel Module 304.137 Thu Sep 14 12:49:20 PDT 2017

I start to receive frequent messages like these:

[11520.239080] NVRM: VM: nv_kern_close:1757: 0xe5884a40, 1 page(s), count = 1, flags = 0x00000011, key = 0x2b6b8000, page_table = 0xeb456104
[11520.239090] NVRM: VM: nv_kern_close:1757: 0xe5884a00, 16 page(s), count = 1, flags = 0x00000011, key = 0x25a22000, page_table = 0xe594e884
[11520.239105] NVRM: VM: nv_kern_close:1757: 0xe5884000, 28 page(s), count = 1, flags = 0x00000011, key = 0x2aee8000, page_table = 0xe5a8ae84
[11520.239124] NVRM: VM: nv_kern_close:1757: 0xe586f980, 32 page(s), count = 1, flags = 0x00000011, key = 0x2ae0d000, page_table = 0xeacdc704

which leads to start and shutdown loops of the display manager (LXDM):

acpid: client connected from 7110[0:0]
Sep 26 15:45:25 debian-geri acpid: 1 client rule loaded
Sep 26 15:45:25 debian-geri kernel: [11550.243347] NVRM: VM: nv_kern_close:1757: 0xebb7a200, 1 page(s), count = 1, flags = 0x00000011, key = 0x2b03c000, page_table = 0xeaec5c24
Sep 26 15:45:25 debian-geri kernel: [11550.243357] NVRM: VM: nv_kern_close:1757: 0xebb7a2c0, 16 page(s), count = 1, flags = 0x00000011, key = 0x2ae92000, page_table = 0xeb736d04
Sep 26 15:45:25 debian-geri kernel: [11550.243368] NVRM: VM: nv_kern_close:1757: 0xeb143400, 28 page(s), count = 1, flags = 0x00000011, key = 0x25ac0000, page_table = 0xeb7f4984
Sep 26 15:45:25 debian-geri kernel: [11550.243382] NVRM: VM: nv_kern_close:1757: 0xe5884700, 32 page(s), count = 1, flags = 0x00000011, key = 0x2adf4000, page_table = 0xe584cf04
Sep 26 15:45:25 debian-geri kernel: [11550.243401] NVRM: VM: nv_kern_close:1757: 0xe58729c0, 16 page(s), count = 1, flags = 0x00020011, key = 0x25a65000, page_table = 0xebb42804
Sep 26 15:45:25 debian-geri kernel: [11550.244117] NVRM: VM: nv_kern_close:1757: 0xe5872880, 2 page(s), count = 1, flags = 0x00000011, key = 0x2ad6f000, page_table = 0xe78d5dc4
Sep 26 15:45:27 debian-geri acpid: client 7110[0:0] has disconnected
Sep 26 15:45:30 debian-geri systemd[1]: lxdm.service: Main process exited, code=exited, status=1/FAILURE
Sep 26 15:45:30 debian-geri systemd[1]: lxdm.service: Unit entered failed state.
Sep 26 15:45:30 debian-geri systemd[1]: lxdm.service: Failed with result ‘exit-code’.
Sep 26 15:45:30 debian-geri udev-acl.ck[7118]: g_slice_set_config: assertion ‘sys_page_size == 0’ failed
Sep 26 15:45:30 debian-geri systemd[1]: lxdm.service: Service hold-off time over, scheduling restart.
Sep 26 15:45:30 debian-geri systemd[1]: Stopped LXDE Display Manager.
Sep 26 15:45:30 debian-geri systemd[1]: Starting LXDE Display Manager…
Sep 26 15:45:30 debian-geri systemd[1]: Started LXDE Display Manager.
Sep 26 15:45:30 debian-geri [7124]: g_slice_set_config: assertion ‘sys_page_size == 0’ failed
Sep 26 15:45:31 debian-geri acpid: client connected from 7125[0:0]
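To get a rough idea of how often this happens, the log excerpts above can be summarized with a short script. This is only a minimal sketch: the function name `summarize` and the two regular expressions are my own assumptions based on the lines quoted in this post, not anything provided by the driver or by systemd.

```python
import re
from collections import Counter

# Matches the NVRM lines reported at nv_kern_close, in either dmesg or
# syslog form; only the page count is captured (assumed line layout).
NVRM_RE = re.compile(r"NVRM: VM: nv_kern_close:\d+: 0x[0-9a-f]+, (\d+) page\(s\)")
# Matches the systemd line logged each time the lxdm main process exits.
LXDM_FAIL_RE = re.compile(r"lxdm\.service: Main process exited")


def summarize(lines):
    """Count NVRM nv_kern_close messages, reported pages, and lxdm exits."""
    stats = Counter()
    for line in lines:
        match = NVRM_RE.search(line)
        if match:
            stats["nvrm_messages"] += 1
            stats["nvrm_pages"] += int(match.group(1))
        elif LXDM_FAIL_RE.search(line):
            stats["lxdm_failures"] += 1
    return stats


# Three lines copied from the excerpts above.
sample = [
    "[11520.239080] NVRM: VM: nv_kern_close:1757: 0xe5884a40, 1 page(s), count = 1, flags = 0x00000011, key = 0x2b6b8000, page_table = 0xeb456104",
    "Sep 26 15:45:25 debian-geri kernel: [11550.243357] NVRM: VM: nv_kern_close:1757: 0xebb7a2c0, 16 page(s), count = 1, flags = 0x00000011, key = 0x2ae92000, page_table = 0xeb736d04",
    "Sep 26 15:45:30 debian-geri systemd[1]: lxdm.service: Main process exited, code=exited, status=1/FAILURE",
]
print(dict(summarize(sample)))
```

The same function accepts any iterable of lines, so an open file object for a saved copy of the log can be passed in directly.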

For that reason I am stuck with the nouveau driver in the X server, which is terrible. I am unable to downgrade because the old drivers, which worked flawlessly, don't support the new ABI of xserver-xorg.

I really hope someone can help!

nvidia-bug-report.log.gz (59.8 KB)

Please run nvidia-bug-report.sh and attach output file to your post.

File attached

Looks like an old ASUS board. Do you happen to have the PEG link mode option in the BIOS? If so, set it to slow and see if that helps.

There was such an option, "PEG Link mode", which was set to auto. I have set it to slow, but the result is the same: a start/stop loop of the display manager, as you can see from my post or the bug report.

It is very strange, because it was working well before driver version 304.135 and with a lower xserver-xorg version. I have already tried another nvidia card, but the result is the same. If you have any other advice, I would be glad to try it out.
nvidia-bug-report.log.gz (58.9 KB)

Which was the last known working combination of driver, kernel and maybe gcc?

As far as I remember it was:

kernel 3.7.4 vanilla branch
and
kernel 3.16.0-4-686-pae with Debian patches

The driver version was:

NVIDIA-Linux-x86-304.64

which is unusable because of the Xserver ABI, although it can be patched to build against newer kernels.

As for gcc, it must have been between 4.6 and 4.9.

I have tried these kernels with NVIDIA-Linux-x86-304.135. If you advise it, I can also try with 304.137.

That’s a rather big jump:
kernel 3.16 -> 4.13
gcc 4.x -> 7.2
driver 304.64 -> 304.135
xserver 1.14 -> 1.19
The important thing is to find out when it broke, so I don’t think there’s any sense in trying the 3.16 kernel with the .137 driver.
What other nvidia card did you try?
I’ll think of a non-breaking procedure to get to the point of breaking.

Some notes for later reference about drivers and the xserver version each supports:
driver   xserver
304.117  1.15
304.123  1.16
304.125  1.17
304.131  1.18
304.134  1.19
6 series fixes: 304.119
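
The notes above can also be kept as a small lookup, which may help when picking driver releases to test against a given xserver. This is only a sketch: it assumes each pair means "this release is the first one in the list that supports that xserver version", which is my reading of the notes and not something confirmed here. The names `DRIVER_XSERVER`, `as_tuple` and `earliest_listed_driver` are made up for illustration.

```python
# Driver release -> xserver version, copied from the notes above.
DRIVER_XSERVER = [
    ("304.117", "1.15"),
    ("304.123", "1.16"),
    ("304.125", "1.17"),
    ("304.131", "1.18"),
    ("304.134", "1.19"),
]


def as_tuple(version):
    """Turn a dotted version such as '1.19' into (1, 19) for comparison."""
    return tuple(int(part) for part in version.split("."))


def earliest_listed_driver(xserver):
    """Return the first listed driver whose xserver entry is >= xserver.

    Returns None when no listed release reaches that xserver version.
    Assumption: the list is ordered by driver release, as in the notes.
    """
    wanted = as_tuple(xserver)
    for driver, supported in DRIVER_XSERVER:
        if as_tuple(supported) >= wanted:
            return driver
    return None


print(earliest_listed_driver("1.19"))
print(earliest_listed_driver("1.16"))
```

Releases older than 304.117 are not in the notes, so the lookup says nothing about them.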