[SOLVED] 325.15 driver causes X crash and memory leak

another bug report for my 8400m g gpu.

bug is reproducible with every driver in the 3xx series.

I. environments

manjaro linux, 3.9 kernel - fails
arch linux - 3 series kernel - fails
debian wheezy - 3 series kernel - fails
windows 7 (sic!) - fails

driver versions:

tested every driver of 3xx series
2xx - not tested
173.xx - runs well

II. installation method:

  1. building driver from binary against 3.9 kernel
  2. installing driver from arch repo

III. reproduction method

earlier attempts to run the driver always failed right after logging in to a de (kde, gnome) from the login manager.
i’ve figured out that running a de is possible when compositing is disabled, however running any app that (probably) uses opengl in any form results in a crash.

a clean manjaro install with xfce and 325.15 let me log in to xfce and run glxgears, which ran for a second or two and then crashed X. the crash repeats every time with the same method. after a few minutes the laptop leds indicate a kernel panic, and after another while the computer shuts down.

IV. debug

this time i’ve tried to debug the problem via ssh, running nvidia-bug-report.sh right after the crash.
the bug report was generated properly.

also worth mentioning is top output after crash:

  736 root      19  -1   63448  41800  14056 R 100,0  2,0   7:33.44 X
  731 root      20   0   10336   3464   2792 S   0,3  0,2   0:00.12 sshd                                                                   
 1582 root      20   0    5272   1304    968 R   0,3  0,1   0:00.04 top                                                                    
    1 root      20   0    5124   2772   1876 S   0,0  0,1   0:01.67 systemd                                                                
    2 root      20   0       0      0      0 S   0,0  0,0   0:00.00 kthreadd                                                               
    3 root      20   0       0      0      0 S   0,0  0,0   0:00.04 ksoftirqd/0                                                            
    5 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/0:0H                                                           
    7 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/u:0H                                                           
    8 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 migration/0                                                            
    9 root      20   0       0      0      0 S   0,0  0,0   0:00.06 rcu_preempt                                                            
   10 root      20   0       0      0      0 S   0,0  0,0   0:00.00 rcu_bh                                                                 
   11 root      20   0       0      0      0 S   0,0  0,0   0:00.00 rcu_sched

100% cpu usage for X process
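
for anyone who wants to check saved top output for a spinning process automatically, here is a minimal sketch. it assumes the default top column order (PID first, %CPU ninth, COMMAND twelfth) and the comma decimal separator seen above; the function name `busy_processes` and the 90% threshold are just illustrative choices.

```python
def busy_processes(top_output, threshold=90.0):
    """Return (pid, command, cpu) for every process at or above threshold %CPU.

    Assumes the default top layout:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    busy = []
    for line in top_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and anything that is not a process row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        # Some locales print "100,0" instead of "100.0".
        cpu = float(fields[8].replace(",", "."))
        if cpu >= threshold:
            busy.append((int(fields[0]), fields[11], cpu))
    return busy
```

fed the listing above, this should report only the X process.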

as i’ve spent 20+ hours testing plenty of configurations and trying to debug your driver, i kindly ask the developers to spend 30 seconds just to respond that they received this bug report and are aware of the problem.
nvidia-bug-report.log.gz (37.4 KB)

Hi robotobibok,

I’m not seeing any crash messages in your bug report log file. Can you please describe in more detail exactly what is crashing? It sounds like it’s not the X server since you reported that it’s using 100% of a CPU core. Does your system have a serial port, so that you could set up a serial console to capture the kernel messages when the crash occurs?

first of all, thank you for the response.

i’m sorry, i have no serial port on this machine. i’ve managed to capture one kernel error after the crash via ssh. it could not be included in the submitted bug report, because catching it required proper timing in executing dmesg :) basically i measured the time before the kernel panic leds mentioned earlier go off, and ran dmesg just a second before the full system crash (kernel panic). until then the system seems to be running but is accessible only over ssh, and the X process uses 96-100% of cpu. the affected laptop gives no response to any input device (keyboard, mouse) and the screen hangs: on the de login screen with compositing enabled, or after running glxgears with compositing disabled.
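
instead of timing a single dmesg call, the kernel log could also be streamed to a second machine so that the last lines before the panic are already saved when the laptop dies. this is only a sketch under assumptions: it relies on `dmesg --follow` (util-linux) being available on the laptop and on the ssh user being allowed to read the kernel log; `user@laptop`, `kernel.log` and the function name `capture_lines` are placeholders.

```python
import os
import subprocess


def capture_lines(command, out_path):
    """Run command and append each line of its stdout to out_path.

    Every line is flushed and fsync'ed right away, so whatever arrived
    before the remote side died is already on disk.
    Returns the number of lines written.
    """
    count = 0
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc, \
            open(out_path, "a") as out:
        for line in proc.stdout:
            out.write(line)
            out.flush()
            os.fsync(out.fileno())
            count += 1
    return count


# Hypothetical usage, run on the second machine (not on the laptop):
# capture_lines(["ssh", "user@laptop", "dmesg", "--follow"], "kernel.log")
```

the call returns once the ssh connection drops, which here would be the moment of the full system crash.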

i hope i’ve been descriptive enough this time.

and here is kernel error:

[  277.640300] NVRM: GPU at 0000:01:00: GPU-a896310c-9f4d-6caf-ecce-a13696d2fc11
[  277.640308] NVRM: Xid (0000:01:00): 26, Ch 00000001 M 00000f10 D 00000000 intr 04400000
[  280.363875] NVRM: GPU at 0000:01:00.0 has fallen off the bus.

/usr/bin/nvidia-smi -pm 1

does not help either. also, i can’t tell why, but for some reason the

[ 280.363875] NVRM: GPU at 0000:01:00.0 has fallen off the bus.

error is not saved in /var/log/kernel.log, but i guess that is a distro-specific issue.
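
since these messages are easy to miss, here is a small sketch for pulling the NVRM events out of captured dmesg text. the regular expressions are written only against the exact line formats quoted in this thread and may not match other driver versions; `nvrm_events` is an illustrative name, not part of any nvidia tool.

```python
import re

# Patterns modelled on the lines quoted in this thread.
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \((?P<pci>[^)]+)\): (?P<code>\d+),"
)
OFF_BUS_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: GPU at (?P<pci>\S+) has fallen off the bus"
)


def nvrm_events(log_text):
    """Return (timestamp, kind, pci_address, xid_code) tuples in log order.

    kind is "xid" or "off_bus"; xid_code is None for "off_bus" events.
    """
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (float(match["ts"]), "xid", match["pci"], int(match["code"]))
            )
            continue
        match = OFF_BUS_RE.search(line)
        if match:
            events.append((float(match["ts"]), "off_bus", match["pci"], None))
    return events
```

run on the three kernel lines posted earlier, it should yield one Xid 26 event followed by one "fallen off the bus" event about 2.7 seconds later.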

well, this issue has gone unresolved for almost three years, BUT after a week of testing, and hardly believing it, i CONFIRM that the 331.38 driver fixes this issue!

i’d like to sincerely thank all the developers for their work, and mr. plattner in particular for investigating this bug.