384.69 libglx: Xorg SIGSEGV in strtol; more generally, the thread context (FS base) is changed during GLX init.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff56c8ab5 in __GI_____strtol_l_internal (nptr=nptr@entry=0x7fffffffde31 “001 GLX”, endptr=endptr@entry=0x7fffffffde28, base=base@entry=10,
group=group@entry=0, loc=0x4c3a6f6564695658) at …/stdlib/strtol_l.c:293
293 while (ISSPACE (*s))
Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc26.x86_64 bzip2-libs-1.0.6-22.fc26.x86_64 dbus-libs-1.11.16-1.fc26.x86_64 freetype-2.7.1-9.fc26.x86_64 libXau-1.0.8-7.fc26.x86_64 libXdmcp-1.1.2-6.fc26.x86_64 libXfont2-2.0.1-3.fc26.x86_64 libcap-2.25-5.fc26.x86_64 libcap-ng-0.7.8-3.fc26.x86_64 libdrm-2.4.82-1.fc26.x86_64 libfontenc-1.1.3-4.fc26.x86_64 libgcc-7.1.1-3.fc26.x86_64 libgcrypt-1.7.8-1.fc26.x86_64 libgpg-error-1.25-2.fc26.x86_64 libpciaccess-0.13.4-4.fc26.x86_64 libpng-1.6.28-2.fc26.x86_64 libselinux-2.6-7.fc26.x86_64 libunwind-1.2-1.fc26.x86_64 libxshmfence-1.2-4.fc26.x86_64 lz4-libs-1.8.0-1.fc26.x86_64 openssl-libs-1.1.0f-7.fc26.x86_64 pcre-8.41-1.fc26.x86_64 pixman-0.34.0-3.fc26.x86_64 systemd-libs-233-6.fc26.x86_64 xz-libs-5.2.3-2.fc26.x86_64 zlib-1.2.11-2.fc26.x86_64
(gdb) bt
#0 0x00007ffff56c8ab5 in __GI_____strtol_l_internal (nptr=nptr@entry=0x7fffffffde31 “001 GLX”, endptr=endptr@entry=0x7fffffffde28, base=base@entry=10,
group=group@entry=0, loc=0x4c3a6f6564695658) at …/stdlib/strtol_l.c:293
#1 0x00007ffff56c8a22 in __strtol (nptr=nptr@entry=0x7fffffffde31 “001 GLX”, endptr=endptr@entry=0x7fffffffde28, base=base@entry=10)
at …/stdlib/strtol.c:106
#2 0x0000000000459f76 in RegisterExtensionNames (extEntry=extEntry@entry=0x8f1bb0) at registry.c:184
#3 0x000000000045a111 in RegisterExtensionNames (extEntry=extEntry@entry=0x8f1bb0) at registry.c:209
#4 0x00000000004491f7 in AddExtension (name=, NumEvents=, NumErrors=, MainProc=0x7ffff2f2a160,
SwappedMainProc=0x7ffff2f2cbe0, CloseDownProc=0x7ffff2f2bd20, MinorOpcodeProc=0x449380 ) at extension.c:142
#5 0x00007ffff2f29f62 in ?? () from /usr/lib64/xorg/modules/extensions/nvidia/libglx.so
#6 0x0000000000449380 in ?? () at extension.c:215
#7 0x000000000000001c in ?? ()
#8 0x0000000000000200 in ?? ()
#9 0x0000000000000000 in ?? ()

The problem is loc=0x4c3a6f6564695658, which is an invalid address.
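A side observation, offered as a sketch rather than a diagnosis: read as little-endian bytes, 0x4c3a6f6564695658 consists entirely of printable ASCII, which suggests the locale pointer was loaded from memory that holds text rather than a pointer. The helper name pointer_as_ascii below is hypothetical and does not come from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the 8 bytes of `value` in little-endian (x86-64 memory) order into
 * `out`, replacing non-printable bytes with '.', and NUL-terminate it.
 * `out` must have room for 9 bytes. Hypothetical helper for illustration. */
void pointer_as_ascii(uint64_t value, char out[9])
{
    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(value >> (8 * i));
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

Calling pointer_as_ascii(0x4c3a6f6564695658ULL, buf) fills buf with the string XVideo:L.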
nvidia-bug-report.log.gz (62 KB)

(gdb) run
Starting program: /usr/libexec/Xorg :8 -config /usr/local/etc/bumblebee/xorg.conf.nvidia -configdir /usr/local/etc/bumblebee/xorg.conf.d -sharevts -nolisten tcp -noreset -verbose 3 -isolateDevice PCI:01:00:0 -modulepath /usr/lib64/xorg/modules/extensions/nvidia,/usr/lib64/xorg/modules,/usr/lib64/modules/extensions,/usr/lib64/xorg/modules/input

(gdb) info thread
Id Target Id Frame
* 1 Thread 0x8f17e0 (LWP 27302) “Xorg” (Exiting) 0x00007ffff56c8ab5 in __GI_____strtol_l_internal (nptr=nptr@entry=0x7fffffffde31 “001 GLX”,
endptr=endptr@entry=0x7fffffffde28, base=base@entry=10, group=group@entry=0, loc=0x4c3a6f6564695658) at …/stdlib/strtol_l.c:293
(gdb) list
288 }
289
290 save = s = nptr;
291
292 /* Skip white space. */
293 while (ISSPACE (*s))
294 ++s;
295 if (__glibc_unlikely (*s == L('\0')))
296 goto noconv;
297
(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007ffff7dd6c50  0x00007ffff7df5590  Yes         /lib64/ld-linux-x86-64.so.2
0x00007ffff7b92aa0  0x00007ffff7bbe321  Yes (*)     /lib64/libdbus-1.so.3
0x00007ffff7fb2dc0  0x00007ffff7fc5103  Yes (*)     /lib64/libudev.so.1
0x00007ffff7963460  0x00007ffff797a3ff  Yes (*)     /lib64/libselinux.so.1
0x00007ffff753d000  0x00007ffff76b3b88  Yes (*)     /lib64/libcrypto.so.1.1
0x00007ffff72cede0  0x00007ffff72cfb0e  Yes         /lib64/libdl.so.2
0x00007ffff70b5400  0x00007ffff70bc231  Yes (*)     /lib64/libunwind.so.8
0x00007ffff6ead0d0  0x00007ffff6eb1375  Yes (*)     /lib64/libpciaccess.so.0
0x00007ffff6c9db10  0x00007ffff6ca6360  Yes (*)     /lib64/libdrm.so.2
0x00007ffff69ff220  0x00007ffff6a8183d  Yes (*)     /lib64/libpixman-1.so.0
0x00007ffff67caae0  0x00007ffff67e862f  Yes (*)     /lib64/libXfont2.so.2
0x00007ffff65c2d40  0x00007ffff65c3a38  Yes (*)     /lib64/libXau.so.6
0x00007ffff7f2d970  0x00007ffff7f86a5f  Yes (*)     /lib64/libsystemd.so.0
0x00007ffff63c08a0  0x00007ffff63c0b6c  Yes (*)     /lib64/libxshmfence.so.1
0x00007ffff61bb2e0  0x00007ffff61bcb28  Yes (*)     /lib64/libXdmcp.so.6
0x00007ffff5f94da0  0x00007ffff5f9b0a0  Yes (*)     /lib64/libaudit.so.1
0x00007ffff5c81f00  0x00007ffff5cfec5f  Yes         /lib64/libm.so.6
0x00007ffff5a62860  0x00007ffff5a70aa1  Yes         /lib64/libpthread.so.0
0x00007ffff56abba0  0x00007ffff5807fa3  Yes         /lib64/libc.so.6
0x00007ffff5476530  0x00007ffff548502f  Yes         /lib64/libresolv.so.2
0x00007ffff526f480  0x00007ffff5270d87  Yes (*)     /lib64/libcap.so.2
0x00007ffff5067fa0  0x00007ffff506b446  Yes         /lib64/librt.so.1
0x00007ffff4e51ac0  0x00007ffff4e61de5  Yes (*)     /lib64/libgcc_s.so.1
0x00007ffff4bdd540  0x00007ffff4c2f27d  Yes (*)     /lib64/libpcre.so.1
0x00007ffff49c7260  0x00007ffff49d409f  Yes (*)     /lib64/libz.so.1
0x00007ffff47bf480  0x00007ffff47c1404  Yes (*)     /lib64/libfontenc.so.1
0x00007ffff4519420  0x00007ffff458ea64  Yes (*)     /lib64/libfreetype.so.6
0x00007ffff42eadf0  0x00007ffff43017c2  Yes (*)     /lib64/liblzma.so.5
0x00007ffff40d4e60  0x00007ffff40e3f40  Yes (*)     /lib64/liblz4.so.1
0x00007ffff3dcf500  0x00007ffff3e8f758  Yes (*)     /lib64/libgcrypt.so.20
0x00007ffff3bb3820  0x00007ffff3bbcc78  Yes (*)     /lib64/libgpg-error.so.0
0x00007ffff39ad370  0x00007ffff39af1a9  Yes (*)     /lib64/libcap-ng.so.0
0x00007ffff379d570  0x00007ffff37a95e2  Yes (*)     /lib64/libbz2.so.1
0x00007ffff356dff0  0x00007ffff358f6d8  Yes (*)     /lib64/libpng16.so.16
0x00007ffff2c432f0  0x00007ffff30eead8  Yes (*)     /usr/lib64/xorg/modules/extensions/nvidia/libglx.so
0x00007ffff23fe810  0x00007ffff24005a3  Yes (*)     /lib64/tls/libnvidia-tls.so.384.69
0x00007ffff08575e0  0x00007ffff19de257  Yes (*)     /lib64/tls/libnvidia-glcore.so.384.69
0x00007fffefaaa390  0x00007ffff00b5fc7  Yes (*)     /usr/lib64/xorg/modules/drivers/nvidia_drv.so
0x00007fffef840040  0x00007fffef8586a8  Yes         /usr/lib64/xorg/modules/libfb.so
0x00007fffef6130b0  0x00007fffef634d21  Yes         /usr/lib64/xorg/modules/libwfb.so

(gdb) x/16xb 0x4c3a6f6564695658
0x4c3a6f6564695658: Cannot access memory at address 0x4c3a6f6564695658

glibc-2.25-9.fc26.x86_64
xorg-x11-server-Xorg-1.19.3-4.fc26.x86_64

InitExtensions (argc=argc@entry=16, argv=argv@entry=0x7fffffffe158) at …/…/…/mi/miinitext.c:335

339 (ext->initFunc) ();

(gdb) p *ext
$2 = {initFunc = 0x7ffff2f29d70, name = 0x7ffff312d406 “GLX”, disablePtr = 0x0}

(gdb) disass $rip-32,+64
Dump of assembler code from 0x7ffff2f29ee0 to 0x7ffff2f29f20:
0x00007ffff2f29ee0: test %ebx,%ebx
0x00007ffff2f29ee2: je 0x7ffff2f2a02c
0x00007ffff2f29ee8: movb $0x0,0x10(%rsp)
0x00007ffff2f29eed: mov 0x10(%rsp),%esi
0x00007ffff2f29ef1: xor %ecx,%ecx
0x00007ffff2f29ef3: xor %edx,%edx
0x00007ffff2f29ef5: xor %edi,%edi
0x00007ffff2f29ef7: movq $0x0,0x18(%rsp)
=> 0x00007ffff2f29f00: callq 0x7ffff2c60430

After this call, __libc_tsd_LOCALE is corrupted.

(gdb) bt
#0 0x00007ffff2f29d70 in ?? () from /usr/lib64/xorg/modules/extensions/nvidia/libglx.so
#1 0x00000000004a8a0d in InitExtensions (argc=argc@entry=16, argv=argv@entry=0x7fffffffe158) at …/…/…/mi/miinitext.c:339
#2 0x000000000043965f in dix_main (argc=16, argv=0x7fffffffe158, envp=) at main.c:201
#3 0x00007ffff56ac50a in __libc_start_main (main=0x423540 , argc=16, argv=0x7fffffffe158, init=, fini=,
rtld_fini=, stack_end=0x7fffffffe148) at …/csu/libc-start.c:295
#4 0x000000000042357a in _start ()

libglx.so looks like the culprit.

(gdb) disass __strtol,+64
Dump of assembler code from 0x7ffff56c8a10 to 0x7ffff56c8a50:
0x00007ffff56c8a10 <__strtol+0>: mov 0x38e399(%rip),%rax # 0x7ffff5a56db0
0x00007ffff56c8a17 <__strtol+7>: xor %ecx,%ecx
=> 0x00007ffff56c8a19 <__strtol+9>: mov %fs:(%rax),%r8
0x00007ffff56c8a1d <__strtol+13>: jmpq 0x7ffff56c8a60 <__GI_____strtol_l_internal>
0x00007ffff56c8a22: nopw %cs:0x0(%rax,%rax,1)
0x00007ffff56c8a2c: nopl 0x0(%rax)
0x00007ffff56c8a30 <__GI___strtoul_internal+0>: mov 0x38e379(%rip),%rax # 0x7ffff5a56db0
0x00007ffff56c8a37 <__GI___strtoul_internal+7>: mov %fs:(%rax),%r8
0x00007ffff56c8a3b <__GI___strtoul_internal+11>: jmpq 0x7ffff56c9220 <__GI_____strtoul_l_internal>
0x00007ffff56c8a40 <__strtoul+0>: mov 0x38e369(%rip),%rax # 0x7ffff5a56db0
0x00007ffff56c8a47 <__strtoul+7>: xor %ecx,%ecx
0x00007ffff56c8a49 <__strtoul+9>: mov %fs:(%rax),%r8
0x00007ffff56c8a4d <__strtoul+13>: jmpq 0x7ffff56c9220 <__GI_____strtoul_l_internal>
End of assembler dump.
(gdb) p/x $rax
$5 = 0xfffffffffffffc70

mov %fs:(%rax),%r8 loads __libc_tsd_LOCALE into r8.
After some call in libglx it starts to return the wrong value.
right value 0x00007ffff5a583e0
wrong value 0x4c3a6f6564695658

(gdb) p &__libc_tsd_LOCALE
$2 = (__locale_t *) 0x7ffff7f14f30
and
(gdb) x/1xg 0x7ffff7f14f30
0x7ffff7f14f30: 0x00007ffff5a583e0
So the value stored there is not corrupted.
I don’t know how gdb actually calculates this address.
To do so, it has to know the base of the segment selector.

I don’t know how to check in gdb the descriptor base used for fs=0, but it looks as if
the base used for fs has changed.
By the way, cs is 0x33, ss is 0x2b, and ds, es, fs and gs are all zero.
If the base had changed through the descriptor table, it would have changed for gs too.
ds and es ignore the base in 64-bit mode.

I’m a bit stuck and cannot even tell whether the problem
is in user space or in kernel space.
Either something in user space
writes to %fs:(0xfffffffffffffc70),
or the kernel makes fs use a different base.

The fs register is used for TLS (thread-local storage).
__libc_tsd_LOCALE is a TLS symbol of libc.so.6 that lives in the .tbss section.
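To illustrate why $rax holds a value such as 0xfffffffffffffc70 (that is, -0x390) before the %fs-relative load, here is a minimal sketch for x86-64 Linux, assuming GCC or Clang inline assembly. It is not code from glibc or Xorg; tls_probe, thread_pointer and tls_probe_offset are hypothetical names. It relies on the x86-64 TLS convention that the word at %fs:0 contains the thread pointer, which equals the FS base, and that the executable's static TLS block sits below it.

```c
#include <stdint.h>

/* A thread-local variable placed in the executable's static TLS block. */
static __thread long tls_probe;

/* Return the thread pointer. On x86-64 Linux the word at %fs:0 holds the
 * thread pointer itself, which is equal to the FS base. */
uintptr_t thread_pointer(void)
{
    uintptr_t tp;
    __asm__ volatile ("mov %%fs:0, %0" : "=r"(tp));
    return tp;
}

/* Signed distance from the FS base to tls_probe. For a variable defined in
 * the main executable this is negative, like the -0x390 seen in $rax. */
intptr_t tls_probe_offset(void)
{
    return (intptr_t)((uintptr_t)&tls_probe - thread_pointer());
}
```

If the FS base changes, every such access lands at the same negative offset from the new base, so the TLS variable appears to contain garbage although the original memory is untouched.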

By the way, the hardware watchpoint
watch * (unsigned long *) 0x7ffff7f14f30
is not triggered.
I set it in order to catch the code that writes to this address.

It seems to be the fs base.
Good case: run under gdb, break on the first AddExtension, and run
generate-core-file

eu-readelf -n

Bad case: run without the debugger, get a core dump, and run eu-readelf once again.

good case

fs.base: 0x00007ffff7f142c0 gs.base: 0x0000000000000000
bad case

fs.base: 0x0000000001007860 gs.base: 0x0000000000000000

Now the question is: what corrupts fs.base?
Once again, it happens during AddExtension for GLX,
and during that call libglx.so may call nvidia_drv.so, which in turn calls nvidia.ko.

At this point I believe it is a kernel-space problem, because I assume the segment base cannot be changed from user space.
I have no idea yet how to find out which code corrupts it and how.
Possibly with systemtap.

As was pointed out to me by someone who was not too lazy to read the gdb source,
gdb knows about this, although hardly anyone seems to be aware of it:
gdb is able to show the segment base.
p/x $fs_base
and so on.
This is simpler than using a core dump.
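The same value can also be read from inside the process. The following is a minimal sketch for x86-64 Linux, assuming the arch_prctl system call with ARCH_GET_FS from asm/prctl.h; the function name read_fs_base is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/prctl.h>   /* ARCH_GET_FS (0x1003), ARCH_SET_FS (0x1002) */

/* Ask the kernel for the calling thread's FS base.
 * Returns 0 on success and stores the base in *base; returns -1 on failure. */
int read_fs_base(unsigned long *base)
{
    return (int)syscall(SYS_arch_prctl, ARCH_GET_FS, base);
}
```

Logging this value at several points (for example before and after calling into a module) gives the same information as p/x $fs_base, without a debugger attached.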

nvidia-bug-report.log.gz (62 KB)

Well that’s weird. Can you please run nvidia-bug-report.sh and attach nvidia-bug-report.log.gz?

Added in #8.

I used the arch_prctl syscall and modified Xorg to restore the thread context.

(II) Initializing extension GLX
(WW) nlz debug AddCallback fix ok=1 thread context lost right=0x7ff2263cca40 wrong=0x1c14400
*** stack smashing detected ***: /usr/libexec/Xorg.nz terminated
======= Backtrace: =========
/lib64/libc.so.6(+0x7c8dc)[0x7ff2239ab8dc]
/lib64/libc.so.6(__fortify_fail+0x37)[0x7ff223a52aa7]
/lib64/libc.so.6(__fortify_fail+0x0)[0x7ff223a52a70]
/usr/libexec/Xorg.nz[0x43ab0d]
/usr/lib64/xorg/modules/extensions/nvidia/libglx.so(+0x92d105)[0x7ff2211d1105]

Besides the kernel issue of the lost thread context,
I now also suspect a buffer overrun in the user-mode libraries.

Under gdb (more info here):
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at …/sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff54af4a0 in __GI_abort () at abort.c:89
#2 0x00007ffff54f38e1 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff560e70a “*** %s ***: %s terminated\n”)
at …/sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff559aaa7 in __GI___fortify_fail (msg=msg@entry=0x7ffff560e6f2 “stack smashing detected”) at fortify_fail.c:30
#4 0x00007ffff559aa70 in __stack_chk_fail () at stack_chk_fail.c:28
#5 0x000000000043ab0d in AddCallback (pcbl=0x845578 , callback=0x7ffff2d18c50, data=0x0) at dixutils.c:872
#6 0x00007ffff2d19105 in ?? () from /usr/lib64/xorg/modules/extensions/nvidia/libglx.so
#7 0x00000000c1d0001d in ?? ()
#8 0x00007ffff2d14f11 in ?? () from /usr/lib64/xorg/modules/extensions/nvidia/libglx.so
#9 0x000000000082bb48 in pushToken ()
#10 0x000000000000001c in ?? ()
#11 0x0000000000000200 in ?? ()
#12 0x0000000000000000 in ?? ()

By the way, I forgot to mention the Linux kernel version:

uname -a
Linux nzasf 4.12.9-300.fc26.x86_64 #1 SMP Fri Aug 25 13:09:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Possibly the last NVIDIA driver without this problem is 367.27,
because in the past it worked (bumblebee+primus) on my Lenovo W520.
But I upgraded the kernel and did not recompile the NVIDIA kernel part…
After some time I returned to this, but the 367.27 driver cannot be compiled on
Linux 4.11.12-100.fc24.x86_64.

Then I downloaded 367.57, made a port from 375.82 to 367.57, and bumped into this issue.
Then I moved to a desktop (ASRock Z270 Fatal1ty ITX, i7 7700K, Gigabyte GTX 1050 Ti Low Profile, Linux FC26)
and continued the investigation with 384.69.

I want to get bumblebee+primus and native NVIDIA working,
and to play with
https://devtalk.nvidia.com/default/topic/957814/linux/prime-and-prime-synchronization/
and CUDA.

It makes sense to backport the kernel part update to 367.27 (kernel >= 4.11)
to check the hypothesis, but that takes a long time.
If I do this, I will report.

It looks like nvidia-bug-report.sh didn’t capture your Xorg.8.log, probably due to this silly bug: https://devtalk.nvidia.com/default/topic/1023090/linux/nvidia-bug-report-sh-375-82-does-not-include-all-log-files-loop-variable-reused-/

Could you please attach that too?

Does Valgrind tell you anything interesting about how this corruption is occurring?

I will add it later.
I don’t think Xorg.8.log is very useful, apart from showing what Xorg loads.
When I have free time, I want to figure out how the CPU reaches the call to __stack_chk_fail().
In the assembly it sits after the AddCallback function, past the ret and before the next function in the source.
It is not clear how the CPU gets there: whether it is the result of a buffer overrun and a ret to a wrong
address on the stack, or … something else.

Mechanism of the __stack_chk_fail call

0x000000000043a950 <AddCallback+0>: push %r13
0x000000000043a952 <AddCallback+2>: push %r12
0x000000000043a954 <AddCallback+4>: mov %rsi,%r13
0x000000000043a957 <AddCallback+7>: push %rbp
0x000000000043a958 <AddCallback+8>: push %rbx
0x000000000043a959 <AddCallback+9>: mov %rdx,%r12
0x000000000043a95c <AddCallback+12>: mov %rdi,%rbx
0x000000000043a95f <AddCallback+15>: sub $0x18,%rsp
0x000000000043a963 <AddCallback+19>: mov $0x82bb48,%rbp
0x000000000043a96a <AddCallback+26>: mov %fs:0x28,%rax
0x000000000043a973 <AddCallback+35>: mov %rax,0x8(%rsp)

0x000000000043a9c9 <AddCallback+121>: mov 0x8(%rsp),%rcx
0x000000000043a9ce <AddCallback+126>: xor %fs:0x28,%rcx
0x000000000043a9d7 <AddCallback+135>: jne 0x43ab08 <AddCallback+440>
0x000000000043a9dd <AddCallback+141>: add $0x18,%rsp
0x000000000043a9e1 <AddCallback+145>: pop %rbx
0x000000000043a9e2 <AddCallback+146>: pop %rbp
0x000000000043a9e3 <AddCallback+147>: pop %r12
0x000000000043a9e5 <AddCallback+149>: pop %r13
0x000000000043a9e7 <AddCallback+151>: retq

0x000000000043ab08 <AddCallback+440>: callq 0x420640 __stack_chk_fail@plt

That is, the value of %fs:0x28 on entry to AddCallback is not equal to its value before return.
The nature is similar to the previous issue. What remains is to understand
what %fs:0x28 is and why it differs between entry and exit.
%fs:0x28 holds the stack cookie (canary). It protects against
a changed return address: instead of jumping to a wrong address on retq, __stack_chk_fail
is called. I need to read more theory on this.
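The prologue and epilogue shown above can be modelled without touching the real FS register. This is only a sketch: ordinary buffers stand in for the memory that %fs points to, and the names GUARD_OFFSET, guard_at and canary_check_passes are hypothetical. It shows that the check fails when the base differs between entry and exit, even if nothing on the stack was overwritten.

```c
#include <stdint.h>
#include <string.h>

#define GUARD_OFFSET 0x28  /* offset of the stack guard from the FS base, as in %fs:0x28 */

/* Read the guard word the way `mov %fs:0x28,%rax` would, but from an
 * ordinary buffer standing in for the thread control block. */
uintptr_t guard_at(const unsigned char *fake_fs_base)
{
    uintptr_t guard;
    memcpy(&guard, fake_fs_base + GUARD_OFFSET, sizeof guard);
    return guard;
}

/* Model of the prologue/epilogue pair: save the guard using the base seen
 * on entry, compare it using the base seen on exit. Returns 1 if the check
 * passes and 0 if __stack_chk_fail would be called. */
int canary_check_passes(const unsigned char *base_on_entry,
                        const unsigned char *base_on_exit)
{
    uintptr_t saved = guard_at(base_on_entry);      /* mov %fs:0x28,%rax; mov %rax,0x8(%rsp) */
    return (saved ^ guard_at(base_on_exit)) == 0;   /* xor %fs:0x28,%rcx; jne ... */
}
```

With the same buffer passed twice the check passes; with two buffers holding different guard words it fails, which corresponds to the FS base being changed or restored while the function is running.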

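C code, dixutils.c:
/* fs_base is a global that is presumably saved earlier (around InitExtensions);
 * -1 means it has not been saved yet. */
Bool
AddCallback(CallbackListPtr *pcbl, CallbackProcPtr callback, void *data)
{
    unsigned long fs_base_2 = -1;
    int ret1 = 0;
    int ret2 = 0;

    if (fs_base != -1) {
        /* syscall 158 is arch_prctl; 0x1003 is ARCH_GET_FS */
        ret1 = syscall(158, 0x1003, &fs_base_2);
        if (ret1 == 0 && fs_base != fs_base_2) {
            /* Restore the thread context that was in effect when InitExtensions
             * called the extension function, which is now calling back into Xorg.
             * 0x1002 is ARCH_SET_FS. */
            ret2 = syscall(158, 0x1002, fs_base);
            /* The stack cookie saved in the prologue needs fixing too.
             * This is a hack: it depends on the frame layout (cookie at 0x8(%rsp))
             * and clobbers %rcx without telling the compiler. */
            if (ret2 == 0) {
                asm volatile(
                    "mov %fs:0x28, %rcx;"
                    "mov %rcx, 0x8(%rsp);"
                );
            }
            LogMessage(X_WARNING, "AddCalback fix ok=%d thread context lost right=%p wrong=%p\n",
                       ret2 == 0, (void *) fs_base, (void *) fs_base_2);
        }
    }
    if (!pcbl)
        return FALSE;
    if (!*pcbl) {               /* list hasn't been created yet; go create it */
        if (!CreateCallbackList(pcbl))
            return FALSE;
    }
    return _AddCallback(pcbl, callback, data);
}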

It seems the cookie is saved in the prologue before I restore the correct fs_base, so the check at the end fails.
I need to do more work in order to restore the thread context and fix $rsp+8 on the stack.
I need to add some assembler code to change the stack cookie at $rsp+8,
since it was taken from the wrong $fs_base.

Another direction that has not been investigated to the end is locating the userspace stack and
the kernel function being called that breaks the thread context.

After additionally fixing the stack cookie, Xorg starts.
The problem is that gdb stops responding to Ctrl-C.
So it is working, but not really working.

ps -m 10674
10674 ? - 0:00 Xorg.nz :8 -config /usr/local/etc/bumblebee/xorg.conf.nvidia -configdir /usr/local/etc/bumblebee/xorg.conf.d -sharevts -noliste
- - Ssl 0:00 -
- - Ssl 0:00 -

Xorg uses 2 additional threads, but each fix of the context is logged in the Xorg log,
and it is logged only once. So fixing the context globally is not a problem.

There are more problems in the nvidia driver.

I tried bumblebeed and primus, running the glxinfo application.

[root@nzasf sbin]# ./bumblebeed --debug -C /usr/local/etc/bumblebee/bumblebee.conf
[566303.799089] [INFO]PM is disabled, not performing detection.
[566303.799101] [DEBUG]Active configuration:
[566303.799103] [DEBUG] bumblebeed config file: /usr/local/etc/bumblebee/bumblebee.conf
[566303.799105] [DEBUG] X display: :8
[566303.799106] [DEBUG] LD_LIBRARY_PATH: /usr/lib64/nvidia:/usr/lib/nvidia
[566303.799108] [DEBUG] Socket path: /var/run/bumblebee.socket
[566303.799109] [DEBUG] pidfile: /var/run/bumblebeed.pid
[566303.799111] [DEBUG] xorg.conf file: /usr/local/etc/bumblebee/xorg.conf.nvidia
[566303.799112] [DEBUG] xorg.conf.d dir: /usr/local/etc/bumblebee/xorg.conf.d
[566303.799114] [DEBUG] ModulePath: /usr/lib64/xorg/modules/extensions/nvidia,/usr/lib64/xorg/modules,/usr/lib64/xorg/modules/extensions,/usr/lib64/xorg/modules/input
[566303.799116] [DEBUG] GID name: bumblebee
[566303.799117] [DEBUG] Power method: none
[566303.799119] [DEBUG] Stop X on exit: 1
[566303.799120] [DEBUG] Driver: nvidia
[566303.799122] [DEBUG] Driver module: nvidia
[566303.799123] [DEBUG] Card shutdown state: 1
[566303.799206] [DEBUG]Process /sbin/modprobe started, PID 10321.
[566303.799251] [DEBUG]Hiding stderr for execution of /sbin/modprobe
[566303.800090] [DEBUG]SIGCHILD received, but wait failed with No child processes
[566303.800103] [DEBUG]Configuration test passed.
[566303.800445] [INFO]./bumblebeed 3.2.1 started
[566303.800619] [INFO]Initialization completed - now handling client requests
[566319.057734] [DEBUG]Accepted new connection
[566319.057955] [INFO]Starting X server on display :8.
[566319.058222] [DEBUG]Process Xorg.nz started, PID 10332.
[566319.059072] [DEBUG]Process with PID 10332 returned code 2
[566449.119088] [DEBUG][XORG] [566319.058507] [ERROR]Error running “Xorg.nz”: No such file or directory
[566449.219263] [ERROR]X did not start properly
[566449.219617] [DEBUG]Socket closed.
^C[566548.986746] [WARN]Received Interrupt signal.
[566548.986773] [DEBUG]Socket closed.
[566548.987285] [DEBUG]Killing all remaining processes.
[root@nzasf sbin]# ./bumblebeed --debug -C /usr/local/etc/bumblebee/bumblebee.conf
[566552.441208] [INFO]PM is disabled, not performing detection.
[566552.441248] [DEBUG]Active configuration:
[566552.441258] [DEBUG] bumblebeed config file: /usr/local/etc/bumblebee/bumblebee.conf
[566552.441269] [DEBUG] X display: :8
[566552.441277] [DEBUG] LD_LIBRARY_PATH: /usr/lib64/nvidia:/usr/lib/nvidia
[566552.441287] [DEBUG] Socket path: /var/run/bumblebee.socket
[566552.441296] [DEBUG] pidfile: /var/run/bumblebeed.pid
[566552.441306] [DEBUG] xorg.conf file: /usr/local/etc/bumblebee/xorg.conf.nvidia
[566552.441317] [DEBUG] xorg.conf.d dir: /usr/local/etc/bumblebee/xorg.conf.d
[566552.441325] [DEBUG] ModulePath: /usr/lib64/xorg/modules/extensions/nvidia,/usr/lib64/xorg/modules,/usr/lib64/xorg/modules/extensions,/usr/lib64/xorg/modules/input
[566552.441338] [DEBUG] GID name: bumblebee
[566552.441351] [DEBUG] Power method: none
[566552.441368] [DEBUG] Stop X on exit: 1
[566552.441419] [DEBUG] Driver: nvidia
[566552.441433] [DEBUG] Driver module: nvidia
[566552.441443] [DEBUG] Card shutdown state: 1
[566552.441654] [DEBUG]Process /sbin/modprobe started, PID 10668.
[566552.441759] [DEBUG]Hiding stderr for execution of /sbin/modprobe
[566552.443872] [DEBUG]SIGCHILD received, but wait failed with No child processes
[566552.443905] [DEBUG]Configuration test passed.
[566552.446262] [INFO]./bumblebeed 3.2.1 started
[566552.446416] [INFO]Initialization completed - now handling client requests
[566559.271055] [DEBUG]Accepted new connection
[566559.271284] [INFO]Starting X server on display :8.
[566559.271531] [DEBUG]Process Xorg.nz started, PID 10674.
[566690.783194] [DEBUG][XORG] X.Org X Server 1.19.3
[566690.783222] [DEBUG][XORG] Release Date: 2017-03-15
[566690.783229] [DEBUG][XORG] X Protocol Version 11, Revision 0
[566690.783239] [DEBUG][XORG] Build Operating System: nzasf 4.12.9-300.fc26.x86_64
[566690.783248] [DEBUG][XORG] Current Operating System: Linux nzasf 4.12.9-300.fc26.x86_64 #1 SMP Fri Aug 25 13:09:43 UTC 2017 x86_64
[566690.783259] [DEBUG][XORG] Kernel command line: BOOT_IMAGE=/vmlinuz-4.12.9-300.fc26.x86_64 root=UUID=4461fbcc-73f1-4905-816b-2082bea70cc6 ro rhgb quiet LANG=en_US.UTF-8 nouveau.blacklist=1 rd.driver.blacklist=nouveau nouveau.modeset=0
[566690.783269] [DEBUG][XORG] Build Date: 06 September 2017 01:10:09AM
[566690.783279] [DEBUG][XORG] Build ID: xorg-x11-server 1.19.3-4.fc26
[566690.783288] [DEBUG][XORG] Current version of pixman: 0.34.0
[566690.783297] [DEBUG][XORG] Before reporting problems, check http://wiki.x.org
[566690.783308] [DEBUG][XORG] to make sure that you have the latest version.
[566690.783322] [DEBUG][XORG] Markers: (--) probed, (**) from config file, (==) default setting,
[566690.783335] [DEBUG][XORG] (++) from command line, (!!) notice, (II) informational,
[566690.783348] [DEBUG][XORG] (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[566690.783361] [DEBUG][XORG] (==) Log file: “/var/log/Xorg.8.log”, Time: Sun Sep 10 04:38:02 2017
[566690.783370] [DEBUG][XORG] (++) Using config file: “/usr/local/etc/bumblebee/xorg.conf.nvidia”
[566690.783382] [DEBUG][XORG] (++) Using config directory: “/usr/local/etc/bumblebee/xorg.conf.d”
[566690.783396] [DEBUG][XORG] (==) Using system config directory “/usr/share/X11/xorg.conf.d”
[566690.783406] [DEBUG][XORG] (==) ServerLayout “Layout0”
[566690.783418] [DEBUG][XORG] (==) No screen section available. Using defaults.
[566690.783432] [DEBUG][XORG] (**) |-->Screen “Default Screen Section” (0)
[566690.783441] [DEBUG][XORG] (**) | |-->Monitor “”
[566690.783449] [DEBUG][XORG] (==) No device specified for screen “Default Screen Section”.
[566690.783458] [DEBUG][XORG] Using the first device section listed.
[566690.783470] [DEBUG][XORG] (**) | |-->Device “DiscreteNvidia”
[566690.783483] [DEBUG][XORG] (==) No monitor specified for screen “Default Screen Section”.
[566690.783494] [DEBUG][XORG] Using a default monitor configuration.
[566690.783503] [DEBUG][XORG] (**) Option “AutoAddDevices” “false”
[566690.783511] [DEBUG][XORG] (**) Option “AutoAddGPU” “false”
[566690.783523] [DEBUG][XORG] (**) Option “IndirectGLX” “on”
[566690.783532] [DEBUG][XORG] (**) Not automatically adding devices
[566690.783541] [DEBUG][XORG] (==) Automatically enabling devices
[566690.783550] [DEBUG][XORG] (**) Not automatically adding GPU devices
[566690.783560] [DEBUG][XORG] (==) Automatically binding GPU devices
[566690.783572] [DEBUG][XORG] (==) Max clients allowed: 256, resource mask: 0x1fffff
[566690.783582] [DEBUG][XORG] (==) FontPath set to:
[566690.783593] [DEBUG][XORG] catalogue:/etc/X11/fontpath.d,
[566690.783609] [DEBUG][XORG] built-ins
[566690.783624] [DEBUG][XORG] (++) ModulePath set to “/usr/lib64/xorg/modules/extensions/nvidia,/usr/lib64/xorg/modules,/usr/lib64/xorg/modules/extensions,/usr/lib64/xorg/modules/input”
[566690.783638] [DEBUG][XORG] (==) |–>Input Device “”
[566690.783650] [DEBUG][XORG] (==) |–>Input Device “”
[566690.783662] [DEBUG][XORG] (==) The core pointer device wasn’t specified explicitly in the layout.
[566690.783674] [DEBUG][XORG] Using the default mouse configuration.
[566690.783686] [DEBUG][XORG] (==) The core keyboard device wasn’t specified explicitly in the layout.
[566690.783696] [DEBUG][XORG] Using the default keyboard configuration.
[566690.783705] [DEBUG][XORG] (II) Loader magic: 0x824e00
[566690.783716] [DEBUG][XORG] (II) Module ABI versions:
[566690.783729] [DEBUG][XORG] X.Org ANSI C Emulation: 0.4
[566690.783747] [DEBUG][XORG] X.Org Video Driver: 23.0
[566690.783762] [DEBUG][XORG] X.Org XInput driver : 24.1
[566690.783775] [DEBUG][XORG] X.Org Server Extension : 10.0
[566690.783789] [DEBUG][XORG] (–) using VT number 1
[566690.783799] [DEBUG][XORG] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[566690.783809] [DEBUG][XORG] (II) xfree86: Adding drm device (/dev/dri/card1)
[566690.783831] [ERROR][XORG] (EE) /dev/dri/card1: failed to set DRM interface version 1.4: Permission denied
[566690.783841] [DEBUG][XORG] (II) xfree86: Adding drm device (/dev/dri/card0)
[566690.783853] [ERROR][XORG] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied
[566690.783864] [DEBUG][XORG] (–) PCI:*(0:1:0:0) 10de:1c82:1458:3746 rev 161, Mem @ 0xdc000000/16777216, 0xb0000000/268435456, 0xc0000000/33554432, I/O @ 0x0000e000/128, BIOS @ 0x???/524288
[566690.783876] [DEBUG][XORG] (II) LoadModule: “glx”
[566690.783886] [DEBUG][XORG] (II) Loading /usr/lib64/xorg/modules/extensions/nvidia/libglx.so
[566690.783895] [DEBUG][XORG] (II) Module glx: vendor=“NVIDIA Corporation”
[566690.783907] [DEBUG][XORG] compiled for 4.0.2, module version = 1.0.0
[566690.783924] [DEBUG][XORG] Module class: X.Org Server Extension
[566690.783936] [DEBUG][XORG] (II) NVIDIA GLX Module 384.69 Wed Aug 16 19:34:06 PDT 2017
[566690.783946] [DEBUG][XORG] (II) LoadModule: “nvidia”
[566690.783955] [DEBUG][XORG] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[566690.783964] [DEBUG][XORG] (II) Module nvidia: vendor=“NVIDIA Corporation”
[566690.783974] [DEBUG][XORG] compiled for 4.0.2, module version = 1.0.0
[566690.783985] [DEBUG][XORG] Module class: X.Org Video Driver
[566690.783999] [DEBUG][XORG] (II) LoadModule: “mouse”
[566690.784054] [WARN][XORG] (WW) Warning, couldn’t open module mouse
[566690.784061] [DEBUG][XORG] (II) UnloadModule: “mouse”
[566690.784072] [DEBUG][XORG] (II) Unloading mouse
[566690.784081] [ERROR][XORG] (EE) Failed to load module “mouse” (module does not exist, 0)
[566690.784090] [DEBUG][XORG] (II) LoadModule: “kbd”
[566690.784101] [DEBUG][XORG] (WW) Warning, couldn’t open module kbd
[566690.784115] [DEBUG][XORG] (II) UnloadModule: “kbd”
[566690.784125] [DEBUG][XORG] (II) Unloading kbd
[566690.784135] [DEBUG][XORG] (EE) Failed to load module “kbd” (module does not exist, 0)
[566690.784144] [DEBUG][XORG] (II) NVIDIA dlloader X Driver 384.69 Wed Aug 16 19:07:09 PDT 2017
[566690.784157] [DEBUG][XORG] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[566690.784170] [DEBUG][XORG] (II) Loading sub module “fb”
[566690.784180] [DEBUG][XORG] (II) LoadModule: “fb”
[566690.784191] [DEBUG][XORG] (II) Loading /usr/lib64/xorg/modules/libfb.so
[566690.784204] [DEBUG][XORG] (II) Module fb: vendor=“X.Org Foundation”
[566690.784215] [DEBUG][XORG] compiled for 1.19.3, module version = 1.0.0
[566690.784227] [DEBUG][XORG] ABI class: X.Org ANSI C Emulation, version 0.4
[566690.784242] [DEBUG][XORG] (II) Loading sub module “wfb”
[566690.784252] [DEBUG][XORG] (II) LoadModule: “wfb”
[566690.784262] [DEBUG][XORG] (II) Loading /usr/lib64/xorg/modules/libwfb.so
[566690.784274] [DEBUG][XORG] (II) Module wfb: vendor=“X.Org Foundation”
[566690.784287] [DEBUG][XORG] compiled for 1.19.3, module version = 1.0.0
[566690.784301] [DEBUG][XORG] ABI class: X.Org ANSI C Emulation, version 0.4
[566690.784310] [DEBUG][XORG] (II) Loading sub module “ramdac”
[566690.784319] [DEBUG][XORG] (II) LoadModule: “ramdac”
[566690.784328] [DEBUG][XORG] (II) Module “ramdac” already built-in
[566690.784340] [DEBUG][XORG] (II) NVIDIA(0): Creating default Display subsection in Screen section
[566690.784353] [DEBUG][XORG] “Default Screen Section” for depth/fbbpp 24/32
[566690.784367] [DEBUG][XORG] (==) NVIDIA(0): Depth 24, (==) framebuffer bpp 32
[566690.784379] [DEBUG][XORG] (==) NVIDIA(0): RGB weight 888
[566690.784391] [DEBUG][XORG] (==) NVIDIA(0): Default visual is TrueColor
[566690.784400] [DEBUG][XORG] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[566690.784412] [DEBUG][XORG] (**) NVIDIA(0): Option “ProbeAllGpus” “false”
[566690.784424] [DEBUG][XORG] (**) NVIDIA(0): Option “UseEDID” “false”
[566690.784436] [DEBUG][XORG] (**) NVIDIA(0): Option “UseDisplayDevice” “none”
[566690.784448] [DEBUG][XORG] (**) NVIDIA(0): Enabling 2D acceleration
[566690.784460] [DEBUG][XORG] (**) NVIDIA(0): Ignoring EDIDs
[566690.784471] [DEBUG][XORG] (**) NVIDIA(0): Option “UseDisplayDevice” set to “none”; enabling NoScanout
[566690.784485] [DEBUG][XORG] (**) NVIDIA(0): mode
[566690.784497] [DEBUG][XORG] (II) NVIDIA(0): NVIDIA GPU GeForce GTX 1050 Ti (GP107-A) at PCI:1:0:0 (GPU-0)
[566690.784513] [DEBUG][XORG] (–) NVIDIA(0): Memory: 4194304 kBytes
[566690.784522] [DEBUG][XORG] (–) NVIDIA(0): VideoBIOS: 86.07.39.00.8b
[566690.784531] [DEBUG][XORG] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[566690.784541] [DEBUG][XORG] (II) NVIDIA(0): Validated MetaModes:
[566690.784553] [DEBUG][XORG] (II) NVIDIA(0): “NULL”
[566690.784564] [DEBUG][XORG] (II) NVIDIA(0): Virtual screen size determined to be 640 x 480
[566690.784578] [WARN][XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
[566690.784588] [DEBUG][XORG] (==) NVIDIA(0): DPI set to (75, 75); computed from built-in default
[566690.784601] [DEBUG][XORG] (–) Depth 24 pixmap format is 32 bpp
[566690.784614] [DEBUG][XORG] (II) NVIDIA: Using 49152.00 MB of virtual memory for indirect memory
[566690.784629] [DEBUG][XORG] (II) NVIDIA: access.
[566690.784638] [DEBUG][XORG] (II) NVIDIA(0): ACPI: failed to connect to the ACPI event daemon; the daemon
[566690.784651] [DEBUG][XORG] (II) NVIDIA(0): may not be running or the “AcpidSocketPath” X
[566690.784664] [DEBUG][XORG] (II) NVIDIA(0): configuration option may not be set correctly. When the
[566690.784677] [DEBUG][XORG] (II) NVIDIA(0): ACPI event daemon is available, the NVIDIA X driver will
[566690.784690] [DEBUG][XORG] (II) NVIDIA(0): try to use it to receive ACPI event notifications. For
[566690.784702] [DEBUG][XORG] (II) NVIDIA(0): details, please see the “ConnectToAcpid” and
[566690.784714] [DEBUG][XORG] (II) NVIDIA(0): “AcpidSocketPath” X configuration options in Appendix B: X
[566690.784725] [DEBUG][XORG] (II) NVIDIA(0): Config Options in the README.
[566690.784737] [DEBUG][XORG] (II) NVIDIA(0): Setting mode “NULL”
[566690.784750] [DEBUG][XORG] (==) NVIDIA(0): Disabling shared memory pixmaps
[566690.784760] [DEBUG][XORG] (==) NVIDIA(0): Backing store enabled
[566690.784769] [DEBUG][XORG] (==) NVIDIA(0): Silken mouse enabled
[566690.784781] [DEBUG][XORG] (==) NVIDIA(0): DPMS enabled
[566690.784792] [WARN][XORG] (WW) NVIDIA(0): Option “NoLogo” is not used
[566690.784801] [DEBUG][XORG] (II) Loading sub module “dri2”
[566690.784814] [DEBUG][XORG] (II) LoadModule: “dri2”
[566690.784827] [DEBUG][XORG] (II) Module “dri2” already built-in
[566690.784837] [DEBUG][XORG] (II) NVIDIA(0): [DRI2] Setup complete
[566690.784848] [DEBUG][XORG] (II) NVIDIA(0): [DRI2] VDPAU driver: nvidia
[566690.784857] [DEBUG][XORG] (–) RandR disabled
[566690.784869] [DEBUG][XORG] (II) nlz InitExtensions
[566690.784881] [DEBUG][XORG] (II) SELinux: Disabled by boolean
[566690.784894] [DEBUG][XORG] (II) Initializing extension GLX
[566690.784908] [WARN][XORG] (WW) AddCalback fix ok=1 thread context lost right=0x7f001498ca40 wrong=0x2111470
[566690.784921] [DEBUG][XORG] (II) LoadModule: “mouse”
[566690.784933] [WARN][XORG] (WW) Warning, couldn’t open module mouse

[566690.886066] [ERROR]X unresponsive after 10 seconds - aborting
[566690.886624] [DEBUG]Socket closed.
[566690.951290] [DEBUG][XORG] (II) NVIDIA(GPU-0): Deleting GPU-0
[566690.951643] [DEBUG][XORG] (II) Server terminated successfully (0). Closing log file.
[566690.952951] [DEBUG]Process with PID 10674 returned code 0

The thread context corrupted by the NVIDIA kernel part might not be the only problem.
Or it might be serious enough on its own that the NVIDIA driver does not work at all.

I might switch to the idea of using systemtap to catch the system call in which the thread context is lost.
It should be possible, I think, to get the matching userspace backtrace.

Does Valgrind tell you anything interesting about how this corruption is occurring?
I will evaluate this idea later.

While systemtap was working, I got the following:

sys_*.call 1st track context of thread 12260 fs_base 7ffff7f0ea40
nvidia 1st track thread context tid=12260, fs_base=7ffff7f0ea40 module name=stap_3e7e3fb2765e4d510340e346716e7831_73_14437, fun=nv_procfs_open_params
do_arch_prctl_64 ARCH_SET_FS fun do_arch_prctl_64 option 1002 arg 8f1400
tid 12260 current fs base 7ffff7f0ea40
!!! changed thread context tid 12260 syscall SyS_arch_prctl fs_base onentry 7ffff7f0ea40 current 8f1400
do_arch_prctl_64 ARCH_SET_FS fun do_arch_prctl_64 option 1002 arg 7ffff7f0ea40
tid 12260 current fs base 8f1400

That means fs_base was changed through the call chain
SyS_arch_prctl()->do_arch_prctl_64()
and this happened after the nv_procfs_open_params() call.

The NVIDIA driver is under suspicion: it calls SyS_arch_prctl, changes fs_base,
and fails to restore it.

Since I failed to get further with systemtap, I was not able to obtain the kernel backtrace
and the userspace backtrace. This is only a hypothesis.

Since there is a problem with systemtap, I just instrumented the kernel function do_arch_prctl_64.

Instrumented Xorg
(WW) AddCalback fix ok=1 thread context lost right=0x7ffff7f0ea40 wrong=0x8f1410

Instrumented linux kernel

[ 2270.211522] SET_FS_BASE 00000000008f1410 old 00007ffff7f0ea40
[ 2270.211527] CPU: 3 PID: 29891 Comm: Xorg.nz Tainted: G O 4.12.9-300.fc26.nz.x86_64 #2
[ 2270.211528] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270 Gaming-ITX/ac, BIOS P2.30 07/14/2017
[ 2270.211529] Call Trace:
[ 2270.211535] dump_stack+0x8e/0xcd
[ 2270.211538] do_arch_prctl_64+0x1ec/0x230
[ 2270.211541] SyS_arch_prctl+0x2a/0x50
[ 2270.211544] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 2270.211546] RIP: 0033:0x7ffff21e98ea
[ 2270.211548] RSP: 002b:00007fffffffdea0 EFLAGS: 00003246 ORIG_RAX: 000000000000009e
[ 2270.211550] RAX: ffffffffffffffda RBX: 00007ffff5842b38 RCX: 00007ffff21e98ea
[ 2270.211552] RDX: 0000000000000000 RSI: 00000000008f1410 RDI: 0000000000001002
[ 2270.211553] RBP: 00007ffff5842ae0 R08: 00000000008f1410 R09: 00000000000005cb
[ 2270.211554] R10: 000000000000002a R11: 0000000000003246 R12: 00007ffff5842b38
[ 2270.211556] R13: 0000000000000750 R14: 00007ffff5842b38 R15: 000000000000270e
[ 2270.211563] SET_FS_BASE 00007ffff7f0ea40 old 00000000008f1410

The corrupted thread context was set from userspace.
It is hard to catch because the syscall is an inline function in optimized code.
The NVIDIA kernel part is exonerated.
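For readers following the register dump in the kernel trace, here is a small sketch of how it can be decoded, assuming the x86-64 Linux syscall convention (syscall number in ORIG_RAX, first argument in RDI, second in RSI). The function names are hypothetical; the numeric values are the same ones used in the patched AddCallback.

```c
/* x86-64 Linux values: syscall 158 (0x9e) is arch_prctl. */
#define NR_ARCH_PRCTL   158
#define OP_ARCH_SET_GS  0x1001
#define OP_ARCH_SET_FS  0x1002
#define OP_ARCH_GET_FS  0x1003
#define OP_ARCH_GET_GS  0x1004

/* Name the operation encoded by ORIG_RAX and RDI in a syscall register dump. */
const char *describe_arch_prctl(unsigned long orig_rax, unsigned long rdi)
{
    if (orig_rax != NR_ARCH_PRCTL)
        return "not arch_prctl";
    switch (rdi) {
    case OP_ARCH_SET_GS: return "arch_prctl(ARCH_SET_GS)";
    case OP_ARCH_SET_FS: return "arch_prctl(ARCH_SET_FS)";
    case OP_ARCH_GET_FS: return "arch_prctl(ARCH_GET_FS)";
    case OP_ARCH_GET_GS: return "arch_prctl(ARCH_GET_GS)";
    default:             return "arch_prctl(other)";
    }
}

/* The low two bits of the CS selector are the privilege level; 3 is user mode. */
int is_user_mode_cs(unsigned int cs)
{
    return (cs & 3u) == 3u;
}
```

In the trace above, ORIG_RAX is 0x9e and RDI is 0x1002, so the call is arch_prctl(ARCH_SET_FS) with RSI=0x8f1410 as the new base, and the saved RIP has CS 0x33, which is a user-mode selector.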