Xorg intermittent hangs on Gentoo Linux (running on NVIDIA Shield TV)

I have encountered a problem I have not been able to solve and I need help. It is very strange because sometimes it works and sometimes it doesn’t.

The problem:

I start X and it runs up to a point then hangs. However, occasionally, it doesn’t hang and runs flawlessly. It has even happened that it hangs, then after some time, it mysteriously starts working in the same process. In another attempt, I let the hung process run overnight and it never recovered.

The symptoms:

Xorg.0.log last few lines are:

(==) NVIDIA(0): Depth 24, (==) framebuffer bpp 32
(==) NVIDIA(0): RGB weight 888
(==) NVIDIA(0): Default visual is TrueColor
(==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
(**) NVIDIA(0): Enabling 2D acceleration

I ran an strace on /usr/bin/Xorg and the last few lines are:

1806 write(0, "(**) NVIDIA(0): Enabling 2D acce"..., 41) = 41
1806 mmap(NULL, 1, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7a122000
1806 munmap(0x7f7a122000, 1) = 0
1806 ioctl(-1, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7fc9d35220) = -1 EBADF (Bad file descriptor)
1806 openat(AT_FDCWD, "/dev/nvhost-ctrl-gpu", O_RDWR|O_CLOEXEC) = 8
1806 ioctl(8, _IOC(_IOC_READ|_IOC_WRITE, 0x47, 0x5, 0x10)

I tried, without success, to find the earlier system call that returned the “-1” descriptor which the ioctl above then rejects with EBADF.
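As an aside on reading the strace output: the `_IOC(...)` form is how strace prints an ioctl request number it has no symbolic name for. Below is a minimal sketch of the decoding, assuming the generic Linux bit layout from asm-generic/ioctl.h (which arm64 uses). The helper names `ioc` and `describe` are made up for illustration. The type bytes 0x46 and 0x47 are the ASCII letters 'F' and 'G', which may help when searching the Tegra kernel headers for the matching ioctl definition.

```python
# Rebuild and decode ioctl request numbers as printed by strace:
#   _IOC(direction, type, nr, size)
# Assumption: generic layout (nr: 8 bits, type: 8 bits, size: 14 bits, dir: 2 bits).
_IOC_NRSHIFT = 0
_IOC_TYPESHIFT = 8
_IOC_SIZESHIFT = 16
_IOC_DIRSHIFT = 30
_IOC_WRITE = 1
_IOC_READ = 2

# Name of the header macro that produces each direction value.
_MACROS = {0: "_IO", _IOC_WRITE: "_IOW", _IOC_READ: "_IOR",
           _IOC_READ | _IOC_WRITE: "_IOWR"}


def ioc(direction, type_, nr, size):
    """Pack the four fields into a single request number."""
    return ((direction << _IOC_DIRSHIFT) | (size << _IOC_SIZESHIFT)
            | (type_ << _IOC_TYPESHIFT) | (nr << _IOC_NRSHIFT))


def describe(request):
    """Unpack a request number into its fields."""
    type_ = (request >> _IOC_TYPESHIFT) & 0xFF
    return {
        "macro": _MACROS[(request >> _IOC_DIRSHIFT) & 0x3],
        "type": type_,
        "type_char": chr(type_),
        "nr": (request >> _IOC_NRSHIFT) & 0xFF,
        "size": (request >> _IOC_SIZESHIFT) & 0x3FFF,
    }


if __name__ == "__main__":
    # The two requests from the strace excerpt above.
    for fields in ((3, 0x46, 0x2A, 0x20), (3, 0x47, 0x05, 0x10)):
        request = ioc(*fields)
        print(hex(request), describe(request))
```

With the decoded macro, type letter, number and argument size, a grep through the kernel's ioctl headers for something like `_IOWR('G', 5,` or for the magic letter should narrow down which call never returns.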

I do not know what to do next. It fails far more often than it succeeds, so it is easy to reproduce.
Can anyone suggest any tips for how I can diagnose this further? What else can I try?

I am running the 3.10.96 kernel from L4T (Linux for Tegra) and everything else works perfectly.
I am using jetson-tx1-drivers-24.2.1 and xorg-server-1.18.4.

Thanks for any help! My complete Xorg.0.log follows:

X.Org X Server 1.18.4
Release Date: 2016-07-19
[ 87.470] X Protocol Version 11, Revision 0
[ 87.470] Build Operating System: Linux 3.10.96 aarch64 Gentoo
[ 87.470] Current Operating System: Linux shieldtv 3.10.96 #1 SMP PREEMPT Thu Oct 13 05:30:55 EDT 2016 aarch64
[ 87.470] Kernel command line: fbcon=map:0 console=tty0 console=ttyS0,115200n8 console=ttyUSB0,115200n8 tegraid=21.1.1.0.0 memtype=0 vpr_resize ddr_die=1536M@2048M ddr_die=1536M@3584M section=256M usb_port_owner_info=0 lane_owner_info=0 emc_max_dvfs=1 touch_id=0@63 maxcpus=4 usbcore.old_scheme_first=1 lp0_vec=4096@0xfdfff000 nvdumper_reserved=0xfcf00000 core_edp_mv=1125 core_edp_ma=4000 power_supply=Adapter androidboot.modem=none androidboot.serialno=0423915019000000000e androidboot.security=enabled gpt android.kerneltype=normal androidboot.touch_vendor_id=0 androidboot.touch_panel_id=63 androidboot.touch_feature=0
[ 87.470] Build Date: 07 January 2018 03:08:14AM
[ 87.470]
[ 87.470] Current version of pixman: 0.34.0
[ 87.470] Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
[ 87.470] Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 87.470] (==) Log file: “/var/log/Xorg.0.log”, Time: Tue Jan 9 21:16:01 2018
[ 87.486] (==) Using config file: “/etc/X11/xorg.conf”
[ 87.486] (==) Using system config directory “/usr/share/X11/xorg.conf.d”
[ 87.490] (==) No Layout section. Using the first Screen section.
[ 87.490] (==) No screen section available. Using defaults.
[ 87.490] (**) |-->Screen “Default Screen Section” (0)
[ 87.490] (**) | |-->Monitor “”
[ 87.490] (==) No device specified for screen “Default Screen Section”.
Using the first device section listed.
[ 87.490] (**) | |-->Device “Tegra0”
[ 87.490] (==) No monitor specified for screen “Default Screen Section”.
Using a default monitor configuration.
[ 87.490] (==) Automatically adding devices
[ 87.490] (==) Automatically enabling devices
[ 87.490] (==) Automatically adding GPU devices
[ 87.490] (==) Max clients allowed: 256, resource mask: 0x1fffff
[ 87.498] (WW) The directory “/usr/share/fonts/TTF/” does not exist.
[ 87.498] Entry deleted from font path.
[ 87.498] (WW) The directory “/usr/share/fonts/OTF/” does not exist.
[ 87.498] Entry deleted from font path.
[ 87.498] (WW) The directory “/usr/share/fonts/Type1/” does not exist.
[ 87.498] Entry deleted from font path.
[ 87.499] (WW) `fonts.dir' not found (or not valid) in "/usr/share/fonts/100dpi/".
[ 87.499] Entry deleted from font path.
[ 87.499] (Run 'mkfontdir' on "/usr/share/fonts/100dpi/").
[ 87.499] (WW) `fonts.dir' not found (or not valid) in "/usr/share/fonts/75dpi/".
[ 87.499] Entry deleted from font path.
[ 87.499] (Run 'mkfontdir' on "/usr/share/fonts/75dpi/").
[ 87.499] (==) FontPath set to:
/usr/share/fonts/misc/
[ 87.499] (**) ModulePath set to “/usr/lib64,/usr/lib64/tegra,/usr/lib64/xorg/modules,/usr/lib64/xorg/modules/input,/usr/lib64/xorg/modules/drivers”
[ 87.499] (II) The server relies on udev to provide the list of input devices.
If no devices become available, reconfigure udev or disable AutoAddDevices.
[ 87.499] (II) Loader magic: 0x5b6ad0
[ 87.499] (II) Module ABI versions:
[ 87.499] X.Org ANSI C Emulation: 0.4
[ 87.500] X.Org Video Driver: 20.0
[ 87.500] X.Org XInput driver : 22.1
[ 87.500] X.Org Server Extension : 9.0
[ 87.500] (II) no primary bus or device found
[ 87.500] (WW) “dri” will not be loaded unless you’ve specified it to be loaded elsewhere.
[ 87.500] (II) LoadModule: “extmod”
[ 87.500] (II) Module “extmod” already built-in
[ 87.500] (II) LoadModule: “glx”
[ 89.516] (II) Loading /usr/lib64/opengl/tegra/lib/libglx.so
[ 89.780] (II) Module glx: vendor=“NVIDIA Corporation”
[ 89.780] compiled for 4.0.2, module version = 1.0.0
[ 89.780] Module class: X.Org Server Extension
[ 89.780] (II) NVIDIA GLX Module 24.2.1 Release Build (integ_stage_rel) (buildbrain@mobile-u64-1072) Wed Nov 9 19:45:00 PST 2016
[ 89.797] (II) LoadModule: “nvidia”
[ 90.536] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 90.598] (II) Module nvidia: vendor=“NVIDIA Corporation”
[ 90.598] compiled for 4.0.2, module version = 1.0.0
[ 90.598] Module class: X.Org Video Driver
[ 90.598] (II) NVIDIA dlloader X Driver 24.2.1 Release Build (integ_stage_rel) (buildbrain@mobile-u64-1072) Wed Nov 9 19:46:25 PST 2016
[ 90.598] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 90.598] (--) using VT number 7

[ 90.607] (WW) Falling back to old probe method for NVIDIA
[ 90.607] (II) Loading sub module “fb”
[ 90.607] (II) LoadModule: “fb”
[ 90.785] (II) Loading /usr/lib64/xorg/modules/libfb.so
[ 90.790] (II) Module fb: vendor=“X.Org Foundation”
[ 90.790] compiled for 1.18.4, module version = 1.0.0
[ 90.790] ABI class: X.Org ANSI C Emulation, version 0.4
[ 90.790] (II) Loading sub module “wfb”
[ 90.790] (II) LoadModule: “wfb”
[ 90.969] (II) Loading /usr/lib64/xorg/modules/libwfb.so
[ 90.975] (II) Module wfb: vendor=“X.Org Foundation”
[ 90.975] compiled for 1.18.4, module version = 1.0.0
[ 90.975] ABI class: X.Org ANSI C Emulation, version 0.4
[ 90.975] (II) Loading sub module “ramdac”
[ 90.975] (II) LoadModule: “ramdac”
[ 90.975] (II) Module “ramdac” already built-in
[ 90.975] (II) NVIDIA(0): Creating default Display subsection in Screen section
“Default Screen Section” for depth/fbbpp 24/32
[ 90.975] (==) NVIDIA(0): Depth 24, (==) framebuffer bpp 32
[ 90.975] (==) NVIDIA(0): RGB weight 888
[ 90.975] (==) NVIDIA(0): Default visual is TrueColor
[ 90.975] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[ 90.975] (**) NVIDIA(0): Option “AllowEmptyInitialConfiguration” “true”
[ 90.975] (**) NVIDIA(0): Enabling 2D acceleration

My complete xorg.conf:

Section "Files"
    ModulePath "/usr/lib64"
    ModulePath "/usr/lib64/tegra"
    ModulePath "/usr/lib64/xorg/modules"
    ModulePath "/usr/lib64/xorg/modules/input"
    ModulePath "/usr/lib64/xorg/modules/drivers"
    ModulePath "/usr/lib64/opengl/tegra/extensions"
EndSection

# Disable extensions not useful on Tegra.
Section "Module"
    Disable "dri"
    SubSection "extmod"
        Option "omit xfree86-dga"
    EndSubSection
    Load "glx"
EndSection

Section "Device"
    Identifier "Tegra0"
    Driver "nvidia"
    # Allow X server to be started even if no display devices are connected.
    Option "AllowEmptyInitialConfiguration" "true"
EndSection

Section "Monitor"
    Identifier "DSI-0"
    Option "Ignore"
EndSection

Section "InputClass"
    Identifier "keyboard-all"
    Option "XkbOptions" "terminate:ctrl_alt_bksp"
EndSection

I have new information…

I have discovered that the Xorg server hangs (at the point described above) for exactly 13 minutes (780 seconds) every time, and then proceeds without further problems. I was just not waiting long enough!
(Apparently, when I let it run overnight, I was either mistaken or something else went wrong.) I have now repeated the process 6 times, and at exactly 780 seconds Xorg recovers with no messages in the log file.
Even -logverbose 9 says nothing. I will post the rather long Xorg.0.log if requested.
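For anyone who wants to check the stall length in their own log, here is a rough sketch that measures the gaps between consecutive Xorg log timestamps and reports the largest ones. It assumes the usual `[ seconds.millis]` prefix seen in the log above. The function name `largest_gaps` and the default log path are only illustrative choices.

```python
import re
import sys

# Matches the "[ 87.470]" style prefix; untimed continuation lines are skipped.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def largest_gaps(lines, top=3):
    """Return up to `top` tuples (gap_seconds, line_before, line_after)."""
    gaps = []
    prev = None  # (timestamp, text) of the previous timestamped line
    for raw in lines:
        line = raw.rstrip("\n")
        match = STAMP.match(line)
        if match is None:
            continue
        stamp = float(match.group(1))
        if prev is not None:
            gaps.append((stamp - prev[0], prev[1], line))
        prev = (stamp, line)
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/Xorg.0.log"
    try:
        with open(path, "r", errors="replace") as handle:
            for seconds, before, after in largest_gaps(handle):
                print("%.3f s between:\n  %s\n  %s" % (seconds, before, after))
    except OSError as exc:
        print("cannot read %s: %s" % (path, exc.strerror))
```

If the stall is real, the largest gap should sit between the “Enabling 2D acceleration” line and whatever the driver logs next.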

Is there some kind of CUDA (or other resource) timeout which Xorg could be waiting for?

Can anyone suggest how to find it?
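One idea I can think of, sketched below, is to sample what the kernel says the hung process is waiting on. This assumes the kernel exposes /proc/PID/wchan and /proc/PID/stack (the latter needs CONFIG_STACKTRACE and root), which I have not confirmed for this L4T kernel. The names `read_proc` and `snapshot` are made up for the example.

```python
import os
import sys
import time


def read_proc(pid, name, root="/proc"):
    """Read one procfs file for `pid`; return a placeholder if it cannot be read."""
    path = os.path.join(root, str(pid), name)
    try:
        with open(path, "r") as handle:
            return handle.read().strip()
    except OSError as exc:
        return "<unavailable: %s>" % exc.strerror


def snapshot(pid, root="/proc"):
    """Collect the scheduler state, wait channel and kernel stack of `pid`."""
    status = read_proc(pid, "status", root)
    state = next((line for line in status.splitlines()
                  if line.startswith("State:")), "State: ?")
    return {
        "state": state,
        "wchan": read_proc(pid, "wchan", root),
        "stack": read_proc(pid, "stack", root),
    }


if __name__ == "__main__":
    if len(sys.argv) > 1:
        target = int(sys.argv[1])
        # Take a few samples; a stack that never changes points at the blocking call.
        for _ in range(5):
            snap = snapshot(target)
            print(time.strftime("%H:%M:%S"), snap["state"], "wchan=" + snap["wchan"])
            print(snap["stack"])
            time.sleep(10)
    else:
        print("usage: python3 this_script.py PID")
```

Run as root with the PID of the stuck Xorg (1806 in the strace above) while it is hanging. If the same kernel function shows up in every sample, that is probably where the 780 second wait lives.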

Thanks for any help or ideas!