Frequent X-server crashing

Hi,

The X-server on my Jetson crashes frequently during visually demanding operations (like playing a movie or rendering a complex web page in Firefox). Here is the log output:

[  5019.317] dix: invalid event type 3
[  5019.319] 03 00 00 00 00 00 00 00 
[  5019.322] a8 01 00 00 ac 96 4c 00 
[  5019.325] 07 00 00 00 07 00 00 00 
[  5019.330] 00 00 00 00 00 00 00 00 
[  5019.333] (EE) 
[  5019.334] (EE) Backtrace:
[  5019.336] (EE) 
[  5019.338] (EE) 
Fatal server error:
[  5019.338] (EE) Wrong event type 3. Aborting server
[  5019.342] (EE) 
[  5019.343] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[  5019.344] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[  5019.346] (EE) 
[  5019.854] (EE) Server terminated with error (1). Closing log file.

I cannot find anything online about this error message. Is that a hardware or software problem? Does anyone else also experience this?

The system itself stays stable, only the X session is killed and the login screen appears again.

Hello anlumo,

Would it be possible to provide the complete output of the Xorg.0.log file after this issue occurs again? Alternatively, you could send us an email at linux-tegra-bugs@nvidia.com and attach the log file.

Many Thanks,

-Chris

If a mail is sent to linux-tegra-bugs@nvidia.com and a solution or a workaround is found, do remember to update the original post here so that others can also see it.
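
For anyone gathering the logs requested above, here is a minimal sketch of a helper that bundles them for a bug report. The function name collect_x_logs, the archive name, and the default directories are my own assumptions. Once you log back in after a crash, the crashed session's log is normally rotated to Xorg.0.log.old, so the sketch copies both the current and the .old files.

```shell
#!/bin/sh
# Sketch only: collect_x_logs and the "xcrash-<timestamp>" naming are hypothetical.
collect_x_logs() {
    log_dir="${1:-/var/log}"   # directory holding Xorg.*.log and Xorg.*.log.old
    out_dir="${2:-$HOME}"      # where the archive is written
    stamp=$(date +%Y%m%d-%H%M%S)
    work="$out_dir/xcrash-$stamp"
    mkdir -p "$work" || return 1

    # Copy the current and the previous-session X server logs, if present.
    for f in "$log_dir"/Xorg.*.log "$log_dir"/Xorg.*.log.old; do
        [ -f "$f" ] && cp "$f" "$work/"
    done

    # Kernel messages with human-readable timestamps; ignored if dmesg fails.
    dmesg -T > "$work/dmesg.txt" 2>/dev/null || true

    # Pack everything into one file and print its path.
    tar -czf "$work.tar.gz" -C "$out_dir" "xcrash-$stamp" || return 1
    echo "$work.tar.gz"
}
```

Running collect_x_logs with no arguments right after logging back in should print the path of a .tar.gz that can be attached to the mail.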

Hi,

I’m also having X crashes when the desktop environment (now XFCE, previously Unity) is under heavy load, e.g., when opening a lot of windows.

I’m able to continue working over ssh. I can see that the X process keeps running and consumes 100% of one of the CPU cores. /var/log/Xorg.0.log shows nothing at the crash time.

The dmesg shows some errors I cannot interpret. An output of dmesg (called with “dmesg -xT”) is here: http://pastebin.com/TkD0q8va . The crash happens at time 09:55:10.

Well I’ve just found out that the error I’m getting is the “GK20A_FIFO_HANDLE_MMU_FAULT” already noted in the “Known Issues” section of the “TEGRA LINUX DRIVER PACKAGE R19.3 Release Notes” document.
However, in my case the crash has nothing to do with video playback.

Just an observation. “vgaarb: this pci device is not a vga device” always occurs when the keyboard/mouse are not used for a while, and the screen goes blank (probably energy saving). Was it long enough without touching keyboard and mouse that the screen may have blanked? If so, try moving the mouse slightly or hitting a key to see if idle blanking is doing it.

Here is a similar instance of the X server crashing while using Unity.

I can reproduce this crash every time. Here’s the set-up when I crash the X server which forces the login screen to come up.

Plugged into the USB A is a 4 port active USB 3.0 hub, manufactured by Targus.

The keyboard is a Logitech K120.

The mouse is what makes mine crash.

Plug in a wireless Logitech mouse M215 and receiver, the cursor takes a few seconds to wake up if it stops moving, which is annoying, but it doesn’t crash the server.

Whether I change the mouse to a Logitech G400 while the power is off or during an X session, if I let the mouse sit for a second and then move it again, the X server immediately crashes and sends me back to the lightdm login screen.

This is the Xorg log message for the crash. I have to read it from the Xorg.0.log.old and Xorg.1.log.old immediately after logging back in.

…normal xorg log sh*t…
[327728.682] (**) Logitech Gaming Mouse G400: always reports core events
[327728.682] (**) evdev: Logitech Gaming Mouse G400: Device "/dev/input/event4"
"" (--) : Vendor 0x46d Product 0xc245
"" (--) : Found 12 mouse buttons
"" (--) : Found scroll wheel(s)
"" (--) : Found relative axes
"" (--) : Found x and y relative axes
"" (II) : Configuring as mouse
"" (II) : Adding scrollwheel support
"" (**) : YAxisMapping: buttons 4 and 5
"" (**) : EmulateWheelButton: 4, EmulateWheelInertia: 10, EmulateWheelTimeout: 200
[327728.683] (**) Option "config_info" "udev:/sys/devices/platform/tegra-ehci.2/usb2/2-1/2-1.4:1.0/input/input6/event4"
"" (II) XINPUT: Adding extended input device "Logitech Gaming Mouse G400" (type: MOUSE, id 9)
[327728.686] (II) evdev: initialized for relative axes.
[327728.692] (**) keeping acceleration scheme 1
"" acceleration profile 0
"" acceleration factor: 2.000
"" acceleration threshold: 4
[327808.116] dix: invalid event type 2 <- the crash.
[…117] 02 00 00 00 00 00 00 00
[…120] a8 01 00 00 70 f4 89 13
[…123] 09 00 00 00 09 00 00 00
[…125] 00 00 00 00 00 00 00 00
[…128] (EE)
[…129] (EE) Backtrace:
[…131] (EE)
[…131] (EE)
Fatal server error:
[…133] (EE) Wrong event type 2. Aborting server
[…134] (EE)
[…135] (EE)
Please consult the Xorg foundation for help…

It looks like there’s a problem with some dix event crashing the server.

It shouldn’t matter, but you might want to try with a USB 2 hub if you have one available.

Yes, thanks. However, I guess it has nothing to do with the freeze, as those messages came many seconds or minutes before the crash. And in my case the messages appear because Synergy is being used.

FYI, the problem has never appeared since I flashed L4T 19.3 into my Jetson.

I was having a similar problem with the X server freezing completely, forcing a shutdown from ssh or a hard reset. I flashed to the latest version, 21.3, and now instead of freezing it logs out to the login screen. My Xorg.0.log.old file is as follows.

[  9759.496] dix: invalid event type 3
[  9759.499] 03 00 00 00 00 00 00 00 
[  9759.500] a8 01 00 00 07 eb 94 00 
[  9759.501] 09 00 00 00 09 00 00 00 
[  9759.502] 00 00 00 00 00 00 00 00 
[  9759.502] (EE) 
[  9759.502] (EE) Backtrace:
[  9759.502] (EE) 
[  9759.503] (EE) 
Fatal server error:
[  9759.503] (EE) Wrong event type 3. Aborting server
[  9759.503] (EE) 
[  9759.503] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[  9759.503] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[  9759.503] (EE) 
[  9759.712] (EE) Server terminated with error (1). Closing log file.

It seems to happen at random, and I haven’t been able to isolate the cause or reproduce the crash. Sometimes it happens frequently, and sometimes just a touch of the mouse will cause it. I have had an uptime of over 12 hours without a single crash, and at other times several crashes in less than an hour.

Sometimes I suspect such issues can be a side-effect of peripherals switching in and out of different modes, e.g., low power modes. Even though you are not able to isolate the cause, does it seem that peripherals like keyboard and mouse are involved? Does it do this when no peripheral is being touched? The barely touched mouse makes this stand out. Are the peripherals USB (which is actually kind of a silly question since there is no PS2 port)?

You will find a performance settings article here which includes (among other things) disabling USB suspend:
http://elinux.org/Jetson/Performance

# Disable USB auto-suspend, since it disconnects some devices such as webcams on Jetson TK1.
echo -1 > /sys/module/usbcore/parameters/autosuspend

Whenever an X11 event is involved it helps to start by simplifying I/O devices like mice and keyboards. In this case you could make sure that USB (which communicates those events) is not part of the complication. A powered USB HUB also reduces complications because the Jetson would no longer be required to deal with this.
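
To see whether the autosuspend change actually took effect, a small read-only check like the one below may help. It is only a sketch: usb_power_report is a name I made up, and the optional argument for the sysfs root exists just to make testing easier. As far as I know, the usbcore parameter only sets the default for devices connected afterwards, so it is worth looking at each device's own power/control file too ("on" means the device is kept awake, "auto" means it may autosuspend).

```shell
#!/bin/sh
# Sketch only: usb_power_report is a hypothetical helper; it reads sysfs and changes nothing.
usb_power_report() {
    sysfs="${1:-/sys}"   # sysfs root; override only for testing

    # Global default autosuspend delay in seconds (-1 disables autosuspend).
    param="$sysfs/module/usbcore/parameters/autosuspend"
    if [ -r "$param" ]; then
        echo "usbcore autosuspend: $(cat "$param")"
    fi

    # Per-device runtime power management policy.
    for ctl in "$sysfs"/bus/usb/devices/*/power/control; do
        [ -r "$ctl" ] || continue
        dev=${ctl%/power/control}
        name=$(cat "$dev/product" 2>/dev/null || echo "unknown")
        echo "${dev##*/} ($name): $(cat "$ctl")"
    done
}
```

Calling usb_power_report before and after the echo -1 line, and again after replugging the mouse, should show whether the mouse is still set to "auto".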

Thank you for your prompt reply. Before I updated the flash, the problem obviously had something to do with the graphics. This new problem is a little harder to troubleshoot. I am leaning towards either some USB input devices and/or sound causing the crash.

I have a powered 7 port 2.0 USB Hub plugged into the 3.0 Type A socket, with one 3.0 USB Hard Drive, Laser Wheel Mouse, and Keyboard plugged into it. I have ordered a micro B to female A adapter and plan to transfer my Hub to it as well. I want to dedicate the 3.0 USB Hard drive to the 3.0 socket. I hope that will improve stability as well.

I do believe it has something to do with some kind of power-saving issue, like you said. One thing is certain: it only happens when I am active on the machine, never when it is idle. I have tried the settings from your link and it is still crashing, but now in a way similar to Bronson’s above.

[    34.335] (II) config/udev: Adding input device USB Laser Wheel Mouse (/dev/input/event2)
[    34.335] (**) USB Laser Wheel Mouse: Applying InputClass "evdev pointer catchall"
[    34.335] (II) Using input driver 'evdev' for 'USB Laser Wheel Mouse'
[    34.335] (**) USB Laser Wheel Mouse: always reports core events
[    34.335] (**) evdev: USB Laser Wheel Mouse: Device: "/dev/input/event2"
[    34.335] (--) evdev: USB Laser Wheel Mouse: Vendor 0x1bcf Product 0xa
[    34.335] (--) evdev: USB Laser Wheel Mouse: Found 9 mouse buttons
[    34.335] (--) evdev: USB Laser Wheel Mouse: Found scroll wheel(s)
[    34.335] (--) evdev: USB Laser Wheel Mouse: Found relative axes
[    34.335] (--) evdev: USB Laser Wheel Mouse: Found x and y relative axes
[    34.335] (--) evdev: USB Laser Wheel Mouse: Found absolute axes
[    34.335] (II) evdev: USB Laser Wheel Mouse: Forcing absolute x/y axes to exist.
[    34.335] (II) evdev: USB Laser Wheel Mouse: Configuring as mouse
[    34.335] (II) evdev: USB Laser Wheel Mouse: Adding scrollwheel support
[    34.335] (**) evdev: USB Laser Wheel Mouse: YAxisMapping: buttons 4 and 5
[    34.335] (**) evdev: USB Laser Wheel Mouse: EmulateWheelButton: 4, EmulateWheelInertia: 10, EmulateWheelTimeout: 200
[    34.335] (**) Option "config_info" "udev:/sys/devices/platform/tegra-ehci.2/usb2/2-1/2-1.6/2-1.6:1.0/input/input2/event2"
[    34.336] (II) XINPUT: Adding extended input device "USB Laser Wheel Mouse" (type: MOUSE, id 9)
[    34.336] (II) evdev: USB Laser Wheel Mouse: initialized for relative axes.
[    34.336] (WW) evdev: USB Laser Wheel Mouse: ignoring absolute axes.
[    34.336] (**) USB Laser Wheel Mouse: (accel) keeping acceleration scheme 1
[    34.336] (**) USB Laser Wheel Mouse: (accel) acceleration profile 0
[    34.336] (**) USB Laser Wheel Mouse: (accel) acceleration factor: 2.000
[    34.336] (**) USB Laser Wheel Mouse: (accel) acceleration threshold: 4
[    34.337] (II) config/udev: Adding input device tegra-rt5639 Headphone Jack (/dev/input/event3)
[    34.337] (II) No input driver specified, ignoring this device.
[    34.337] (II) This device may have been added with another device file.
[    53.185] (II) XKB: reuse xkmfile /var/lib/xkb/server-B20D7FC79C7F597315E3E501AEF10E0D866E8E92.xkm
[  1278.999] nvLock: client timed out, taking the lock
[  1338.199] nvLock: client timed out, taking the lock
[  1416.741] nvLock: client timed out, taking the lock
[  1434.433] nvLock: client timed out, taking the lock
[  1467.312] nvLock: client timed out, taking the lock
[  1886.534] dix: invalid event type 2
[  1886.537] 02 00 00 00 00 00 00 00 
[  1886.538] 00 00 00 00 00 00 00 00 
[  1886.538] 00 00 00 00 00 00 00 00 
[  1886.538] 00 00 00 00 00 00 00 00 
[  1886.539] (EE) 
[  1886.539] (EE) Backtrace:
[  1886.539] (EE) 
[  1886.539] (EE) 
Fatal server error:
[  1886.539] (EE) Wrong event type 2. Aborting server
[  1886.539] (EE) 
[  1886.539] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[  1886.539] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[  1886.540] (EE) 
[  1886.754] (EE) Server terminated with error (1). Closing log file.

However, this time I am going to disable sound and see if the problem repeats. The crashes seem to be more prevalent when I am listening to music, either in standalone or web players.

I’ve noticed that between the first and second log it went from unknown type 3 event to unknown type 2 event. In each case it is related to an event which in turn says it is related to events and not graphics, nor to memory issues (including both human memory programming errors and hardware memory errors). The handling of HID (human interface device) hardware is almost certainly the main component of the failure. This in turn could be caused by the USB communications, or the drivers to the individual components, or from the X server simply not being able to handle some modern event. I tend to lean to the side that the handling from the HID devices or the USB communications somehow corrupt or send an unusual event (or perhaps a usual event with an unusual timing).
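
To compare which event types show up across several crashes, the log lines can be tallied with a few standard tools. This is just a sketch: count_invalid_events is a made-up name, and the default log paths are an assumption based on the files mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: count_invalid_events is a hypothetical helper.
# Prints "<count> <event type>" for every "dix: invalid event type N" line found.
count_invalid_events() {
    # With no arguments, scan the usual X server log locations (assumed paths).
    [ "$#" -gt 0 ] || set -- /var/log/Xorg.*.log /var/log/Xorg.*.log.old

    for f in "$@"; do
        [ -f "$f" ] || continue
        grep -h 'dix: invalid event type' "$f"
    done |
        sed 's/.*invalid event type \([0-9][0-9]*\).*/\1/' |   # keep only the number
        sort -n |
        uniq -c
}
```

Since each log file only holds one session, it helps to copy Xorg.0.log.old somewhere after every crash and then run the function over the saved copies.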

It is also possible audio or multimedia is indirectly related, for example, headphones often have a volume control on the cord for increasing/decreasing volume. Sometimes software will generate an event in response to something, e.g., the “bell” or “alert” character might play through stdout of a terminal, but have something in X11 react to it (visual bell).

It is unlikely that the USB hard drive has any influence on events, but it could certainly have an influence on USB drivers which brings out some fringe case in USB or the other devices connected to it. FYI, it might be useful to test something the reverse of what you are thinking of for the drive: Just temporarily try without the hard drive if you can, or with the drive connected to the micro connector…just to use the same USB on HID devices while reducing USB load for that port. Then again with the HID items on the micro connector, but USB drive on the full size connector.

FYI, that micro connector is neither type A nor type B; it is “on the go”, OTG (micro A/B). Its behavior as a device versus a host changes depending on whether a micro A or micro B plug is connected. For this to work with a keyboard, mouse, or other standard items, it must be a host (keyboards, mice, and similar items are devices). Thus your connector/adapter must be micro A, not micro B. The supplied cable is micro B, which is strictly for recovery/flash mode, when the Jetson becomes a device instead of a host. If the cable you ordered really is micro B, it won’t work with your hub and keyboard/mouse.

EDIT/Additional note: Plugging in or removing a microphone or headphone (non-USB) is also an event generator.

We recently found a kernel regression that causes incorrect VFP state after returning from a signal handler. This regression was causing the NEON-accelerated memcpy to copy data incorrectly when interrupted by a signal handler. As X uses a signal handler for receiving input events and glibc’s memcpy is NEON accelerated, we were observing random crashes in X upon input events. The change which introduced the regression is http://nv-tegra.nvidia.com/gitweb/?p=linux-3.10.git;a=commit;h=e700ffc891047182e16f53fc0238c8fd9bf72007 . We are in the process of submitting a revert for this change to nv-tegra. Meanwhile, you can verify whether this issue is fixed by reverting the mentioned change or by applying the revert patch below:

Subject: [PATCH 1/2] Revert "DROP! Revert "ARM: 7419/1: vfp: fix VFP flushing
 regression on sigreturn path""

Reverting because this change causes __memcpy_neon() to corrupt if
interrupted by signal handler.

---
 arch/arm/vfp/vfpmodule.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index da9cc26..41c9a3b 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -589,12 +589,6 @@ int vfp_preserve_user_clear_hwstate(struct user_vfp __user *ufp,
 	 * entry.
 	 */
 	hwstate->fpscr &= ~(FPSCR_LENGTH_MASK | FPSCR_STRIDE_MASK);
-
-	/*
-	 * Disable VFP in the hwstate so that we can detect if it gets
-	 * used.
-	 */
-	hwstate->fpexc &= ~FPEXC_EN;
 	return 0;
 }
 
@@ -607,12 +601,8 @@ int vfp_restore_user_hwstate(struct user_vfp __user *ufp,
 	unsigned long fpexc;
 	int err = 0;
 
-	/*
-	 * If VFP has been used, then disable it to avoid corrupting
-	 * the new thread state.
-	 */
-	if (hwstate->fpexc & FPEXC_EN)
-		vfp_flush_hwstate(thread);
+	/* Disable VFP to avoid corrupting the new thread state. */
+	vfp_flush_hwstate(thread);
 
 	/*
 	 * Copy the floating point registers. There can be unused
-- 
1.8.1.5
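
Before rebuilding, it may be useful to check whether a given kernel tree still contains the lines that the patch above removes. The following is a rough sketch: vfp_revert_needed is a hypothetical name, and the check assumes those two statements appear nowhere else in vfpmodule.c, which I have not verified for every L4T release.

```shell
#!/bin/sh
# Sketch only: vfp_revert_needed is a hypothetical helper.
# Returns 0 if the tree still has the lines the revert patch deletes,
# 1 if they are gone, 2 if the source file cannot be found.
vfp_revert_needed() {
    src="${1:-.}"   # top of the kernel source tree
    file="$src/arch/arm/vfp/vfpmodule.c"
    [ -f "$file" ] || { echo "not found: $file" >&2; return 2; }

    # Both statements are removed by the patch in this thread.
    if grep -qF -- 'hwstate->fpexc &= ~FPEXC_EN;' "$file" ||
       grep -qF -- 'if (hwstate->fpexc & FPEXC_EN)' "$file"; then
        echo "regression lines present: revert not applied"
        return 0
    fi
    echo "regression lines absent: revert appears to be applied"
    return 1
}
```

If the patch is saved to a file, running patch -p1 --dry-run from the top of the kernel tree with that file as input is another way to see whether it still applies cleanly before changing anything.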

I have recently re-flashed the unit and started from scratch. I have only had one recurrence since. However, I found out that the /etc/rc.local commands weren’t working, so that threw me off a little. Now I submit the

echo -1 > /sys/module/usbcore/parameters/autosuspend

as soon as I am booted and it seems to be stable. I am also running in performance mode with all cores active. If the problem persists I will recompile the kernel with the changes you indicated, but for now this seems to be an acceptable fix. I hope I haven’t spoken too soon, and I will keep this thread updated. I made a simple bash file to run after boot; that file is below.

#!/bin/bash
echo "Turning off Autosuspend on USB devices"
echo -1 > /sys/module/usbcore/parameters/autosuspend

echo "Turning Off CPU Auto Power Control"
echo 0 > /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable

echo "Turning On all CPU's"
echo 1 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu3/online

echo "Enabling Performance Optimization"
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

echo "CPU's Active"
cat /sys/devices/system/cpu/online

echo "CPU 0 Frequency"
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

echo "CPU 1 Frequency"
cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq

echo "CPU 2 Frequency"
cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq

echo "CPU 3 Frequency"
cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq

Thank you yoku and linuxdev.

I noticed the same issue and figured out that you can solve it simply by calling a script instead of putting the command directly in rc.local.

If you put your script in /usr/local/bin, give it execute permission, and add it to “rc.local”, you will see that everything works as expected.

The problem is that I do not know why in this way it works :)

Thank you for your reply. I have still been having trouble with it crashing from time to time, recently more than usual (I have been putting more video-intensive loads on the unit lately). I stumbled onto a fix for enabling the USB 3.0 port. It required updating the extlinux.conf file in the /boot/extlinux folder. My original file was as follows:

TIMEOUT 30
DEFAULT primary

MENU TITLE Jetson-TK1 eMMC boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/zImage
      FDT /boot/tegra124-jetson_tk1-pm375-000-c00-00.dtb
      APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 mem=2015M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43:0x00 tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartport=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0 android.kerneltype=normal fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/mmcblk0p1 rw rootwait tegraboot=sdmmc gpt

The updated code was as follows:

TIMEOUT 30
DEFAULT primary

MENU TITLE Jetson-TK1 eMMC boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/zImage
      FDT /boot/tegra124-jetson_tk1-pm375-000-c00-00.dtb
      APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 video=tegrafb mem=1862M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43:0x00 vpr=151M@3945M tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartport=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0 android.kerneltype=normal usb_port_owner_info=0 fbcon=map:1 commchip_id=0 usb_port_owner_info=2 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 tegra_fbmem=32899072@0xad012000 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/mmcblk0p1 rw rootwait tegraboot=sdmmc gpt

The updated code adds video=tegrafb and, I believe, assigns memory to it. When I saw the difference, it made sense to me that the X server would crash if there were a memory fault of some sort, so I hoped the change would make a difference. I have not had a recurrence in over a week now, and the system seems to be stable. I no longer even need to put the system into performance mode, as it does not crash anymore (I hope).
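
Because the two APPEND lines are long, the differences are easier to see when each parameter is put on its own line and the lists are compared. Below is a minimal sketch of that; append_params and diff_append are made-up names, and it assumes each file has its kernel parameters on lines starting with APPEND.

```shell
#!/bin/sh
# Sketch only: append_params and diff_append are hypothetical helpers.

# Print the APPEND parameters of an extlinux.conf, one per line, sorted.
append_params() {
    sed -n 's/^[[:space:]]*APPEND[[:space:]]*//p' "$1" | tr -s ' ' '\n' | sort -u
}

# Show parameters found only in the old file ("- ") or only in the new one ("+ ").
diff_append() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    append_params "$1" > "$old_list"
    append_params "$2" > "$new_list"
    comm -23 "$old_list" "$new_list" | sed 's/^/- /'
    comm -13 "$old_list" "$new_list" | sed 's/^/+ /'
    rm -f "$old_list" "$new_list"
}
```

Run on the two files above, this should report mem=2015M@2048M as removed, and mem=1862M@2048M, video=tegrafb, vpr=151M@3945M, tegra_fbmem=32899072@0xad012000 and the second usb_port_owner_info=2 as added.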

I noticed that both the “mem=” and “tegra_fbmem=” entries differ. By what means did you figure out the amount by which to reduce mem and the amount to add to tegra_fbmem?

I stumbled onto the difference when I was looking for a fix to switch on the USB 3.0 functionality. I found this link: http://jetsonhacks.com/2014/11/08/jetson-tk1-linux-tegra-l4t-21-1-enable-usb-3-0/ . In the video I noticed the difference between his/her extlinux file and mine. After looking closer at the differences, I hoped that this might actually fix my main problem, as it seemed to assign memory to what I believe is the video frame buffer, and my main problem is related to video memory. I waited a couple of weeks to see if I had a recurrence and then updated this site. It was nothing more than blind luck.