LightDM freezes after kernel replacement

haljarrett · October 23, 2018, 8:27pm

After applying some kernel patches and replacing the kernel image on the TX2, I am experiencing issues with LightDM. The system boots all the way to the LightDM login screen, but the display, keyboard, and mouse freeze after a few seconds. If I can log in before it freezes, I am left with a stable system, showing the background and mouse but no window manager.

If left on the frozen login screen, it reboots after a few minutes.

Ctrl-Alt-F(1-6) freeze the system immediately, and do not present a terminal.
When I can get past the login screen, I am unable to get Ctrl-Alt-T to bring up a terminal.
Once it is frozen, it will restart if given the Alt-SysRq-R-E-I-S-U-B magic command.

On a separate TX2, I was left in a similar broken window manager state after patching the kernel, but was able to get a terminal from Ctrl-Alt-T and restore the system by nuking the compiz cache. However, that doesn’t seem to be an option in this case.

Does anyone have any thoughts or insights? Is a serial console my only option to get a terminal, if not by any other means?

linuxdev · October 23, 2018, 9:24pm

Since magic sysrq is working it implies at least some basic part of the system is still working. Exploring whatever is wrong will either require ssh to work, or else serial console.

If it does turn out that you can reach a terminal, then you should see if this command shows all “ok” for files:

sha1sum -c /etc/nv_tegra_release
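On a full install the command above prints one line per checked file, so the interesting entries are easy to miss. A minimal sketch that shows only the entries that are not "OK" follows; the helper name `failed_checks` is made up for illustration, and the checksum file is passed as an argument so the same function works on any `sha1sum`-style list.

```shell
# failed_checks FILE: run `sha1sum -c` on FILE and print only the entries
# that did not verify. (Hypothetical helper name, not part of L4T.)
failed_checks() {
    # sha1sum prints "<path>: OK" for each matching file; anything else
    # (for example "<path>: FAILED") is kept. Summary warnings that
    # sha1sum writes to stderr are discarded.
    sha1sum -c "$1" 2>/dev/null | grep -v ': OK$'
}

# On the Jetson itself:
#   failed_checks /etc/nv_tegra_release
```

Note that `grep -v` returns a non-zero exit status when there is nothing to print, so an empty result with status 1 means every file checked out.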

haljarrett · October 24, 2018, 3:10pm

It seems I was able to get over to a virtual terminal by way of an alt-printscr-r + ctrl-alt-f1 at just the right moment during bootup - this lets me at least peek at the kernel ring buffer, but it doesn’t seem to ever get as far as a login prompt. I am seeing a number of lines along the lines of “rcu_preempt detected stalls on CPUs/tasks:”, followed by a call / task dump.

Not sure what I did to break something, but I guess that is the nature of playing with the kernel. Kernel was built using the same source version and .config from the device.

Some reading suggests that the stall may be overcome by force-nice’ing all real-time tasks (alt-printscr-n), however, the system reports failing to stop a CPU and reboots after 5 seconds.

Clearly, something is significantly broken. Perhaps by not including modules, I messed up a link somewhere - I only copied in the image / zImage.

Will try pushing the old kernel image back on with the flash tool.

linuxdev · October 24, 2018, 8:55pm

On the 64-bit L4T releases zImage is not used, only Image.

I am not sure if I am interpreting this correctly, but did you use the original “/proc/config.gz”, and then edit this to be all integrated features and no modular features? If so, then this would probably explain a failure since some features must be a module.

If “by not including modules” means the config is the same, but no modules exist in “/lib/modules/$(uname -r)/”, then this would definitely cause a failure.

On the other hand, if you built the kernel strictly based on “/proc/config.gz”, and “/lib/modules/$(uname -r)/” remains the same (along with the actual “uname -r”), then you should be able to add any modular feature desired by adding modules to that directory (the Image file itself would not even need to be replaced).
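The module directory condition described above can be checked from any terminal that is reachable (ssh or serial console). The following is a rough sketch only: `modules_present` is a hypothetical helper name, and the optional second argument is an assumed convenience for pointing at a mounted root filesystem instead of the running one.

```shell
# modules_present RELEASE [ROOT]: succeed if ROOT/lib/modules/RELEASE exists
# and contains at least one .ko file. (Hypothetical helper for illustration.)
modules_present() {
    release=$1
    root=${2:-}
    dir="$root/lib/modules/$release"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ -n "$(find "$dir" -name '*.ko')" ]; then
        echo "ok: $dir"
    else
        echo "no modules in: $dir"
        return 1
    fi
}

# On the running system:
#   modules_present "$(uname -r)"
```

A passing check only shows that modules exist for the running release string; it does not prove they were built against the same patched source as the installed Image.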

Can you give more details on exactly what was changed in the config, exactly which file(s) changed, and so on?

haljarrett · October 25, 2018, 1:37pm

Mostly the third one - rebuilt a kernel image to install on a working system with modules still in place and the same local_version. This was related to the kernel patches for the Intel Realsense D435 which we discussed on here: https://devtalk.nvidia.com/default/topic/1039371/auvidea-j120-and-intel-realsense-d435/#5290998

Some source was patched (USB UVC drivers), and some additional modules related to Industrial I/O were enabled and marked to be compiled into the kernel, not as external modules.

I had success making this change and installing it with Linux 4.4.38-tegra on L4T 28.2.1 / Jetpack 3.2, but had this failure attempting to do the same with an existing system on 4.4.38-tegra on L4T 28.1 / Jetpack 3.1.

From the looks online, it seems that the 28.1 kernel version is supposed to be 4.4.15 - unclear to me why this one was on 4.4.38, but that was the uname -r and thus the source tree I built against. Perhaps that was related to some version compatibility.

I have since moved forward using a working patched version of 28.2.1.

Thank you again!

Hal

linuxdev · October 25, 2018, 8:07pm

When changing the base kernel source (including patches and integrated features) it is possible it can invalidate various loadable modules. The existing modules might all work, or in a worst case, none of them will work. This is why I recommend in such a case to change the CONFIG_LOCALVERSION and rebuild all modules. This would probably be the next step…use the same source and configuration, but with CONFIG_LOCALVERSION changed…then build kernel and modules.
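As an illustration of the CONFIG_LOCALVERSION step, here is a minimal sketch that edits the value in a kernel `.config` file. The function name `set_localversion` and the `-custom` suffix are assumptions chosen for the example; the build commands in the trailing comments are the generic kernel make targets, not a verified L4T procedure.

```shell
# set_localversion CONFIG VALUE: rewrite (or append) CONFIG_LOCALVERSION in a
# kernel .config file. (Hypothetical helper; VALUE must not contain "|" or "&".)
set_localversion() {
    config=$1
    value=$2
    if grep -q '^CONFIG_LOCALVERSION=' "$config"; then
        # The pattern ends in "=", so CONFIG_LOCALVERSION_AUTO is left alone.
        sed -i "s|^CONFIG_LOCALVERSION=.*|CONFIG_LOCALVERSION=\"$value\"|" "$config"
    else
        printf 'CONFIG_LOCALVERSION="%s"\n' "$value" >> "$config"
    fi
}

# Possible use inside the kernel source tree, building natively on the Jetson:
#   zcat /proc/config.gz > .config
#   set_localversion .config "-custom"
#   make olddefconfig
#   make -j"$(nproc)" Image modules
#   sudo make modules_install   # installs under /lib/modules/<new release>/
```

With a new suffix the rebuilt modules land in their own directory under `/lib/modules/`, so they cannot be confused with the modules that shipped with the stock “-tegra” kernel.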

Changing a UVC driver probably would not be a problem for lightdm, but in odd circumstances it is hard to say.

R28.1 TX2 kernel is “4.4.38-tegra”. If you have 4.4.15, then I suspect you have the wrong kernel source. There is a significant chance that a difference between 4.4.15 and 4.4.38 causes some sort of device tree incompatibility if some driver ABI changed (or if the driver itself has a different provider, e.g., from NVIDIA versus from the stock 4.4.15 kernel).

haljarrett · October 26, 2018, 12:21pm

Good point, I will try that out.

I didn’t have 4.4.15 anywhere, it was just a point of confusion I had since it was listed at Linux for Tegra R28.1 | NVIDIA Developer, but I suppose that wasn’t the issue. I suspect you are right that it was a module issue.
