JetPack 4.3 stuck at graphical login screen when used with kernel modules compiled from source

Hi,

I had to re-compile the tegra kernel to add SMB2 support (see https://devtalk.nvidia.com/default/topic/1068636/jetson-agx-xavier/please-enable-cifs_smb2-by-default-in-future-versions-of-jetpack/post/5412994/#5412994).
While this worked when compiling SMB support as built-in, I ran into some confusion and a broken installation when trying to use the kernel modules compiled from source.

Following the steps described in NVIDIA Tegra Linux Driver Package / Kernel Customization (https://docs.nvidia.com/jetson/archives/l4t-archived/l4t-322/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide%2Fkernel_custom.html%23wwpID0E0ZC0HA), I end up with a working kernel image at $TEGRA_KERNEL_OUT/arch/arm64/boot/Image. After copying it to Linux_for_Tegra/kernel/Image and running:
./apply_binaries.sh
./flash.sh jetson-xavier mmcblk0p1
the board boots and works fine.
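For clarity, the copy step above can be sketched as follows, with scratch directories standing in for the real $TEGRA_KERNEL_OUT and Linux_for_Tegra trees (the apply/flash commands only make sense on a host wired to the Jetson, so they are shown as comments):

```shell
#!/bin/sh
set -e
# Scratch dirs stand in for the real build output and BSP trees.
TEGRA_KERNEL_OUT="$(mktemp -d)"
L4T="$(mktemp -d)"                                   # stand-in for Linux_for_Tegra
mkdir -p "${TEGRA_KERNEL_OUT}/arch/arm64/boot" "${L4T}/kernel"
echo "built-image" > "${TEGRA_KERNEL_OUT}/arch/arm64/boot/Image"  # dummy Image file
# The copy step from the Kernel Customization instructions:
cp "${TEGRA_KERNEL_OUT}/arch/arm64/boot/Image" "${L4T}/kernel/Image"
# On the real host, from Linux_for_Tegra, you would then run:
#   sudo ./apply_binaries.sh
#   sudo ./flash.sh jetson-xavier mmcblk0p1
echo "Image staged at ${L4T}/kernel/Image"
```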
BUT
The kernel modules remain the same (dated 10-dec-2019), even though I follow the instructions to create kernel_supplements.tbz2 and copy it to Linux_for_Tegra/kernel/kernel_supplements.tbz2.
The reason is that apply_binaries.sh (if called without -t) disregards kernel_supplements.tbz2 and calls Linux_for_Tegra/nv_tegra/nv-apply-debs.sh, which in turn installs kernel modules from Linux_for_Tegra/kernel/nvidia-l4t-kernel-4.9.140-tegra-xxx.deb
The same .deb package contains a kernel image too (boot/Image), but that one is later overwritten by the one I copied to Linux_for_Tegra/kernel/Image.
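For reference, my understanding is that the -t path mentioned above essentially unpacks the supplements tarball over the rootfs; a sketch with scratch directories standing in for the real Linux_for_Tegra tree (file names are illustrative):

```shell
#!/bin/sh
set -e
WORK="$(mktemp -d)"
mkdir -p "${WORK}/rootfs" "${WORK}/kernel/staging/lib/modules/4.9.140-tegra"
touch "${WORK}/kernel/staging/lib/modules/4.9.140-tegra/modules.dep"  # dummy module metadata
# Build a stand-in kernel_supplements.tbz2 the way the docs describe:
tar -C "${WORK}/kernel/staging" -cjf "${WORK}/kernel/kernel_supplements.tbz2" lib
# The step apply_binaries.sh performs when given -t: extract over the rootfs.
tar -C "${WORK}/rootfs" -xpjf "${WORK}/kernel/kernel_supplements.tbz2"
echo "modules now in ${WORK}/rootfs/lib/modules/4.9.140-tegra/"
```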

If I instead use apply_binaries.sh -t kernel/kernel_supplements.tbz2 and flash the device with the resulting root filesystem, it boots, lets me accept the EULA and complete the system configuration, but then does not let me log in with the initial user created during setup. The screen goes blank, and then I am back at the login screen. I can log in in text mode on console 2, but could not figure out what blocks the graphical login when using the freshly compiled kernel modules instead of the stock ones. I did not change the kernel configuration apart from enabling SMB2 and compiling it as built-in instead of a module.

So I have two problems:

  1. What is confusing is that the Kernel Customization document does not mention that apply_binaries.sh must be used with the -t parameter; otherwise it silently disregards my new kernel modules and installs the pre-packaged ones.

  2. What is bad is that when I do manage to flash the device with the newly built kernel modules, graphical login does not work.

Did I misinterpret the instructions?
Can you suggest how to compile and install kernel modules in a way that I have a fully operational system afterwards?

Thanks,
Peter

I can’t answer with the information given, but know that the kernel itself uses both its base version and the “CONFIG_LOCALVERSION” (from the config prior to build) to produce the output of the “uname -r” command. The actual search location for modules is:

/lib/modules/$(uname -r)/

If you did not set “uname -r” up correctly, then the kernel will search for modules in the wrong place. If the new kernel’s “uname -r” matches the existing one, then the existing modules will be found at the right location. During boot you would probably see a kernel version listed, and most likely that version is required to have “-tegra” appended.
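To make the composition concrete, here is a small sketch; 4.9.140 and “-tegra” match the stock L4T kernel, but substitute your own values from the kernel Makefile and .config:

```shell
#!/bin/sh
# How "uname -r" is put together for module lookup.
BASE_VERSION="4.9.140"     # VERSION.PATCHLEVEL.SUBLEVEL from the kernel Makefile
LOCALVERSION="-tegra"      # CONFIG_LOCALVERSION from the .config used for the build
KREL="${BASE_VERSION}${LOCALVERSION}"
echo "uname -r would report: ${KREL}"
echo "modules searched under: /lib/modules/${KREL}/"
```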

If you build a module and keep the original kernel, then you simply have to copy the module to the correct subdirectory of “/lib/modules/$(uname -r)/” and you are done. If you build a kernel which has the matching “uname -r”, and if all original features are still present (perhaps you added a feature), then just having the Image file in place is enough. As soon as the Image file has an altered base configuration with missing options, and/or the “uname -r” changes, you have to install everything.
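The single-module case can be sketched like this; a scratch directory stands in for the real “/”, and cifs.ko is just a hypothetical example module (in this thread CIFS was actually built in):

```shell
#!/bin/sh
set -e
KREL="4.9.140-tegra"                         # assumed "uname -r" of the running kernel
FAKE_ROOT="$(mktemp -d)"                     # stand-in for the real /
DEST="${FAKE_ROOT}/lib/modules/${KREL}/kernel/fs/cifs"
mkdir -p "${DEST}"
touch "${FAKE_ROOT}/cifs.ko"                 # stand-in for the freshly built .ko
# The actual install step: copy the module into the matching subdirectory.
cp "${FAKE_ROOT}/cifs.ko" "${DEST}/"
# On the real system, refresh module dependency data afterwards:
#   sudo depmod -a
echo "installed to ${DEST}/cifs.ko"
```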

The actual method of installing an Image file differs depending on some details, but the information below should be the same for all platforms:
https://devtalk.nvidia.com/default/topic/1038175/jetson-tx2/tx2i-wifi-support/post/5274619/#5274619
https://devtalk.nvidia.com/default/topic/1057246/jetson-tx1/about-kernel/post/5381591/#5381591

If your kernel has that pesky “+” being added to the end of the “uname -r”, you can remove it. See:
https://stackoverflow.com/questions/19333918/dont-add-to-linux-kernel-version
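Two tricks from that link, sketched against a scratch directory standing in for the kernel source tree (both are common workarounds, so verify against your kernel’s scripts/setlocalversion):

```shell
#!/bin/sh
set -e
KSRC="$(mktemp -d)"   # stand-in for the kernel source directory
# 1) An empty .scmversion makes scripts/setlocalversion stop appending "+":
touch "${KSRC}/.scmversion"
# 2) Alternatively, pass an explicit LOCALVERSION on the make command line:
#      make LOCALVERSION="-tegra" Image modules
echo ".scmversion created in ${KSRC}"
```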

  1. OTA updates were just added, and the way apply_binaries.sh works has changed as well. It looks like either the documentation wasn’t updated to match, or the behavior you describe is a bug. You could always uninstall the kernel apt package in a chroot (to prevent OTA updates for it) and then apply the kernel and modules yourself, using apply_binaries.sh as a guide. Likely Nvidia will modify the script in the future to avoid this. OTA updates are brand new; they’re bound to cause some issues with older workflows.

  2. It should work. I use a script I wrote along the lines of Nvidia’s instructions and tested it just the other day. No Xavier support at the moment, however. As linuxdev mentioned, the modules must match the kernel localversion. If you “make modules_install” directly to the rootfs, it should copy things where they belong.
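A sketch of that modules_install step; the make line needs a configured kernel tree and is shown as a comment, while the scratch directory just demonstrates the layout INSTALL_MOD_PATH produces:

```shell
#!/bin/sh
set -e
ROOTFS="$(mktemp -d)"          # real target: Linux_for_Tegra/rootfs
# From the kernel source directory, after a successful build:
#   sudo make ARCH=arm64 O=${TEGRA_KERNEL_OUT} modules_install \
#        INSTALL_MOD_PATH=${ROOTFS}
KREL="4.9.140-tegra"           # assumed "uname -r" of the new kernel
mkdir -p "${ROOTFS}/lib/modules/${KREL}/kernel"
echo "modules would land under ${ROOTFS}/lib/modules/${KREL}/"
```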

Hard to say what exactly is wrong based on what you describe. More logs would be helpful.

https://www.digitalocean.com/community/tutorials/how-to-use-journalctl-to-view-and-manipulate-systemd-logs

The kernel and/or any boot logs will likely hold the answer.
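One way to gather those logs over the text console into a single file you can attach here; this assumes systemd, and the “gdm3” unit name is an assumption for a stock Ubuntu 18.04 based L4T image:

```shell
#!/bin/sh
# Collect graphical-login-related logs into one file (gdm3 is assumed).
LOGFILE="$(mktemp)"
{
    echo "=== errors from current boot ==="
    command -v journalctl >/dev/null 2>&1 && journalctl -b -p err --no-pager
    echo "=== display manager (gdm3 assumed) ==="
    command -v journalctl >/dev/null 2>&1 && journalctl -u gdm3 --no-pager
} > "${LOGFILE}" 2>&1 || true
echo "collected logs into ${LOGFILE}"
```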