Network driver error when recompiling the kernel on JetPack 6.2

Hello,

I tried to rebuild the kernel to enable additional modules.
(vhost, vhost_net, vhost_vsock, xfrm_user).

After finishing the build and installation, I rebooted into the new kernel,
but the network interface did not work.

I found that the original kernel was using the r8168 module,
but the new kernel had no network driver loaded.

Upon checking the kernel configuration,
I discovered that Realtek 8169/8168/8101/8125 Ethernet support was disabled by default.

After enabling that option, rebuilding, installing, and rebooting, the network interface worked properly. The r8169 module is now loaded automatically.

What is the root cause of this issue?
Are the downloaded BSP sources not exactly the same as the original ones?

Below are the commands I used:

wget https://developer.nvidia.com/downloads/embedded/l4t/r36_release_v4.3/sources/public_sources.tbz2
tar -xf public_sources.tbz2
cd Linux_for_Tegra/source/
sudo tar -xf kernel_src.tbz2 -C /usr/src/

cd /usr/src/kernel/kernel-jammy-src
sudo zcat /proc/config.gz | sudo tee .config > /dev/null
sudo bash scripts/config --file .config --set-str LOCALVERSION "-custom"

sudo make menuconfig

sudo make -j$(nproc) Image modules dtbs
sudo make modules_install
sudo depmod 5.15.148-custom

sudo mv /boot/Image /boot/Image.backup
sudo cp arch/arm64/boot/Image /boot/Image

sudo vim /boot/extlinux/extlinux.conf
TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

# This is custom
LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 firmware_class.path=/etc/firmware fbcon=map:0 nospectre_bhb video=efifb:off console=tty0 

# This is original 
LABEL backup
      MENU LABEL backup kernel
      LINUX /boot/Image.backup
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 firmware_class.path=/etc/firmware fbcon=map:0 nospectre_bhb video=efifb:off console=tty0

Which L4T release did you use? See “head -n 1 /etc/nv_tegra_release”.

In the L4T R35.x series (and before) NVIDIA was providing the source with out-of-tree content. The default configuration make target was tegra_defconfig. Starting in L4T R36.x the kernel is mainline, and the default make target becomes defconfig. NVIDIA created the tegra_defconfig target, but the mainline defconfig is maintained by kernel.org. They differ.

Did you switch from an L4T R35.x kernel to an L4T R36.x kernel? Mostly your procedure looks correct, so I am assuming it was an issue of the mainline kernel’s configuration needing changes.

I’m using JetPack 6.2, which includes L4T 36.4.3 by default, and I’ve downloaded the sources for the same release.

head -n 1 /etc/nv_tegra_release
# R36 (release), REVISION: 4.3, GCID: 38968081, BOARD: generic, EABI: aarch64, DATE: Wed Jan  8 01:49:37 UTC 2025

If I understand correctly, starting from R36.x, out-of-tree drivers are no longer included in kernel_src.tbz2 by default?
In fact, besides the networking issues, I also encountered problems with display resolution.

There are several archives under Linux_for_Tegra/source:

kernel_src.tbz2 and
nvidia_kernel_display_driver_source.tbz2, kernel_oot_modules_src.tbz2, etc.
Do these packages include out-of-tree drivers?

For example, after extracting Linux_for_Tegra/source/kernel_oot_modules_src.tbz2, I checked the contents:

cd nvidia-oot/drivers/net/ethernet/realtek/r8168
ls >
Makefile  r8168_asf.c  r8168_asf.h  r8168_dash.h  r8168_dummy.c  r8168_fiber.h  r8168_firmware.c  r8168_firmware.h  r

Ultimately, I want to rebuild the kernel to include only the additional modules I need, and ensure the system boots properly with everything working.

Could you please provide the correct steps or commands to achieve this?
I’m not fully confident in my understanding of the kernel compilation process.

A shorter answer would be that both sources should be the same, but that the configuration used during build may have differed from the default defconfig target (R36.x and newer) or tegra_defconfig target (R35.x and earlier). A more detailed answer follows.

Let’s talk about two different NVIDIA content additions…

First, if one were installing the flash software manually (without JetPack/SDK Manager), then the “driver package” is the part which is the actual flash software. In recovery mode the Jetson is a custom USB device requiring a custom driver, and this is the driver. This creates the “Linux_for_Tegra/” subdirectory and almost all of its content when it is unpacked.

Next there is the sample root filesystem (rootfs). This unpacks into “Linux_for_Tegra/rootfs/”, and this is purely Ubuntu without NVIDIA content. From “Linux_for_Tegra/” the command “sudo ./apply_binaries.sh” is run, and this is what populates NVIDIA content into the sample rootfs and transforms it from being called Ubuntu to instead being called “Linux for Tegra” (L4T).

Some content arrives via this latter apply_binaries.sh.

NVIDIA could include more content, but the mix of what is provided is partly from apply_binaries.sh and partly from kernel content. Before L4T R36.x there was quite a bit of significant out-of-tree content. The Realtek driver support was never part of the out-of-tree content, but the configuration of the kernel build was from NVIDIA. That configuration (the tegra_defconfig target prior to L4T R36.x, and now just defconfig with mainline R36.x) is no longer from NVIDIA.

When NVIDIA put together the software it ships, it probably did include a customized defconfig. I have not looked, but if you still have the exact original Image available to boot, then you could boot to that, and then literally ask the kernel what config it was built with. This command shows the build config (well, it is missing the CONFIG_LOCALVERSION, but otherwise this is exact):
zcat /proc/config.gz | less -i
(you could then use “/realtek<enter>” or “/rtl<enter>” or “/8169<enter>” to search for those)

You could also just use cp to copy that file somewhere else (it isn’t a real file, it lives in RAM and is the kernel pretending to be a file with a list of its build time configuration), and then examine that.
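As a minimal sketch of saving that config under a name tied to the kernel release, something like the following could be used. The function name `save_kernel_config` and the destination directory are illustrative assumptions, not part of any NVIDIA tooling.

```shell
# Hypothetical helper: decompress a kernel config (gzipped or plain text)
# into "<dest_dir>/config-<release>" and print the resulting path.
# Arguments: source config file, kernel release string, destination directory.
save_kernel_config() {
    mkdir -p "$3" &&
    gzip -cdf "$1" > "$3/config-$2" &&
    printf '%s\n' "$3/config-$2"
}

# Example on the Jetson, while booted into the kernel of interest:
#   saved=$(save_kernel_config /proc/config.gz "$(uname -r)" "$HOME/kernel-configs")
#   grep -i -E 'r8169|realtek' "$saved"
```

Repeating this once per booted kernel leaves one plain-text file per release that can be searched or compared later.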

I also have to caution you that the initrd can also cause strangeness if you don’t understand it. When you change the actual kernel, and not just modules, then the initrd (when used; an initrd is not always used) also needs to be updated.

The changes you’ve made probably demand not just the new kernel Image file be used, but 100% of all modules. The initrd is an adapter between boot stages and the kernel mounting the “/” filesystem (the root filesystem, or rootfs). The initrd is itself a very simple filesystem in RAM which eventually vanishes, but any modules required for reaching the “pivot root” step which discards the initrd rootfs and switches to the real “/” rootfs might imply there are modules inside of the initrd itself. The initrd contains a small subset of the modules which need loading to reach mount of “/”. Your extlinux.conf is nice in the sense that it has the old kernel still available, but you also need to create a second initrd; that initrd would need to contain the same modules that the original contains, but built against the new kernel. Any time a kernel changes in any way other than just modules it implies the possibility that the previous modules will not load. There is a binary interface, and the address changes might make this fail to load when the kernel itself changes.

However, if no modules in the initrd are required for completing boot to the point of mounting “/”, then the initrd won’t cause any problems even if the modules don’t load. It is possible, though; I do not know what is working or failing on your system.

Taking the explanation a bit further, when a configuration item is included with the “=y”, then this means the driver is integrated into the base kernel. When configured with the “=m”, then this creates a module and is not part of the base kernel. The feature will be one of “integrated” or “modular”. Changing modular features is more or less without consequence and quite simple. Once you change the integrated features you risk invalidating the initrd and all modules (it isn’t a guarantee of invalidating everything, but it does mean you cannot assume otherwise, and it is then time to rebuild all modules; this in turn means you might also need a new initrd).
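To make the “=y” versus “=m” distinction concrete, here is a small sketch that reports how one symbol is set in a plain-text (already gunzipped) config file. The function name `config_symbol_state` is a made-up helper for illustration.

```shell
# Hypothetical helper: report how a symbol is configured.
#   "built-in"    -> CONFIG_<name>=y (integrated into the Image)
#   "module"      -> CONFIG_<name>=m (built as a loadable module)
#   "disabled"    -> no CONFIG_<name>= line (absent or "is not set")
#   "other value" -> a string or number, e.g. CONFIG_LOCALVERSION
# Arguments: symbol name without the CONFIG_ prefix, plain-text config file.
config_symbol_state() {
    case $(grep -E "^CONFIG_$1=" "$2" | head -n 1) in
        "CONFIG_$1=y") echo "built-in" ;;
        "CONFIG_$1=m") echo "module" ;;
        "")            echo "disabled" ;;
        *)             echo "other value" ;;
    esac
}

# Example:
#   config_symbol_state R8169 ./config-5.15.148-tegra
```

A symbol that moves into or out of the "built-in" state between two configs is the kind of change that calls for rebuilding the Image and all modules.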

Now if you started with a configuration which 100% exactly matches the original kernel, including CONFIG_LOCALVERSION, and then all you did was to add or remove modules, it would result in not needing a new Image at all. The only requirement would be copying the module in to the right place and running “sudo depmod -a” or rebooting. See the suffix of the output from “uname -r”; the NVIDIA kernel has CONFIG_LOCALVERSION set to “-tegra”, and this becomes part of the module load path (this is the part most people forget to adjust if they intend to only add modules).
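The link between the release string and the module directory can be sketched as follows. The helper name `release_suffix` is an assumption for illustration, and it simply treats everything from the first dash onward as the suffix, which may include more than CONFIG_LOCALVERSION on some kernels.

```shell
# Hypothetical helper: print the suffix of a kernel release string,
# i.e. everything from the first "-" onward ("5.15.148-tegra" -> "-tegra").
# Prints an empty line when the release has no suffix.
release_suffix() {
    case $1 in
        *-*) printf '%s\n' "-${1#*-}" ;;
        *)   printf '\n' ;;
    esac
}

# Example on a running system:
#   release_suffix "$(uname -r)"
#   ls "/lib/modules/$(uname -r)/kernel"
```

Modules built with a different suffix are installed under a different “/lib/modules/” subdirectory, so the running kernel never searches there.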

You’ve changed the “uname -r” to have the suffix “-custom”, and this is good if you are intending to replace the Image file itself. However, if you were just adding modules, then there was no need for a new Image.

The original question though is about why those drivers were not present. I will suggest you check the driver symbols inside of the “/proc/config.gz” for those specific kernels; boot to one, save that (named after the “uname -r” probably), and do that again in the new kernel. Compare.
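One possible way to do that comparison, assuming both configs have already been saved as plain-text files, is sketched below. The function name `config_only_in_second` is hypothetical.

```shell
# Hypothetical helper: print config lines that appear in the second file
# but not in the first. Only "CONFIG_..." and "# CONFIG_... is not set"
# lines are considered. Runs in a subshell so its variables stay private.
# Arguments: first config file, second config file (both plain text).
config_only_in_second() (
    old_sorted=$(mktemp) && new_sorted=$(mktemp) || exit 1
    # Use one fixed locale so sort and comm agree on ordering.
    LC_ALL=C
    export LC_ALL
    grep -E '^(# )?CONFIG_' "$1" | sort -u > "$old_sorted"
    grep -E '^(# )?CONFIG_' "$2" | sort -u > "$new_sorted"
    comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
)

# Example:
#   config_only_in_second config-5.15.148-tegra config-5.15.148-custom
```

Swapping the two arguments shows the lines that exist only in the other file, so running it both ways gives the full difference.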

Thank you for your detailed explanation!

Initially, I tried setting CONFIG_LOCALVERSION to -tegra, which is the same as the default. I copied the original kernel configuration and only modified it to add new modules (i.e., setting them to =m). Then I ran make modules, make modules_install, depmod -a, and rebooted.

However, when I tried to load the new modules (e.g., modprobe vhost), I encountered errors such as “unknown symbol” or “disagrees about version of symbol…”. I checked various details using modinfo and other tools, but I couldn’t identify the root cause.

(The original source and the source used for the build are exactly the same version.)

So, I also tried building a new Image, as I mentioned in my original question. With that new image, modprobe worked — but unfortunately, the network and display were broken.

Anyway… I’ll have to try again.

Symbol issues are not limited to actual source code differing. Simply having a different integrated feature (“=y”) can do this. If you overwrote the original modules, and did not just add the new modules, then you would have problems for a change in “=y” configs. Also, if you performed a “make modules” without building the “Image” target, then you would have had to have manually propagated the configuration via the “make modules_prepare”.

Another problem we have not mentioned: How did you add the new modules? You would normally start via setting up a kernel with the defconfig (L4T R36.x+), and then use a configuration tool (don’t forget CONFIG_LOCALVERSION as well). The tool could be “make menuconfig”, but I prefer “make nconfig” (the two are the same except that nconfig has a symbol search function). The editors have knowledge of dependencies, and enabling any symbol can cause others to be required (or on rare occasions, to conflict). Now about unknown symbols…

If you did not use a dependency-aware editor to add the modules, this is one reason why symbols would be an issue. Consider that “disagrees about version of symbol” implies the symbol is there, but the kernel the module was compiled against was not a match (again, the integrated “=y” features or the CONFIG_LOCALVERSION could cause this, it isn’t just about kernel source code release version). If a module was required due to inserting another module, then that module won’t care if the reason is due to a missing kernel symbol or a symbol which cannot be loaded. I suggest putting the original kernel and modules back in place; the backup in extlinux.conf is fine for the Image, but I suspect modules were overwritten by a different config (one with different CONFIG_LOCALVERSION and/or different integrated “=y” symbols) and can no longer be loaded. You might want to reinstall the original modules, and then only add changed modules via copy of file.
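A quick, partial check for the CONFIG_LOCALVERSION side of this is to compare the release recorded in a module's vermagic string with the running kernel. The sketch below assumes the first whitespace-separated field of vermagic is the kernel release; the function name is made up. It does not check per-symbol versions, so a match here does not rule out a “disagrees about version of symbol” error.

```shell
# Hypothetical helper: succeed if the kernel release at the start of a
# vermagic string equals the given release.
# Arguments: vermagic string, kernel release string.
vermagic_matches_release() {
    [ "${1%% *}" = "$2" ]
}

# Example, using modinfo's field selector on a freshly built module:
#   if vermagic_matches_release "$(modinfo -F vermagic ./vhost.ko)" "$(uname -r)"; then
#       echo "release matches the running kernel"
#   else
#       echo "release differs from the running kernel"
#   fi
```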

I truly appreciate your support. Unfortunately, even after trying several different approaches, I haven’t been able to get it working correctly.

I’m starting to wonder if building the kernel natively might be the root of the problem. I’ll try using an x86 host machine to see if that resolves the issue…

(But I still want to do this on native…)

There are no specific steps or thorough documentation provided by NVIDIA for doing a native build on Jetson.

Okay. Thanks.

Native compile is usually not a problem. Sometimes people run into issues when compiling boot code natively simply because a different compiler release might be needed. In terms of compiling natively as an issue, mostly this seems to be when people leave some of the old cross-compile procedures in place; the most common is that when native compiling you should not set ARCH. Even if the host is arm64, the specification of “export ARCH='arm64'” will break things and cause the compiled code to be treated differently (module insert would fail claiming it is a foreign architecture).
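Following that advice, a small guard can be run before a native build to catch leftover cross-compile settings. This is only a sketch; the function name `check_native_build_env` is an assumption.

```shell
# Hypothetical helper: fail with a message if ARCH or CROSS_COMPILE is set
# to a non-empty value in the current environment.
check_native_build_env() {
    if [ -n "${ARCH:-}" ] || [ -n "${CROSS_COMPILE:-}" ]; then
        echo "ARCH or CROSS_COMPILE is set; unset both before a native build" >&2
        return 1
    fi
}

# Example:
#   check_native_build_env || unset ARCH CROSS_COMPILE
```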

I have often compiled natively on the Jetson itself. Do keep in mind though that if you have already changed the Image file, then everything is being compared to that, and not to the original Image. If you can still boot to the original Image, then I’ll encourage you to do this:

  • When booted to that, copy “/proc/config.gz” somewhere safe.
  • gunzip this file, and rename it something like:
    mv config config-$(uname -r)
    Then edit CONFIG_LOCALVERSION to be:
    CONFIG_LOCALVERSION="-tegra"
  • Use that for your “golden configuration”. Copy it to the output location of your kernel compile. Example: If output is “O=~/output/”, then copy your reference to “~/output/.config” (rename it to .config). From then on, all of your configuration would be through nconfig, and all changes would be only to add modules.
  • Make sure you do not set ARCH, nor CROSS_COMPILE. Make sure all steps include “O=~/output/” (or whatever location you want; it is ok to use an environment variable to set that up).
  • Use a separate module output location so as to not pollute the original actual module location. Example:
     # A quoted '~' is not expanded by the shell, so use $HOME instead.
     export TEGRA_MODULES_OUT="$HOME/modules_out"
     # Some liberty with pseudo coding:
     make ...options... O=~/output/ INSTALL_MOD_PATH="$TEGRA_MODULES_OUT" modules_install
    
  • Isolate the modules you’ve built within $TEGRA_MODULES_OUT which are new. Manually copy those to the same subdirectory within “/lib/modules/$(uname -r)/kernel/”. Run “sudo depmod -a”, and monitor “dmesg --follow” while running the depmod. Copy any errors. Try to modprobe the modules, again monitor “dmesg --follow” and noting any errors. Beware that if there is an error based on order of module load that this won’t necessarily be an issue, although you might need to reboot or try again in a different order (your goal is to find out if the module can load, and the actual load is not so important at this point).
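The "isolate the new modules" step in the list above could be sketched like this. The function name `list_new_modules` and the example paths are assumptions; it only lists candidates and copies nothing.

```shell
# Hypothetical helper: list module files present in a staging "kernel/" tree
# but missing from the live "kernel/" tree, as paths relative to that tree.
# Runs in a subshell so the directory change does not leak to the caller.
# Arguments: staging kernel directory, live kernel directory.
list_new_modules() (
    live=$(cd "$2" && pwd) || exit 1
    cd "$1" || exit 1
    find . -type f -name '*.ko*' | sort | while IFS= read -r rel; do
        [ -e "$live/$rel" ] || printf '%s\n' "${rel#./}"
    done
)

# Example:
#   list_new_modules "$TEGRA_MODULES_OUT/lib/modules/$(uname -r)/kernel" \
#                    "/lib/modules/$(uname -r)/kernel"
```

Each path printed is a module that can be copied into the same relative location under the live tree before running “sudo depmod -a”.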

The NVIDIA cross compile instructions are rather pain-free for kernels. The place where I diverge in what they suggest is the module installation answers. NVIDIA designed this around adding the content to the flash software, which means you’d need to flash. My suggestions revolve around simplifying and putting modules or kernel Image files in without flash. The NVIDIA suggestion is more valuable for production or setting up before flash, whereas I am assuming you have a running system you don’t want to flash. In all cases though beware that any modules used during boot stages will complicate everything and force understanding the initrd.

Here is my process on Jetson native.

wget https://developer.nvidia.com/downloads/embedded/l4t/r36_release_v4.3/sources/public_sources.tbz2
tar -xf public_sources.tbz2

cd Linux_for_Tegra/source/
sudo tar -xf kernel_src.tbz2 

cd kernel/kernel-jammy-src/
sudo mkdir kernel_out

sudo zcat /proc/config.gz | sudo tee kernel_out/.config > /dev/null

sudo make O=kernel_out nconfig

#######################################################
General setup
	> Local version: -tegra
	
Networking support > Networking options 
	# xfrm_user
	> <M> Transformation user configuration interface
	# virtio_vsock
	> Virtual Socket protocol > <M> virtio transport for Virtual Sockets
	# cls_u32
	> QoS and/or fair queueing > <M> Universal 32bit comparisons w/ hashing (U32)

Device Drivers > VHOST drivers 
	# vhost_vsock
	> <M> vhost virtio-vsock driver
	# vhost_net
	> <M> Host kernel accelerator for virtio net
#######################################################

sudo make -j6 O=kernel_out modules

export MODULES_OUT="$PWD/modules_out"
sudo make -j6 O=kernel_out INSTALL_MOD_PATH=$MODULES_OUT modules_install

cd modules_out/lib/modules/5.15.148-tegra/kernel/

sudo mkdir /lib/modules/5.15.148-tegra/kernel/drivers/vhost
sudo cp drivers/vhost/* /lib/modules/5.15.148-tegra/kernel/drivers/vhost/
sudo depmod -a

sudo modprobe vhost
>>>
modprobe: ERROR: could not insert 'vhost': Invalid argument
vhost: disagrees about version of symbol wake_up_process
vhost: Unknown symbol wake_up_process (err -22)

Could you check what might be wrong?

(+ Cross-compilation on the x86_64 machine was successfully done.)

One detail is missing, the configuration is incomplete as is. If you had built Image, then this does more than build Image…it also propagates the configuration to other parts of the code. Without that your config will not match (the config will be there, but it won’t have been applied). Since you did not build Image, can you start over, but this time, after you’ve modified the config with the editor, and prior to building modules, you must do this (building Image does this for you):
make O=kernel_out modules_prepare
(you will have to probably start over to be certain; I recommend that you delete “kernel_out/” and start over just to be certain, along with the same for old output in modules_out/)
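Put together, the module-only sequence with the extra step might look like the sketch below. The wrapper names `run` and `build_modules_only`, the `DRY_RUN` switch, and the `JOBS` variable are illustrative assumptions; by default the commands are only printed so the order can be reviewed before anything is built.

```shell
# Print the command when DRY_RUN is 1 (the default here); otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for a module-only build from the kernel source tree.
# Run it after the config has been edited (for example with nconfig).
# Arguments: kernel output directory, module staging directory.
build_modules_only() {
    # Propagate the edited config; building Image would do this implicitly.
    run make O="$1" modules_prepare &&
    run make -j"${JOBS:-6}" O="$1" modules &&
    # Install into a staging directory, not directly into /lib/modules.
    run make O="$1" INSTALL_MOD_PATH="$2" modules_install
}

# Example, from kernel/kernel-jammy-src/:
#   build_modules_only kernel_out "$PWD/modules_out"              # print only
#   DRY_RUN=0 build_modules_only kernel_out "$PWD/modules_out"    # really build
```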

If you still see an issue, then there might have been an automatic insert of a “+” in the config name. If and only if this is the case, then see:
https://forums.developer.nvidia.com/t/jetson-nano-no-display-from-hdmi/248228/10
