Jetson Nano: Problems with kernel modules and custom rootfs, "disagrees about version of symbol"

Hi all!

I have a problem with my image build for my NVIDIA Jetson Nano devkit.

I want to dockerize everything and build a new rootfs from scratch with a custom kernel.

What I got working so far:

  • building the kernel with drivers for the TC358743 HDMI to CSI interface.
  • creating the rootfs with debootstrap and qemu
  • creating images and flashing them to the board with the ./flash.sh script.

What I currently do (roughly sketched as commands after this list) is:

  • building a fresh rootfs with qemu-debootstrap
  • compiling the kernel sources (4.9.140) and creating zImage, modules and dtbs.
  • after these steps, running make modules_install and integrating the result into my rootfs by copying the lib folder into the root of the rootfs
  • replacing the dtb files in the Linux_for_Tegra tools
  • replacing the Image file in the Linux_for_Tegra tools
  • using ./apply_binaries.sh
  • flashing the device with ./flash.sh
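For context, those steps look roughly like the following sketch (the cross-compile prefix, output paths, and the flash.sh board config are placeholders for illustration, not my exact scripts):

$ sudo qemu-debootstrap --arch arm64 bionic ./rootfs http://ports.ubuntu.com/ubuntu-ports
$ cd kernel/kernel-4.9
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=$KERNEL_OUT tegra_defconfig
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=$KERNEL_OUT -j$(nproc) Image modules dtbs
$ sudo make ARCH=arm64 O=$KERNEL_OUT INSTALL_MOD_PATH=<path_to_rootfs> modules_install
$ cp $KERNEL_OUT/arch/arm64/boot/Image Linux_for_Tegra/kernel/Image
$ cp $KERNEL_OUT/arch/arm64/boot/dts/*.dtb Linux_for_Tegra/kernel/dtb/
$ cd Linux_for_Tegra && sudo ./apply_binaries.sh
$ sudo ./flash.sh <board_config> mmcblk0p1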

The board boots, the new driver works, and I can record images, but if I call lsmod no modules are loaded.

Using the dmesg command I see that the error is “disagrees about version of symbol module_layout”.

Has anyone had similar issues before? I was mainly following the L4T docs. I feel like I am super blind for not seeing the issue.

Thanks in advance!

I’m not a “docker guy”, but when booted, what do you see for “uname -r”? Are modules and files located at (and adjust for any chroot) “/lib/modules/$(uname -r)/kernel/”? This latter path is how modules are found.

Within that path, are the modules built for the same kernel you loaded? Are you certain you loaded the correct kernel? In the 32-bit days zImage was used, but at some point at the start of 64-bit the uncompressed Image had to be used instead. I’m not sure if that is mandatory now, but you should verify that your kernel is really being used…if it isn’t, then probably the modules would be wrong for that kernel. Don’t know, but easy to check.
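A quick way to check all of that on the booted board (just a sketch; substitute whichever module you care about for <module_name>):

$ uname -r
$ ls /lib/modules/$(uname -r)/kernel/
$ modinfo -F vermagic <module_name>    # should name the same release as “uname -r”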


Thanks for your fast contribution.
Yeah, that was my first idea as well, but I double-checked:

$ uname -r
4.9.140-tegra

The modules are installed in /lib/modules/4.9.140-tegra/kernel, and also:

$ modinfo nvgpu
filename:       /lib/modules/4.9.140-tegra/kernel/drivers/gpu/nvgpu/nvgpu.ko
license:        GPL v2
alias:          of:N*T*Cnvidia,gv11bC*
alias:          of:N*T*Cnvidia,gv11b
alias:          of:N*T*Cnvidia,tegra186-gp10bC*
alias:          of:N*T*Cnvidia,tegra186-gp10b
alias:          of:N*T*Cnvidia,tegra210-gm20bC*
alias:          of:N*T*Cnvidia,tegra210-gm20b
depends:        
intree:         Y
vermagic:       4.9.140-tegra SMP preempt mod_unload modversions aarch64

Additional info:

  • kernel is built with CONFIG_MODVERSIONS
  • I read something about kernel headers but could not find enough information about them for Tegra; maybe someone with a better understanding can help!

I do not think this is an issue with Docker; I reproduced the issue by running the build locally on Ubuntu 18.04.

Ah, I changed the build to generate everything with make <options> and to copy the Image file; I do think that make <options> zImage also creates the uncompressed Image.

Thanks for the help!

So I got it partially working by removing CONFIG_MODVERSIONS from the config; it now loads all modules, seemingly without errors.

This disables the kernel module version check, and I think it is a dirty solution.

I would much appreciate any feedback!

You are correct that building zImage also builds Image.

By default the NVIDIA kernels enable CONFIG_MODVERSIONS. This works because after the kernel is configured all modules are built against that kernel source and configuration. Versions match.

If you were to retain CONFIG_MODVERSIONS and configure the matching source with the matching configuration, and then add your module, I would think that you would not have issues with CONFIG_MODVERSIONS and mismatched modules. It is possible that removing CONFIG_MODVERSIONS will work correctly, but I would always have to wonder why removing CONFIG_MODVERSIONS was needed. Perhaps the configuration mostly matches, but the software does not know this, or perhaps there is some tiny mismatch waiting to bite you.
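If you want to see the mismatch directly, one way (a sketch, assuming you still have the kernel build output and its Module.symvers; the nvgpu path is just an example module) is to compare the CRC the kernel exports for module_layout with the CRC the module was built against:

$ grep -w module_layout <kernel_build_out>/Module.symvers
$ modprobe --dump-modversions /lib/modules/4.9.140-tegra/kernel/drivers/gpu/nvgpu/nvgpu.ko | grep -w module_layout

If those two CRCs differ, loading the module produces exactly the “disagrees about version of symbol module_layout” message.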

For more of an explanation, consider what it takes to compile an exact match to the existing kernel as shipped by NVIDIA (or as freshly flashed).

  • The source code is the same release.
  • The configuration is the same, features are not added or removed.
  • “uname -r” responds the same because you also matched CONFIG_LOCALVERSION.

Now consider that if you add a feature as a module, then the previous features should not require a change to “uname -r” (and Image should not need changing). If you were to add or remove features which conflict, then “uname -r” would need to change, and the modules used would need to be rebuilt and placed in the new module search path. Since CONFIG_MODVERSIONS is normally integrated into the kernel (for which a more invasive replacement of the Image is needed), I would tend to feel that there should be a change to CONFIG_LOCALVERSION (and thus to the module search path component “uname -r”), along with rebuilding all modules. This doesn’t mean the modules won’t work, since we know there were no other changes, but it probably is not a good idea. Having module versions match guarantees compatibility; without that check you have no guarantee of either working or failing.

As a general comment, if you are just adding a module you’ve built, then there should not have been any change to the Image file (the integrated features). Instead of replacing the whole kernel, it is probably possible to configure your source as a match (including CONFIG_LOCALVERSION) and then compile your new module against that source. Should that succeed, then even with CONFIG_MODVERSIONS the new module should probably be able to load.

In most cases the kernel config target “make tegra_defconfig” will match the kernel which ships once you’ve set:
CONFIG_LOCALVERSION="-tegra"

If you have the original kernel running, then you can guarantee an exact match for modules compiled against that source configuration (after updating CONFIG_LOCALVERSION) if you use “/proc/config.gz” for your starting configuration, and do nothing more than build a kernel module against this. If you changed the Image, then this is no longer possible until the original Image is put back in place.
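A sketch of that approach (grab /proc/config.gz from the booted Jetson and build either natively there or with a CROSS_COMPILE prefix on an x86 host; the install path is a placeholder):

$ zcat /proc/config.gz > .config
# make sure CONFIG_LOCALVERSION="-tegra" in .config so “uname -r” keeps matching
$ make ARCH=arm64 olddefconfig
$ make ARCH=arm64 -j$(nproc) Image modules
$ sudo make ARCH=arm64 INSTALL_MOD_PATH=<staging_or_rootfs> modules_install

Building Image as well regenerates Module.symvers from the matching source and configuration, so the module CRCs line up with the running kernel even with CONFIG_MODVERSIONS enabled; the rebuilt Image itself does not need to be installed.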


My understanding is CONFIG_MODVERSIONS is supposed to allow loading of modules built for other kernels, but it’s a half-broken feature. Kernel help text:

Usually, modules have to be recompiled whenever you switch to a new kernel. Saying Y here makes it possible, and safe, to use the same modules even after compiling a new kernel; this requires the program modprobe. All the software needed for module support is in the modutils package (check the file Documentation/Changes for location and latest version).

Source for the help.

Also this article on why Linus doesn’t like it :)


@mdegans @linuxdev
Hey guys, thank you for your information! I am pretty new to Linux development so I really appreciate this!

I finally found my error! My problem was that during the generation of my system.img (see the steps in my initial message) I used the apply_binaries.sh script after I had copied my modules. Since I was using the same CONFIG_LOCALVERSION, this overwrote my modules again, and they did not have versions matching my newly built kernel.

The only thing I had to do was swap the steps and install the modules only after running the script. Since this finally works, my firmware build is now fully automated in GitLab CI, which is pretty neat.
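In the CI job, the relevant part now looks roughly like this (a sketch; the staging path and board config are placeholders):

$ cd Linux_for_Tegra
$ sudo ./apply_binaries.sh
# only now copy the freshly built modules, so they land on top of the stock ones
$ sudo cp -r <modules_install_staging>/lib/modules/4.9.140-tegra rootfs/lib/modules/
$ sudo ./flash.sh <board_config> mmcblk0p1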

In the future you might try to use a CONFIG_LOCALVERSION other than -tegra so apply_binaries won’t install the modules on top of yours.

In any case, Nvidia’s docs supply an optional alternative make modules_install step where you tarball the fake rootfs up, so apply_binaries.sh will install those rather than the -tegra modules. If you go this route, make sure to replace the kernel tarball and device tree as well (steps 5 and 6). Step for the modules:

8. Optionally, archive the installed kernel modules using the following command:

$ cd <modules_install_path>
$ tar --owner root --group root -cjf kernel_supplements.tbz2 \
lib/modules

The installed modules can be used to provide the contents of /lib/modules/<kernel_version> on the target system.

Use the archive to replace the one in the kernel directory of the extracted release package prior to running apply_binaries.sh:

Linux_for_Tegra/kernel/kernel_supplements.tbz2

Doing it that way allows you to start over with your rootfs. You can reset it, apply_binaries, and have a bootable system with your kernel, modules, and Nvidia’s software.
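Sketching that flow end to end (the kernel output and Linux_for_Tegra paths are placeholders; the kernel_supplements.tbz2 location is the one named above):

$ cp <modules_install_path>/kernel_supplements.tbz2 <Linux_for_Tegra>/kernel/kernel_supplements.tbz2
$ cp <kernel_out>/arch/arm64/boot/Image <Linux_for_Tegra>/kernel/Image    # plus the .dtb files, per steps 5 and 6
$ cd <Linux_for_Tegra>
$ sudo ./apply_binaries.sh    # installs your modules into the (fresh) rootfs
$ sudo ./flash.sh <board_config> mmcblk0p1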


That is an amazing tip! Thanks for that! I’ll mark that as the solution since it is the intended way and less hacky!

Yw. Happy image building!
