Jetson Orin Nano Fails to Boot from NVMe After Enabling mac80211 Mesh & BATMAN-adv (Built Using OE4T Builder)

Hi NVIDIA team,

I’m using a Jetson Orin Nano DevKit with JetPack 6.2 (L4T R36.4.3) and building my kernel using the open-source OE4T jetson-orin-nano-builder workflow.

This process has worked for me in the past, but now my device fails to boot after enabling mesh networking support in the kernel.


My Build Workflow:

  1. Used the get_kernel_src.sh script from the OE4T repo to fetch the sources.
  2. Modified kernel config via CLI (e.g. scripts/patch_defconfig.sh), enabling:

CONFIG_MAC80211=y
CONFIG_MAC80211_MESH=y
CONFIG_CFG80211=y
CONFIG_BATMAN_ADV=m
  3. Built the kernel image (Image) with:

make O=build -j$(nproc) Image
  4. Built modules with:

make O=build modules
make O=build modules_install INSTALL_MOD_PATH=modules_out
  5. Deployed Image to /boot/Image on the device.
  6. Copied modules_out/lib/modules/<version> to the target.
  7. Ran depmod on the device.
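
The steps above can be sketched as a dry run. The layout is keyed by the kernel's "uname -r"; the sketch below stages everything against a scratch directory so it is safe to run anywhere (the version string and build paths are assumptions taken from this post):

```shell
#!/bin/sh
# Dry-run of deploy steps 5-7 against a scratch directory.
# KVER and the build/ paths are assumptions matching this post.
set -e
WORK=$(mktemp -d); cd "$WORK"
TARGET="$WORK/target"        # stands in for the device rootfs "/"
KVER=5.15.148-tegra          # must match the new kernel's "uname -r"

# Mock build artifacts (a real build produces these via make).
mkdir -p build "modules_out/lib/modules/$KVER/kernel/net/batman-adv"
touch build/Image "modules_out/lib/modules/$KVER/kernel/net/batman-adv/batman-adv.ko"

# Step 5: kernel image -> /boot/Image
mkdir -p "$TARGET/boot"
cp build/Image "$TARGET/boot/Image"

# Step 6: installed modules -> /lib/modules/<version>
mkdir -p "$TARGET/lib/modules"
cp -r "modules_out/lib/modules/$KVER" "$TARGET/lib/modules/"

# Step 7 on the device would simply be: depmod "$KVER"
find "$TARGET" -name '*.ko' -o -name Image
```

On the real device the files land under "/" and step 7 is just running depmod after booting the new kernel.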

What’s Going Wrong:

  • After rebooting with the new kernel and modules:
    • UEFI boot appears
    • Then black screen or boot loop
    • Device fails to boot from NVMe rootfs (/dev/nvme0n1p1)
    • No recovery unless I reflash the device

What Works:

  • Booting from NVMe works with the default JetPack 6.2 kernel.
  • My build process worked previously with other config changes (non-mesh-related).
  • Only when enabling mesh support (mac80211, BATMAN_ADV) does it break.

What I Need Help With:

  • Is there a known issue with enabling mac80211 mesh or BATMAN_ADV while booting from NVMe?
  • Do I need to modify the initrd or bootloader when introducing mesh features?
  • How can I debug what’s going wrong in early boot (before rootfs mounts)?

System Info:

  • Jetson Orin Nano DevKit (8GB)
  • JetPack 6.2 (L4T R36.4.3)
  • Kernel version: 5.15.148-tegra
  • Boot device: NVMe SSD (/dev/nvme0n1p1)
  • Kernel built using OE4T jetson-orin-nano-builder

Thanks for your support,

I don’t know much about many of your steps, but it is rather important to know that an initrd is used when you boot from external media (e.g., an NVMe instead of eMMC). If you’ve invalidated any of the modules the kernel loads from within the initrd, then you have to recreate the initrd with the new modules. Not all modules are needed for boot, and it is even possible that none of the initrd modules are required for boot, but the odds are high that this is the problem.

Whenever you change the integrated features of a kernel (symbols enabled via “=y”), all existing modules are invalidated and will likely no longer load unless they are rebuilt as well. Say you had a module for the ext4 filesystem and it no longer loads within the initrd; the rootfs then cannot be mounted, because it is ext4 and is brought up from the initrd.

Normally, if a feature can be built as a module, you configure the kernel source to match the existing configuration (including CONFIG_LOCALVERSION), alter only module config, e.g., adding “CONFIG_MAC80211=m”, and then all you need to do is build the new module(s) and copy them into place. Previously installed modules would still load into that kernel, and any module already in the initrd would continue to work. Had you changed to “CONFIG_MAC80211=y”, then the kernel Image itself must be replaced, and all modules must be rebuilt and installed. The Image exposes an application binary interface (ABI) to modules, and that interface does not change when only module configuration changes.
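As a concrete sketch of that module-only path (hedged: scripts/config ships with the kernel source, but the exact paths below are assumptions, and this only runs inside a configured kernel tree):

```shell
# In the kernel source tree, with .config reproduced to match the
# running kernel (including CONFIG_LOCALVERSION="-tegra").
scripts/config --file build/.config --module MAC80211   # =m, not =y
make O=build olddefconfig
make O=build -j"$(nproc)" modules

# Copy only the new module into the matching "uname -r" tree:
sudo mkdir -p /lib/modules/5.15.148-tegra/kernel/net/mac80211
sudo cp build/net/mac80211/mac80211.ko \
    /lib/modules/5.15.148-tegra/kernel/net/mac80211/
sudo depmod 5.15.148-tegra
```

The existing Image and all previously installed modules remain untouched in this flow.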

Not all symbols/features can be built as a module, and not all can be integrated into the kernel, but most can be either. Be sure to use a dependency-aware editor for configuration changes, since these understand what can or cannot be a module. A lot of people use the make target menuconfig; I prefer nconfig (it is the same, except it adds a symbol search).

Incidentally, the output of “uname -r” starts with the kernel version and ends with the CONFIG_LOCALVERSION string appended. The default CONFIG_LOCALVERSION is “-tegra”. For example, if the kernel source is version 5.10.15 and CONFIG_LOCALVERSION is “-tegra”, then “uname -r” responds with “5.10.15-tegra”. This becomes part of the search path for modules and is baked into the kernel Image at build time. Modules are searched for at:
/lib/modules/$(uname -r)/kernel/

If you build modules only, and keep the same “uname -r” (meaning you also kept the same CONFIG_LOCALVERSION), then any module you copy into the right place should work without further effort. If you change any integrated feature, then the opposite is true: You would then want to change CONFIG_LOCALVERSION, e.g., something meaningful like “-mac80211”; the example “uname -r” would then end up as “5.10.15-mac80211”, and the module search location would become:
/lib/modules/5.10.15-mac80211/kernel/

If you left the original kernel Image in place, then it would still be able to find the “/lib/modules/5.10.15-tegra/kernel/” content. The two kernels would have mutually exclusive modules, but it would work out well because each kernel would know where to load modules from.

An initrd is more or less an adapter between boot stages and the final root filesystem load. It exists in RAM and is a simplified tree filesystem which the bootloader and Linux always understand. If you want to load a filesystem type which the bootloader does not understand, placing the relevant module in the initrd allows loading that module prior to mounting the final filesystem (a pivot root then replaces the initrd “/” with the NVMe’s “/”). I don’t know which modules were in your original initrd, but likely one or more were needed for boot.

L4T (Linux for Tegra) is what gets flashed; it is essentially Ubuntu with NVIDIA content added. You can check your L4T release with “head -n 1 /etc/nv_tegra_release”. The docs specific to your release are here:
https://developer.nvidia.com/linux-tegra

You can flash using those procedures with your kernel, or, if you can put the old Image back in place, then that might work too (and then you could install the new kernel again, but make sure the initrd is updated). I don’t actually have an NVMe to work with, so I can’t provide details. I do recommend leaving the old Image in place and giving the new Image a file name such as “Image-mac80211” (then both kernels are available; you can delete the original, but it is a good idea to wait until reboot is tested before doing so).
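For reference, kernel selection on L4T normally lives in /boot/extlinux/extlinux.conf, so keeping both kernels available looks roughly like the sketch below (labels are illustrative; copy the APPEND line from your existing entry rather than from here):

```
TIMEOUT 30
DEFAULT primary

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/nvme0n1p1 rw rootwait

LABEL mesh
      MENU LABEL mesh kernel
      LINUX /boot/Image-mac80211
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/nvme0n1p1 rw rootwait
```

Over serial console you can then pick the “mesh” entry at boot and fall back to “primary” if it fails.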

Yes, you were right, it was an initrd problem.

But with the new kernel my RTL8822CE is not working, although it is detected under PCI devices. Do you have any idea why? I also can’t see it as a module anywhere in the kernel configuration to add it.

Thanks for your support,

Detecting a device in PCI results in the hot plug software announcing what it sees to the drivers. If a driver can work with that device, then it will take ownership. Seeing something on PCIe with tools like lspci only means electrical success.

The actual driver loading is what causes the device to become visible to software. For example, if a device special file goes into “/dev”, then this is not a real file; instead, this is the loaded driver pretending to be a file. No driver load means no device special file. Other than PCIe being required for the driver to see the hardware, PCIe is not part of making the specific device useful (PCIe is a pipe without knowledge of what the hardware connected to it needs).

If a driver is integrated into the kernel, then it is still possible for the driver to fail to load if:

  • The driver does not know where to find the device (not a problem with PCIe; device tree itself becomes a problem with some non-plug-n-play devices, although the latter is not related to your case).
  • If the driver requires arguments to be passed to it, and the arguments are incorrect or missing, then the driver won’t load.
  • You can see if a module format driver is loaded with the lsmod command. You cannot see an integrated driver load or fail state via lsmod since it is not a module format.

Sometimes the numeric IDs used in plug-n-play devices (which is probably what your PCIe card is) have been changed by a manufacturer in order to rebrand the device. It is the same hardware, but it now requires that brand’s driver. Should that be the case, the udev system would see different plug-n-play information, and often the “driver” for rebranded hardware is just that: a note that this unusual numeric ID should be serviced by the generic driver. Rebranded hardware with a Realtek network device could fail to load if that small association “driver” is missing. There are not many rebranded devices like this, since it means more maintenance for the company supporting the product.

Most likely you just need the driver.

When a kernel is built, it has a lot of optional “symbols”, and a large number of symbols simply correspond to a driver. I’m not looking at a Jetson right now, but on this desktop PC I see “realtek” within “lsmod”, so I know a symbol provided the “realtek” driver and that it is in module form. Do you see any output from:
lsmod | grep -i realtek

More interesting is that on a Jetson you will see this file:
/proc/config.gz

That file is not a real file, but is instead the kernel itself listing the symbols which were configured at the time of kernel build. lsmod only shows driver names, but you can see the symbol associated with this via:
zcat /proc/config.gz | grep -i realtek

What do you see for realtek searches of lsmod and config.gz? I’m betting the driver is missing. If not, then there might be another dependency in a chain of dependencies.
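
One possible reason the search came up empty: on recent mainline kernels the RTL8822CE is served by the rtw88 driver family, so the symbol is not named after the chip alone. I can’t check your exact tree, but the relevant options are roughly this config fragment:

```
CONFIG_WLAN_VENDOR_REALTEK=y
CONFIG_RTW88=m
CONFIG_RTW88_PCI=m
CONFIG_RTW88_8822CE=m
```

In nconfig, a symbol search for “RTW88” (or “8822”) should find these if your source tree carries the driver.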

The default PCIe driver is read from the initrd, so please also remember to update the out-of-tree (OOT) modules in the initrd binary before flashing.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.