Rootfs build and flash problem

  • 1 BSP environment:
    TX2 jetpack 4.6 L4T R32.6.1 kernel 4.9 aarch64
    TX2 (p3310)
  • 2 issue:
    SDK Manager has been installed on the cross-compile host, but there are three directories in rootfs/lib/modules. Which one is necessary?
$ ls lib/modules
4.9.253  4.9.253+  4.9.253-tegra
$ pwd
/home/ubuntu/nvidia/nvidia_sdk/JetPack_4.6_Linux_JETSON_TX2_TARGETS/Linux_for_Tegra/rootfs

and where should we install modules?

make O=$TEGRA_KERNEL_OUT modules_install INSTALL_MOD_PATH=$KERNEL_MODULES_OUT

Because the command uname -r shows 4.9.253 on the TX2 device, we think that INSTALL_MOD_PATH=4.9.253 and that the other directories can be deleted. Is that right?
Regards

The long story is a description of how one configures a kernel for compile and install. See:
https://forums.developer.nvidia.com/t/how-to-fix-xorg-conf-in-jetson-orin/229155/12

The short answer is that every kernel provides the output of the command “uname -r” (which is partly a configuration before compile), and that a kernel searches for any modules at:
/lib/modules/$(uname -r)/kernel

This applies to the kernel to be used. If you have modules, then it is critical to have “rootfs/lib/modules/” (and it is 100% fatal to not have “rootfs/lib/” since this becomes “/lib” in the flashed Jetson).

If you use the stock kernel (and this implies any modules compiled against that configuration), then you must have:
rootfs/lib/modules/4.9.253-tegra/
(the reason being that this is the stock installed kernel)
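
As a rough illustration of the check described above, the sketch below lists the module trees in a rootfs and marks the one whose name matches a given release string. The function name list_module_trees and its arguments are made up for this example and are not part of L4T; it is meant to be run on the host against Linux_for_Tegra/rootfs.

```shell
#!/bin/sh
# Hypothetical helper (not part of L4T): list every module tree under
# ROOTFS/lib/modules and mark the one whose name equals RELEASE, i.e. the
# string the target kernel prints for "uname -r".
# Usage: list_module_trees ROOTFS RELEASE
list_module_trees() {
    rootfs=$1
    release=$2
    status=1
    for dir in "$rootfs"/lib/modules/*/; do
        [ -d "$dir" ] || continue          # glob did not match anything
        name=$(basename "$dir")
        if [ "$name" = "$release" ]; then
            echo "$name (used by kernel $release)"
            status=0
        else
            echo "$name (ignored by kernel $release)"
        fi
    done
    # Non-zero when no tree matches the release string.
    return "$status"
}
```

For example, list_module_trees "$PWD" 4.9.253-tegra run from inside the rootfs directory. Note that "make modules_install" writes to $INSTALL_MOD_PATH/lib/modules/<release>, so INSTALL_MOD_PATH would normally be the top of the rootfs (or a staging directory), not one of the version directories.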

The fact that you have “4.9.253” and “4.9.253+” tells me you’ve probably experimented with kernel builds and should read the document on kernel configuration.

Here is a more complete document on the actual kernel build you might find of use (even if not of interest :P ). Official documents concentrate on flash and cross compile, while the above URLs try to explain more about why configuration and kernel install succeed or fail (considers native build and not cross compile):
https://forums.developer.nvidia.com/t/problem-smb-jetson-nano/193640/11

Hi @linuxdev
Thanks for your reply.

We rebuilt the kernel source without setting LOCALVERSION, so that "uname -r" shows 4.9.253. We built the modules and installed them to $KERNEL_MODULES_OUT, then copied $KERNEL_MODULES_OUT/lib/modules/4.9.253 to rootfs/lib/modules/4.9.253.
We copied the rebuilt Image to kernel/Image.
Then we ran ./flash.sh jetson-tx2 mmcblk0p1
The terminal log shows that the flash succeeded, but our device hangs after we do the system configuration on the desktop.
Our TX2 log is attached.
FYI
2022-11-10.log (24.1 KB)

Update:
We added LOCALVERSION=-tegra
and rebuilt the kernel and dtb,
but it did not help. We wonder what the system configuration step is doing. Could we skip it?
FYI
boot_log.log (77.7 KB)

Regardless of what kernel configuration modifications you made, what was your starting configuration? Did you have a base configuration set up correctly prior to adding your changes?

Btw, this indicates success, but your system needs first boot login account setup:

[    6.730653] Please complete system configuration setup on desktop to proceed...

What isn’t certain is why your serial console did not offer you a chance to complete that (or perhaps via local monitor/keyboard). In any case boot succeeded to the point that Linux loaded and user space software began. I don’t think the camera debug lines mattered, and “using random ... ethernet address” is not a boot issue, although it might indicate something else for later thought.

Using “LOCALVERSION=-tegra” only helps if the original (as shipped) modules work, and you are adding a module. If the configuration was a sufficiently close starting match, then all of the original modules would “just work”.

I can’t tell you what is going on though, there isn’t enough log. Someone from NVIDIA might be able to help with that, and it is possible the device tree is the issue (device tree could also be related to the “random ethernet address” log message as well).

I’ll suggest that you state:

  • If your starting kernel configuration was from tegra_defconfig or some other means.
  • Post a copy of the device tree .dts (and any overlay).
  • State which carrier board you are using: A dev kit or some other carrier board.

Hi @linuxdev
Thanks for your reply.
Yes, we started the kernel configuration from tegra_defconfig and then modified it with ‘make menuconfig’. We do not think it is a configuration issue, because we used the modified .config for about 2 months and it was stable.

The problems appeared only once we used ‘flash.sh jetson-tx2 mmcblk0p1’ to flash everything.
If we flash everything, the bring-up log shows cboot->uboot->kernel. But if we use ‘flash.sh -r -k kernel --image=kernel/Image jetson-tx2 mmcblk0p1’ to flash the kernel only, the bring-up log shows cboot->kernel and skips uboot. Why?

Also, zcat /proc/config.gz now shows that mttcan, can_raw and can_dev are configured to be compiled into the kernel, but CAN does not work as it used to. ifconfig does not show any CAN device, although the bring-up logs show the can0 and can1 devices being initialized.
FYI
8_2022-11-11_16_53_46.log (337.7 KB)
8_2022-11-11_16_35_03.log (156.3 KB)

I can’t answer regarding flash of just a particular partition. I have an idea of what might be going on, but I don’t know the flash “-k kernel” option to be sure.

When you flash with option “-k kernel” it does not necessarily mean the Linux kernel. U-Boot is itself a kernel (despite it having the job of overwriting itself with another kernel). If you are using the “-k kernel” option to flash the Linux kernel, then you might not be flashing the content you think you are flashing. Plus if you are using redundancy, there might be a kernel_b which is actually being used if something went wrong.

Incidentally, I don’t see the CONFIG_LOCALVERSION set (there is more than one way to set this, so perhaps it doesn’t matter). Be certain all modules are actually installed to the right location (at “/lib/modules/$(uname -r)/kernel/”).

You would have to explain why you are using “flash.sh -r -k kernel --image=kernel/Image jetson-tx2” (what you want to occur) so someone could say if this is what is actually occurring. I also don’t know enough about can devices, but device tree would very likely be a reason if it fails despite having this integrated into the driver.

Hi
All the source code is the same.
boot0.log was produced with the command ‘./flash.sh jetson-tx2 mmcblk0p1’.
Our device comes up without CAN, without the tegra-asoc sound card mapping, and without printing the dtb name.
But all of these can be seen in the previous 8_2022-11-11_16_35_03.log.
Could anyone tell us why?
boot0.log (52.9 KB)
8_2022-11-11_16_35_03.log (156.3 KB)

[    0.183329] DTS File Name: ../arch/arm64/boot/dts/../../../../../../hardware/nvidia/platform/t18x/quill/kernel-dts/tegra186-quill-p3310-1000-c03-00-base.dts
[    0.183343] DTB Build time: Nov 11 2022 15:17:12

Once we use flash.sh -r -k kernel --image=kernel/Image jetson-tx2, all the missing bring-up logs come back, which matches 8_2022-11-11_16_35_03.log. Why?
Our developers usually modify some source code and flash the kernel only.

I can tell the system was not shut down properly in the first 8_2022-11-11_16_35_03 log (orphan inodes were deleted, implying there was filesystem damage whereby the journal removed 12 inodes which did not complete writing). However, I don’t know enough about CAN to answer what the device tree needs, nor its setup. If you post the extracted device tree then (A) someone knowing device tree setup for CAN would be able to tell you if it is correct, and (B) it would tell you if your changes actually exist in the tree. To extract device tree as the running system sees the tree:
dtc -I fs -O dts -o extracted.dts /proc/device-tree
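
To check whether one particular node is enabled in the running tree without reading the whole extracted file, something like the following may help. This is only a sketch: dt_node_status is a made-up helper, and the node directory is passed in by the caller because the exact CAN node names depend on the board's device tree. Property files under /proc/device-tree are NUL-terminated strings, and a node with no "status" property is treated as enabled.

```shell
#!/bin/sh
# Hypothetical helper: print the "status" property of one device tree node
# as exposed under /proc/device-tree (or under any copy of that directory).
# Usage: dt_node_status NODE_DIR
dt_node_status() {
    node=$1
    if [ ! -d "$node" ]; then
        echo "no such node: $node" >&2
        return 1
    fi
    if [ -f "$node/status" ]; then
        # Property values are NUL-terminated strings; strip the NUL.
        tr -d '\0' < "$node/status"
        echo
    else
        # A missing status property means the node is enabled.
        echo "okay (no status property)"
    fi
}
```

A node reported as "disabled" will not be probed by its driver even when that driver is built into the kernel.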

So far as the kernel driver goes, also upload a copy of “/proc/config.gz”. This will show what the kernel’s configuration was set to at build.
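
If you only want to look at a few symbols rather than the whole file, a small lookup along these lines could be used. It is a sketch, and kconfig_lookup is a name invented here; it reads either the compressed /proc/config.gz or a plain .config from the build tree.

```shell
#!/bin/sh
# Hypothetical helper: print the line for one CONFIG_ symbol from a kernel
# configuration, either gzip-compressed (/proc/config.gz) or plain (.config).
# Usage: kconfig_lookup SYMBOL [CONFIG_FILE]
kconfig_lookup() {
    symbol=$1
    file=${2:-/proc/config.gz}
    case $file in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac
    # Matches "CONFIG_X=y", "CONFIG_X=m", 'CONFIG_X="-tegra"' and
    # "# CONFIG_X is not set"; exit status is non-zero if nothing matches.
    $reader "$file" | grep -E "^(# )?${symbol}(=| is not set)"
}
```

For example, kconfig_lookup CONFIG_LOCALVERSION or kconfig_lookup CONFIG_CAN_RAW on the Jetson. A value of "=y" means built into the kernel, "=m" means built as a module.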

Hi

How did you find that? Could you show more details?

What is the following log showing?

[    9.269164] CPU1: shutdown
[    9.315693] CPU2: shutdown

Our developer modified and rebuilt kernel/Image. How do we use flash.sh to update the Image only?
We searched flash.xml and saw that the partition named “kernel” maps to “boot.img”; we did not see an “Image” file.
If flash.sh does not have this feature, how could we update the Image only? Could we directly overwrite /boot/Image on the TX2 device?

Our dts is attached. Could anyone help verify why CAN does not work? BTW, the dts had been in use for some time and CAN worked well until we flashed system.img with the flash.sh script.
henry20.dts (447.7 KB)

So we want to check the flash procedure. Could anyone offer us the standard procedure to flash the kernel (Image) and dtb?
What is system.img? Does it contain the Image, dtb and rootfs? We think system.img is an “all in one” image, right? Could anyone offer us more systematic information about the source code, not only individual chapters? Is there any video training about it?

Every time we use “flash.sh -r -k APP jetson-tx2 mmcblk0p1”, on the first bring-up of the device we have to do the system configuration in the desktop environment. How can we record all of this configuration somewhere, so that we can back up the whole system and flash it to a brand new device?

COULD SOMEONE HELP?

Regards

hello Henry.Lou,

Please refer to Flashing a Specific Partition.
You can flash a specific partition instead of the whole device by using the command line option -k.
In short, you may use the command below to update the kernel partition:
$ sudo ./flash.sh -r -k kernel jetson-tx2 mmcblk0p1

Please also note that
cboot can load images from the file system.
Please review /boot/extlinux/extlinux.conf.
The LINUX entry specifies the path of your kernel image. If there is no LINUX entry, the kernel binary is loaded from the kernel partition.
See also Kernel Boot Sequence Using extlinux.conf.
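
A quick way to see which case applies on a given device might look like the sketch below. The helper name extlinux_kernels is invented for this example; it prints the path from every LINUX line of an extlinux.conf, on the assumption that the keyword appears as the first word of its line, as in the stock file.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel path of every LINUX entry in an
# extlinux.conf. No output (and a non-zero status) means no entry names a
# kernel file, so the kernel partition would be used instead.
# Usage: extlinux_kernels [CONF_FILE]
extlinux_kernels() {
    conf=${1:-/boot/extlinux/extlinux.conf}
    # Commented-out lines start with "#", so their first field is never LINUX.
    awk 'toupper($1) == "LINUX" { print $2; found = 1 } END { exit !found }' "$conf"
}
```

If this prints /boot/Image, then flashing only the kernel partition would not change the kernel that actually boots.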

system.img is the root file system; you may consider it the image that contains all your user-space data.

I see this in the 8_2022-11-11_16_35_03.log:

[    6.466577] EXT4-fs (mmcblk0p1): 12 orphan inodes deleted
[    6.473086] EXT4-fs (mmcblk0p1): recovery complete
[    6.484061] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
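
To pull the same lines out of any saved boot log, a simple filter like this would do. It is a sketch, ext4_recovery_lines is a name made up here, and the pattern only covers the two message types quoted above.

```shell
#!/bin/sh
# Hypothetical helper: show the EXT4 messages in a saved boot log that point
# to journal recovery after an unclean shutdown.
# Usage: ext4_recovery_lines LOGFILE
ext4_recovery_lines() {
    grep -E 'EXT4-fs.*(orphan inodes? deleted|recovery complete)' "$1"
}
```

A log from a cleanly shut down system prints nothing here and the function returns a non-zero status.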

The “CPU” shutdown log is not an error. This is just the power mode changing. CPU0 always runs since this handles hardware IRQs. Other CPUs (especially Denver cores) may shut down for power saving unless told to run (I think on the TX2 CPU1 and CPU2 are Denver cores).

Unless security fuses are burned you can simply use a file copy, but here is some information to note:

  • The kernel itself determines the output of the command “uname -r”. The prefix is the kernel version, the suffix is the CONFIG_LOCALVERSION feature during compile. The default is “CONFIG_LOCALVERSION=-tegra”. As an example, if the kernel source is version 4.9.190, and the default CONFIG_LOCALVERSION is set, then “uname -r” would respond with “4.9.190-tegra”.
  • Each kernel looks for modules at “/lib/modules/$(uname -r)/kernel”. If the modules are not there, or if incompatible modules are there, then modules will fail to load. Modules are part of the kernel, but load at run time.
  • If you match the configuration of a kernel to the shipping version, and also match the “uname -r”, then the kernel will find and load all of the original modules. One can add modules to this and simply copy them to the right location if you are only adding module features during a kernel build.
  • If you build the kernel itself such that it has integrated new features into it, then it is possible old modules will be incompatible. In that case you’d be advised to change the “CONFIG_LOCALVERSION” (e.g., to “-test”) and install all new modules at the new location. Not needed for additions via module if all else remains constant.
  • During boot, if security fuses are burned, then the kernel can only be found in a signed partition which is updated by flash. Other than that you can generally just consider placing the new Image file in “/boot”. However, I’d suggest a new name and adjusting “/boot/extlinux/extlinux.conf” so the original is still there. Example: Place the Image as “/boot/Image-test”, and then name that file in extlinux.conf.
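
The naming rules in the list above can be written down as two tiny functions. This is purely illustrative; kernel_release and module_dir are names invented for the sketch, and the version numbers in the usage note are the examples from the list.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the naming rules in the list above.

# Print the "uname -r" string for a kernel version and a LOCALVERSION suffix.
# Usage: kernel_release VERSION LOCALVERSION
kernel_release() {
    printf '%s%s\n' "$1" "$2"
}

# Print the directory in which that kernel looks for its modules.
# Usage: module_dir VERSION LOCALVERSION
module_dir() {
    printf '/lib/modules/%s/kernel\n' "$(kernel_release "$1" "$2")"
}
```

For instance, kernel_release 4.9.190 -tegra gives 4.9.190-tegra, and module_dir 4.9.190 -test gives /lib/modules/4.9.190-test/kernel, which is where a kernel built with that suffix would expect its modules.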

An example entry for a TX2 extlinux.conf for a default case is this:

TIMEOUT 30
DEFAULT primary

MENU TITLE p2771-0000 eMMC boot options


LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4

You could keep the original, and use it as a recovery system if you were to change it like this (I’m assuming your “uname -r” is now “4.9.190-test” as an example; this is just for clarity in naming and has no real effect):

TIMEOUT 30
DEFAULT testing

MENU TITLE p2771-0000 eMMC boot options

LABEL testing
      MENU LABEL test kernel
      LINUX /boot/Image-4.9.190-test
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4

(the above implies copy of the compiled Image to become “/boot/Image-4.9.190-test”; if “uname -r” changes, then modules must also be rebuilt)

If no kernel is available via extlinux.conf naming a file, then it searches for this in the partition. During development it is easier to just use the file. You can of course just flash the kernel and skip file copy, but make sure extlinux.conf does not name a kernel.
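
For the file copy approach, a cautious copy step might look like the following sketch. The function install_test_image is hypothetical; it only copies the Image under a new name and refuses to overwrite anything, so the original /boot/Image stays available for the "primary" entry. Editing extlinux.conf is left as a manual step.

```shell
#!/bin/sh
# Hypothetical helper: copy a freshly built Image into a boot directory under
# a new name, refusing to overwrite an existing file so the original kernel
# stays available as a fallback.
# Usage: install_test_image IMAGE BOOT_DIR RELEASE
install_test_image() {
    image=$1
    boot_dir=$2
    release=$3
    dest="$boot_dir/Image-$release"
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    # Print the destination path on success so it can be named in extlinux.conf.
    cp "$image" "$dest" && echo "$dest"
}
```

Run as root on the Jetson, install_test_image ./Image /boot 4.9.190-test would create /boot/Image-4.9.190-test, which is the path to put on the LINUX line of the new entry.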

Incidentally, the “-r” option to “flash.sh” says to “reuse” the existing “Linux_for_Tegra/bootloader/system.img” file. This prevents wasting time generating a new image. However, you need the other options to tell it to not flash the rootfs…any time you use “-r” without specifying something to flash other than rootfs it will flash rootfs, but it will be with a default image.

Hi @linuxdev
Thanks for your reply, it really helps a lot.

As we posted in comment #9, we found that after flashing system.img, when the device first comes up we have to do “system configuration”. Please refer to this picture.

Our device did hardware detection, and then time zone, keyboard, etc. configuration.
We wonder why the system needs this step. We thought the DT would tell the system what it needs. Does this step imply that there may be some problem in our DT?
Could we back up all of the configuration and flash it to a new device, so that we do not need to do it in the desktop environment?

There used to be a default user name and password. Then California law made it so this could no longer be used (unless not shipping to California). Thus a “first boot setup” for the end user to add a name/pass. Regarding only that, you can use the “Linux_for_Tegra/tools/l4t_create_default_user.sh” script prior to flashing to set this up prior to flash (this would stay in effect until you run that script again).
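
As a sketch of how that script might be called, the function below only prints a candidate command line and does not run anything. The -u (user name) and -p (password) options are written from memory and should be treated as assumptions to check against the script's own usage text; the function name and the placeholder values are made up.

```shell
#!/bin/sh
# Sketch only: print (do not run) a possible invocation of the L4T user
# creation script. The -u and -p options are assumptions to be checked
# against the script's own usage text; the arguments are placeholders.
# Usage: default_user_command L4T_DIR USER_NAME PASSWORD
default_user_command() {
    printf 'cd %s && sudo ./tools/l4t_create_default_user.sh -u %s -p %s\n' \
        "$1" "$2" "$3"
}
```

The printed command would be run on the host before flash.sh, so that the account already exists in the rootfs when the image is generated.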

The device tree won’t change this unless there is a bug.

Incidentally, if you are either wanting a backup or a production system, then you can flash this way or normally, run a system update and configure as desired, and then clone. The clone can be used for future flashes and will contain all of the updates and customization. Do beware that you’ll be flashing the same user name and password to all systems along with any cache or temp files which are still there during clone. This would be a problem if you are shipping to California, but that same clone could have the name/pass removed and first boot setup added back in…you’d still get a fully updated system right from the start.
