Boot fails with looped message: "systemd-journald[2405]: /dev/kmsg buffer overrun, some messages lost"

eran.peled · July 17, 2023, 1:52pm

I have a customized board that includes 4 Xavier NX modules.
When this board is connected to one of our specific host systems, Linux completes its boot process. (Linux is working as desired).
When the same board is connected to another host system, the boot procedure seems to be stuck in an endless loop of:
“systemd-journald[2405]: /dev/kmsg buffer overrun, some messages lost.”

Attached is a log printout
stand_alone.txt (36.0 KB)

Can anyone look at the attached file and guide me on what I should do to debug this issue?

Thanks,
Eran.

KevinFFF · July 18, 2023, 6:34am

Hi eran.peled,

Which JetPack version are you using?

Could you also share the log from a successful boot?

Why does your board boot differently when it is connected to a different host?
Are you booting it from the network?

How do you flash the 4 Xavier NX modules on your custom board at the same time?

eran.peled · July 18, 2023, 9:34am

Hi Kevin,

Thanks for the quick reply.

JetPack 4.4.

This is a very important piece of data, which I am also trying to obtain, as the other “working” system is not available.
I do have a third host system that includes our card (it is a lab jig), on which Linux also works just fine. The log is attached.

working_jig.txt (42.8 KB)

The boot is the same: both boot from the internal eMMC. No network boot.

I have a DIP switch which routes the USB to the specific module.

KevinFFF · July 18, 2023, 4:02pm

It’s quite an old release. Could you help to update to the latest R32.7.4 and verify?

There also seem to be many error messages, including display, I2C, USB, etc.

Could you share the block diagram and the connections of your setup?

I wonder why the boot behavior is affected by the host PC.
Can it boot if you don’t connect the board to the host PC?

eran.peled · July 19, 2023, 4:47am

Currently, an R32.7.4 BSP that fits our hardware is not available from the hardware vendor.

I am very sorry, but I am not allowed to send a full diagram of the hardware. I would appreciate it if you could suggest specific questions I could ask the hardware guys…

I was probably misunderstood regarding the host. When I say host, I don’t mean a host PC; I mean another piece of embedded hardware that our board is connected to. This host supplies its power, connects to its buses (PCIe, USB, Ethernet…), and routes its output (DP)…

KevinFFF · July 19, 2023, 8:07am

It seems you are using a custom carrier board from another vendor.

It seems you are using another embedded module as host and using Jetson as client.

The hardware design is very different from the devkit.

Could you reproduce a similar issue on the devkit?
Otherwise, I would suggest asking your vendor for help; they should know the custom design of your board much better.
In addition, the many error messages I mentioned before should be fixed.

As for this message, it occurs when the kernel log buffer is full and new messages are still being generated.
You could increase the buffer size by configuring CONFIG_LOG_BUF_SHIFT in the kernel config.
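
For reference, CONFIG_LOG_BUF_SHIFT is a power-of-two exponent: the kernel log buffer holds 2^shift bytes. The sketch below only does that arithmetic; the function names and the example shift values are illustrative assumptions, not values taken from this board's kernel config.

```python
def log_buf_bytes(shift):
    """Kernel log buffer size in bytes for a given CONFIG_LOG_BUF_SHIFT."""
    return 1 << shift


def min_shift_for(n_bytes):
    """Smallest shift whose buffer holds at least n_bytes."""
    shift = 0
    while (1 << shift) < n_bytes:
        shift += 1
    return shift


if __name__ == "__main__":
    # Example shift values only; check your own kernel config for the real one.
    for shift in (17, 18, 20):
        print(f"CONFIG_LOG_BUF_SHIFT={shift} -> {log_buf_bytes(shift) // 1024} KiB")
```

Keep in mind that a larger buffer only gives more headroom; it does not stop whatever is flooding the log. The same size can usually also be requested at boot with the `log_buf_len=` kernel parameter, assuming the bootloader lets you edit the kernel command line.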

eran.peled · July 20, 2023, 4:45am

I am attaching a drawing, which will probably explain the situation better (and why the devkit is not an option here and can’t play a real role in the debug process).

The carrier board’s vendor is currently not involved, as its product has no problem.

Its carrier board works in its JIG and in another configuration of ours.

What I am trying to find out is what could prevent Linux from booting properly in our non-working configuration (or at least find someone who could guide me on a proper way to debug this).

Many error messages are also seen in the working configuration.

I am trying to find out what the “deal breaker” is for L4T, which prevents it from finishing its boot. (Of course, this is our first step. After this problem is resolved, we will work on fixing all the other error messages.)
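
Since many errors appear in both configurations, one possible way to narrow this down is to strip the kernel timestamps from both logs and list only the messages that appear in the failing log. This is a rough sketch under that assumption; the helper names and the sample log lines are invented for illustration and are not taken from the attached files.

```python
import re

# Matches a leading kernel timestamp such as "[    5.200000] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(line):
    """Drop the leading timestamp so identical messages compare equal."""
    return TIMESTAMP.sub("", line.strip())


def only_in_first(first_log, second_log):
    """Messages present in first_log but absent from second_log, in order."""
    seen = {normalize(l) for l in second_log.splitlines()}
    result = []
    for line in first_log.splitlines():
        norm = normalize(line)
        if norm and norm not in seen and norm not in result:
            result.append(norm)
    return result


if __name__ == "__main__":
    # Invented sample lines; in practice read the two attached log files.
    failing = "[    1.000000] usb 1-1: new device\n[    5.200000] pcie: link down\n"
    working = "[    1.100000] usb 1-1: new device\n[    5.000000] pcie: link up\n"
    for msg in only_in_first(failing, working):
        print(msg)
```

Messages that embed changing numbers (addresses, PIDs) would still show up as different, so the output needs a manual read.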

KevinFFF · July 20, 2023, 9:25am

What’s the difference between your “Product Box 1” and “Product Box 2”?
Is there any hardware design difference, or do you use a different method to flash the board?

eran.peled · July 20, 2023, 9:46am

Hi Kevin,
First of all, I would like to thank you for keeping up with this thread :)

There are some differences in the actual “Other Cards” connected to the boards, and thus in the overall PCIe and DP connectivity. (There is a PCIe switch connecting the NXs and some of the “Other Cards”.)
Flashing and booting the boards is always the same. Boot is performed from the internal eMMC of each one of the NXs, and flashing is performed using a DIP switch selector on the vendor’s carrier card (to choose the correct NX) together with the recovery signal and USB.

linuxdev · July 20, 2023, 1:54pm

This may be unrelated, but I want to add some comments…

Any carrier board which has a different layout (e.g., some pins of the module have multiple possible functions) than the dev kit will need a different device tree to set up that pin layout (that device tree will differ from the default tree based on the pins which have different function). It is quite easy for a minor issue in the device tree to disable some hardware (perhaps hardware used in boot).

If security fuses are not burned, then device tree and kernel content of an eMMC model can be taken from the signed partitions. On the other hand, if device tree and/or kernel are named in extlinux.conf, then the files named take precedence. Make sure you know which tree is being used, and that the tree is the one you expect for the required wiring layout of that board.
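
As a rough illustration of checking which files extlinux.conf names, the sketch below collects the LINUX, FDT and INITRD entries per LABEL. The parser is a simplified assumption about the file layout, and the sample content (including the `/boot/custom-board.dtb` path and the `backup` label) is hypothetical.

```python
def parse_extlinux(text):
    """Return {label: {"LINUX": ..., "FDT": ..., "INITRD": ...}} for each LABEL."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].upper()
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "LABEL":
            current = value
            entries[current] = {}
        elif current is not None and key in ("LINUX", "FDT", "INITRD"):
            entries[current][key] = value
    return entries


# Hypothetical sample; on a real system read /boot/extlinux/extlinux.conf.
SAMPLE = """
DEFAULT primary
LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      FDT /boot/custom-board.dtb
      INITRD /boot/initrd
LABEL backup
      LINUX /boot/Image.backup
"""

if __name__ == "__main__":
    for label, files in parse_extlinux(SAMPLE).items():
        print(label, files.get("FDT", "(no FDT entry)"))
```

A label with no FDT entry would, per the description above, leave the device tree to come from the partition rather than from a file in “/boot”.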

The initrd can complicate this. This is basically a very tiny Linux operating system using the outside kernel (from “/boot” or the signed partition), but it has a minimal init (systemd for most Ubuntu), and it also has needed kernel modules. Those modules in turn must match the kernel which is being used (some modules won’t load if they are compiled against the same kernel, but a different configuration). Make sure you know which kernel is being used, and which modules are required for the moment the pivot root transfers from the initrd ramdisk to the main storage (e.g., NVMe, USB drive, so on).
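
A quick first check of the kernel/module pairing is whether a module directory exists for the running kernel release. The sketch below assumes the usual `/lib/modules/<release>` layout; the function names are illustrative. It only catches a version-string mismatch, not the case of the same version built with a different configuration.

```python
import os
import platform


def modules_match(release, module_dirs):
    """True if a module directory is named exactly like the kernel release."""
    return release in module_dirs


def check_local(modules_root="/lib/modules"):
    """Compare the running kernel release with the directories on disk."""
    release = platform.release()  # same string that `uname -r` prints
    dirs = os.listdir(modules_root) if os.path.isdir(modules_root) else []
    return release, dirs, modules_match(release, dirs)


if __name__ == "__main__":
    release, dirs, ok = check_local()
    print(f"kernel release: {release}")
    print(f"module dirs:    {dirs}")
    print("match" if ok else "no matching module directory")
```

The same comparison would have to be repeated against the module tree packed inside the initrd, since that copy can differ from the one on the root filesystem.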

Often boot logs will provide that information if logging is enabled. Note that logging might not be enabled within an initrd, but if not, then you can often look at messages before and after the initrd and see what is going on. There may also be device tree changes within the initrd.

KevinFFF · July 21, 2023, 5:12am

Is this log coming from one of your Xavier NX on “Vendor’s Carrier board”?
How about the other Xavier NX modules? Do all of them have similar serial console logs?

Also, this seems to be a warning message informing you that the kernel log buffer is full, rather than an error message.

eran.peled · July 31, 2023, 6:03am

This is coming from one of the NXs. I am waiting to regain access to the system so that I can check the other ones, put the hardware in the working jig again, and search for more logs. I will update/ask more when I have more data…

You are probably right

system · August 23, 2023, 2:10am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
