I have a new custom board that I have just flashed with a customized kernel, device tree, etc. MB1, MB2 seem to run fine. UEFI appears to start, but it stops part way through, and starts outputting garbage characters. Does this suggest anything specific?
I have attached the boot log and the flash log for reference.
*** Please note that this reply is generated by LLM automatically ***
Based on the provided information, it appears that you have a custom board with a Jetson AGX Orin module, and you’ve flashed it with a customized kernel, device tree, and other configurations. The boot process seems to start normally, with MB1 and MB2 running fine, and UEFI initializing. However, it stops partway through and starts outputting garbage characters.
This issue could be related to several factors, and I’ll provide some potential causes and suggestions for troubleshooting:
Device Tree Configuration: Since you’ve customized the device tree, it’s possible that there’s an issue with the configuration. Double-check your device tree files for any errors or inconsistencies (for AGX Orin these are the tegra234 files; tegra194 is the earlier Xavier generation). You can compare your custom device tree with the original one provided by NVIDIA to identify any differences.
Kernel Configuration: Verify that your customized kernel configuration is correct and compatible with the Jetson AGX Orin module. Check the kernel logs for any errors or warnings that might indicate a problem.
UEFI Configuration: Ensure that the UEFI configuration is correct and properly set up for your custom board. You can try resetting the UEFI configuration to its default values and see if that resolves the issue.
Bootloader Configuration: The bootloader configuration might be incorrect or incompatible with your custom board. Check the bootloader logs for any errors or warnings, and verify that the bootloader is properly configured for your board.
Hardware Issues: Although less likely, it’s possible that there’s a hardware issue with your custom board. Verify that all connections are secure, and there are no signs of physical damage or wear.
Firmware Version: Ensure that you’re using the latest firmware version for the Jetson AGX Orin module. You can check the firmware version in the boot logs or using the nvboot command.
Boot Order: Verify that the boot order is set correctly in the UEFI configuration. Ensure that the UEFI is set to boot from the correct device (e.g., eMMC or SD card).
To further troubleshoot the issue, you can try the following:
Check the boot logs for any error messages or warnings that might indicate the cause of the problem.
Use a serial console or a debug cable to capture the boot logs and examine them in more detail.
Try booting the board with a different kernel or device tree configuration to see if the issue persists.
Use the nvboot command to verify the firmware version and check for any errors or warnings.
By examining the boot logs and flash logs you provided, I couldn’t find any specific errors or warnings that would point to a single cause. However, I recommend checking the device tree configuration, kernel configuration, and UEFI configuration as potential starting points for troubleshooting.
If you’re still unable to resolve the issue, please provide more information about your custom board, kernel configuration, and device tree configuration, and I’ll do my best to assist you further.
Thanks for the suggestions. I looked through a number of things but didn’t find anything that improved the booting when I tried it. I looked at the PCIe changes, and they looked reasonable. When I removed them to see if I had done something wrong, UEFI hung even earlier. The same happened when I omitted the ODMDATA from the configuration file.
I converted the files to have a .txt suffix because I couldn’t upload them with the .dtsi suffix.
We have also tried using a working DevKit’s SOM on our board, but it hangs in UEFI earlier than this one. I haven’t changed the UEFI configuration. The kernel configuration file is here, but since we don’t actually get to the kernel, it doesn’t seem that useful:
Continuing to try various things to debug, and there is progress and some new questions…
It turns out that the system was actually booting all the way to Ubuntu Linux as shown on the HDMI output. Hooking up a screen to the carrier board confirmed this. However, we’re still getting the garbage output from the debug UART, starting about halfway through the UEFI boot stage. The garbage characters received are NOT because the UART line is noisy. The UART signal looks, on an oscilloscope, to be providing well-formed ASCII characters. However they’re being interpreted as garbage. The characters that are coming from the UART appear to be at about 124.4k BAUD (~8.04us per bit), instead of 115.2k (8.68us per bit). That’s about 8% high, and too far off for reliable UART communications.
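To make the timing argument concrete, here is a minimal sketch of the arithmetic, assuming the usual 10-bit frame (start bit, 8 data bits, stop bit) and a receiver that samples in the middle of each bit. The 8.04 us bit period is the oscilloscope reading quoted above; the function names are illustrative only.

```python
# Rough check of how far a measured UART bit period is from 115200 baud.
# Assumes an 8N1 frame: 1 start bit + 8 data bits + 1 stop bit.

NOMINAL_BAUD = 115_200
MEASURED_BIT_PERIOD_US = 8.04  # oscilloscope reading from the post


def baud_from_period(period_us):
    """Convert a bit period in microseconds to a baud rate."""
    return 1e6 / period_us


def percent_error(actual, nominal):
    """Signed error of actual relative to nominal, in percent."""
    return (actual - nominal) / nominal * 100.0


measured_baud = baud_from_period(MEASURED_BIT_PERIOD_US)
error_pct = percent_error(measured_baud, NOMINAL_BAUD)

# The receiver samples the stop bit about 9.5 of its own bit times after
# the start edge, so the accumulated drift (in transmitter bit times) is
# roughly 9.5 * relative error. More than 0.5 means the stop bit is
# sampled outside its window and the frame is misread.
stop_bit_drift = 9.5 * abs(error_pct) / 100.0

print(f"measured baud:  {measured_baud:.0f}")
print(f"error:          {error_pct:.2f} %")
print(f"stop-bit drift: {stop_bit_drift:.2f} bit times")
```

A drift above half a bit time by the stop bit is consistent with a terminal at 115.2k showing garbage even though the waveform itself looks clean.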
What might cause the UART BAUD rate to switch from 115.2k to 124.4k BAUD?
It certainly seems like a clock configuration, but I didn’t explicitly change UEFI or its configuration. So, what might cause such a clock configuration change, and where should I look?
It’s possible that it’s not a correct UART signal, though. I switched my terminal BAUD rate to 124.4k BAUD and it still doesn’t receive them as valid ASCII characters.
From the original log, it seems your board boots and hangs in UEFI.
You may need a debug build of the UEFI firmware to check this in detail.
Do you mean that you can boot into the Linux desktop successfully?
If so, please capture the dmesg to check if there’s any error as you are developing the custom carrier board.
Are you using the debug UART to capture them?
It seems the serial console log stops in UEFI but your board actually keeps booting.
I would like to know if you have custom design for UART on your board which may cause this issue.
For the devkit, debug UART is from micro-USB port.
Or can you also reproduce a similar issue on the devkit?
I figured it out. In the device tree I had enabled the uartc (serial@c280000) UART on its own, while it was still configured as going through the combined UART. Removing that status = "okay"; got it working correctly.
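For anyone checking their own tree for the same conflict, here is a small sketch that reports the status property of a named node in device tree source text. It is only an illustration: the SAMPLE_DTS text is hypothetical, the parser ignores braces inside strings and `&label` overrides, and it works best on a fully merged tree, for example a .dtb decompiled with `dtc -I dtb -O dts`.

```python
import re

# Hypothetical, heavily trimmed device tree source used only for this demo.
SAMPLE_DTS = """
/ {
    serial@c280000 {
        status = "okay";
    };
};
"""


def node_status(dts_text, node_name):
    """Return the status string of the first node named node_name.

    Returns None if the node is not found or has no status property of
    its own (a node without a status property is treated as enabled).
    """
    match = re.search(re.escape(node_name) + r"\s*\{", dts_text)
    if match is None:
        return None

    depth = 1
    pos = match.end()
    chunk_start = pos
    own_chunks = []  # text belonging to this node, excluding child nodes
    while pos < len(dts_text) and depth > 0:
        ch = dts_text[pos]
        if ch == "{":
            if depth == 1:
                own_chunks.append(dts_text[chunk_start:pos])
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 1:
                chunk_start = pos + 1
            elif depth == 0:
                own_chunks.append(dts_text[chunk_start:pos])
        pos += 1

    own_text = "".join(own_chunks)
    status = re.search(r'\bstatus\s*=\s*"([^"]*)"\s*;', own_text)
    return status.group(1) if status else None


print(node_status(SAMPLE_DTS, "serial@c280000"))
```

If this reports "okay" for serial@c280000 on a board where the debug console is supposed to go through the combined UART, that matches the conflict described above.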