AGX failed to start

I am using an AGX module on a carrier board that I designed myself, and the software version is 4.4. The module fails to start. I have tested the power-on sequence control on my board, and the power-on sequence meets the requirements of the datasheet. The same module starts normally on the official NVIDIA carrier board. Comparing the UART output of a normal boot and a failed boot, I found the following differences.

UART output when startup fails

########## Fixed storage boot ##########
[0005.634] I> Already published: 00010003
[0005.635] I> Look for boot partition
[0005.638] I> Fallback: assuming 0th partition is boot partition
[0005.644] I> Detect filesystem
[0005.671] I> Loading extlinux.conf …
[0005.671] I> rootfs path: /sdmmc_user/boot/extlinux/extlinux.conf
[0005.708] I> L4T boot options
[0005.708] I> [1]: “primary kernel”
[0005.708] I> Enter choice:
[0008.710] I> Continuing with default option: 1
[0008.710] I> Loading kernel sig file from rootfs …
[0008.710] I> rootfs path: /sdmmc_user/boot/Image.sig
[0008.735] I> Loading kernel binary from rootfs …
[0008.735] I> rootfs path: /sdmmc_user/boot/Image
[0008.980] I> Validate kernel …
[0008.980] I> T19x: Authenticate kernel (bin_type: 37), max size 0x5000000
[0009.309] I> No kernel-dtb binary path
[0009.309] W> No valid slot number is found in scratch register
[0009.310] W> Return default slot: _a
[0009.310] I> A/B: bin_type (38) slot 0
[0009.310] I> Loading kernel-dtb from partition
[0009.311] I> Loading partition kernel-dtb at 0x91000000 from device(0x1)
[0009.319] I> Validate kernel-dtb …
[0009.320] I> T19x: Authenticate kernel-dtb (bin_type: 38), max size 0x400000
[0009.325] I> Loading ramdisk from rootfs …
[0009.326] I> rootfs path: /sdmmc_user/boot/initrd
[0009.378] I> Kernel hdr @0xa4ac0000
[0009.379] I> Kernel dtb @0x90000000
[0009.379] I> decompressor handler not found
[0009.379] I> Copying kernel image (34330640 bytes) from 0xa4ac0000 to 0x80080000 … [0009.389] I> Done
[0009.390] I> Updated bpmp info to DTB
[0009.392] I> Ramdisk: Base: 0x92000000; Size: 0x54ecaf
[0009.392] I> Updated initrd info to DTB
[0009.392] W> WARN: Fail to override “console=none” in commandline
[0009.393] E> tegrabl_linuxboot_add_disp_param, du 1 failed to get display params
[0009.399] E> tegrabl_linuxboot_add_disp_param, du 1 failed to get display params

UART output during normal startup

[0005.625] I> ########## Fixed storage boot ##########
[0005.625] I> Already published: 00010003
[0005.626] I> Look for boot partition
[0005.629] I> Fallback: assuming 0th partition is boot partition
[0005.635] I> Detect filesystem
[0005.662] I> Loading extlinux.conf …
[0005.662] I> rootfs path: /sdmmc_user/boot/extlinux/extlinux.conf
[0005.699] I> L4T boot options
[0005.699] I> [1]: “primary kernel”
[0005.699] I> Enter choice:
[0008.701] I> Continuing with default option: 1
[0008.701] I> Loading kernel sig file from rootfs …
[0008.701] I> rootfs path: /sdmmc_user/boot/Image.sig
[0008.726] I> Loading kernel binary from rootfs …
[0008.726] I> rootfs path: /sdmmc_user/boot/Image
[0008.964] I> Validate kernel …
[0008.964] I> T19x: Authenticate kernel (bin_type: 37), max size 0x5000000
[0009.296] I> No kernel-dtb binary path
[0009.297] W> No valid slot number is found in scratch register
[0009.297] W> Return default slot: _a
[0009.297] I> A/B: bin_type (38) slot 0
[0009.298] I> Loading kernel-dtb from partition
[0009.298] I> Loading partition kernel-dtb at 0x91000000 from device(0x1)
[0009.306] I> Validate kernel-dtb …
[0009.307] I> T19x: Authenticate kernel-dtb (bin_type: 38), max size 0x400000
[0009.312] I> Loading ramdisk from rootfs …
[0009.313] I> rootfs path: /sdmmc_user/boot/initrd
[0009.365] I> Kernel hdr @0xa42b0000
[0009.366] I> Kernel dtb @0x90000000
[0009.366] I> decompressor handler not found
[0009.366] I> Copying kernel image (34330640 bytes) from 0xa42b0000 to 0x80080000 … [0009.372] I> Done
[0009.373] I> Updated bpmp info to DTB
[0009.374] I> Ramdisk: Base: 0x92000000; Size: 0x54ecaf
[0009.374] I> Updated initrd info to DTB
[0009.375] W> WARN: Fail to override “console=none” in commandline
[0009.378] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params
[0009.386] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params
[0009.393] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params

The failed boot prints

“[0009.378] I> Kernel hdr @0xa4ac0000”

and the normal boot prints

“[0009.365] I> Kernel hdr @0xa42b0000”

What do these two addresses mean?

fail.log (26.8 KB) normal.log (28.1 KB)
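
A quick way to compare two UART captures like these is to strip the timestamps and diff what remains, so that only real differences show up. The following is a minimal Python sketch, not an NVIDIA tool; the helper names `strip_timestamp` and `diff_boot_logs` are made up for illustration, and the sample lines are copied from the logs above.

```python
import difflib
import re

# Matches the "[0009.378] " prefix at the start of each bootloader line.
TIMESTAMP = re.compile(r"^\[\d+\.\d+\]\s*")


def strip_timestamp(line):
    """Drop the leading [ssss.mmm] timestamp so timing jitter is ignored."""
    return TIMESTAMP.sub("", line.strip())


def diff_boot_logs(normal_text, failed_text):
    """Return only the lines that differ, prefixed with '-' (normal) or '+' (failed)."""
    normal = [strip_timestamp(l) for l in normal_text.splitlines() if l.strip()]
    failed = [strip_timestamp(l) for l in failed_text.splitlines() if l.strip()]
    delta = difflib.unified_diff(normal, failed, "normal", "failed", n=0, lineterm="")
    body = list(delta)[2:]  # skip the "--- normal" / "+++ failed" headers
    return [d for d in body if not d.startswith("@@")]  # skip hunk markers


# Three lines from each log above, used here as sample input.
normal_log = """\
[0009.365] I> Kernel hdr @0xa42b0000
[0009.366] I> Kernel dtb @0x90000000
[0009.378] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params
"""
failed_log = """\
[0009.378] I> Kernel hdr @0xa4ac0000
[0009.379] I> Kernel dtb @0x90000000
[0009.393] E> tegrabl_linuxboot_add_disp_param, du 1 failed to get display params
"""

for line in diff_boot_logs(normal_log, failed_log):
    print(line)
```

On this sample the unchanged `Kernel dtb` line drops out, leaving the `Kernel hdr` address and the `du 0` / `du 1` lines as the only differences. To run it on the full attachments, read `fail.log` and `normal.log` into strings and pass them in the same way.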

hello 13940204662,

Since there is a failure related to display parameters,
could you quickly try disabling the display for testing?
For example, CONFIG_ENABLE_DISPLAY=0.
Thanks

Thanks for your reply.

When an external monitor is connected, the following lines are printed:

[0009.392] W> WARN: Fail to override “console=none” in commandline
[0009.393] E> tegrabl_linuxboot_add_disp_param, du 1 failed to get display params
[0009.399] E> tegrabl_linuxboot_add_disp_param, du 1 failed to get display params

When an external monitor is not connected, the following lines are printed:

[0009.165] W> WARN: Fail to override “console=none” in commandline
[0009.169] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params
[0009.176] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params
[0009.184] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params

Is this the cause of the problem?
I am now trying CONFIG_ENABLE_DISPLAY=0.

hello 13940204662,

May I also ask what kind of external display you are using? Thanks

Hello,

The display is a Dell flat panel monitor, model E1916Hf.

Could you remove “quiet” from /boot/extlinux/extlinux.conf and reflash your board? We should see more kernel log output this time.
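
For reference, removing “quiet” means deleting that one token from the APPEND line of the boot entry. A minimal Python sketch of the edit follows; the function name `remove_quiet` and the sample entry are illustrative assumptions, so check the actual file on your board before changing it.

```python
def remove_quiet(conf_text):
    """Return extlinux.conf text with the 'quiet' token removed from APPEND lines."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        if stripped.upper().startswith("APPEND"):
            indent = line[: len(line) - len(stripped)]
            # Drop only the exact token "quiet"; other arguments are kept in order.
            tokens = [t for t in stripped.split() if t != "quiet"]
            line = indent + " ".join(tokens)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample entry; the real APPEND line on a given board may differ.
sample = """\
LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} quiet
"""

print(remove_quiet(sample))
```

Only APPEND lines are touched, so paths or labels that happen to contain the word are left alone. Runs of spaces inside the APPEND line are collapsed to single spaces, which does not change the kernel command line.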

Hello WayneWWW,

We have removed “quiet” from /boot/extlinux/extlinux.conf.
We see two different behaviors:

  1. The monitor keeps showing the NVIDIA picture and never changes. The log is as follows:
    AGT08-nvidia_lglk.log (127.1 KB)

  2. The same behavior as in the original question. The log is as follows:
    AGT09-nvidia_lglk.log (66.6 KB)

Looking forward to your reply.

Hi,

What is the difference between AGT08 and AGT09? I see lots of CPU crash errors in the AGT08 case.

Hi
When the AGX module fails to start, it shows one of two behaviors. The UART output for the two cases is in AGT08 and AGT09 respectively. I don’t know whether both behaviors have the same cause.

  1. After the AGX module is powered on, the display shows the NVIDIA icon and then enters power-down mode. The UART stops printing and does not respond, and the system crashes. The UART output for this case is AGT09.

  2. After the AGX module is powered on, the display stays on the NVIDIA picture indefinitely. The UART output for this case is AGT08.

Hi
If the two log files are confusing, I suggest analyzing only AGT09. That behavior has occurred many times.

Does the log always get stuck at the same location?

Does your carrier board design follow the product design guide? I would suggest resolving this from the hardware side rather than the software side (analyzing those logs).

Thanks for your reply.

My carrier board does not use an MCU to control the power and reset timing. I control the AGX timing through a CPLD, which I think is the same in principle. The timing control follows “Figure 5-10. Power-OFF to On Sequence Auto Power-On Case” in the datasheet.

image

When using an oscilloscope to test the power-on sequence, the results meet the requirements.

In section “15.3 Strapping Pins” of the datasheet, I could not find a description of the relevant pins, including what the pin settings mean. Do these pins need to be held in a particular state during power-up? Where can I find this information?

I compared these pins with the official development board and found that some pins on my board are not handled the same way. Will this cause problems with my board?

For the strapping pins, please just follow the product design guide and do not change the pin design. Not following it can affect the system.

Hi
The product guide does not explain the strap pins or their meanings in detail. Is there a separate document that describes these pins in detail, including what state they should hold during power-up?

As stated in the OEM Product Design Guide in the DLC: The other straps mentioned in this section are for use on the module by NVIDIA only. Their state at power-on must not be affected by any connections on the carrier board. The carrier board design should guarantee a high-z on the pins during boot. The pins that are associated with SoC straps (besides FORCE_RECOVERY_N) are as follows.