Jetson UEFI firmware hangs on custom carrier board

You can move the eMMC module you are currently using from your custom board to the devkit.

No. I can’t touch it

Why? Not able to access it or hardware design?

Not able to physically access it.
I have a custom board with an eMMC-based Xavier NX that I can’t physically reach, and I also have a devkit SD module.

Then all I can say is that you can try. I can’t guarantee they behave the same.

I tried and got the following:

  1. Xavier NX devkit (sd module) + sdk manager → works
  2. Xavier NX p3668-0001 (emmc module) on custom board + sdk manager + “default emmc conf” → not working, with the missing ‘msi-parent’ property error
  3. Xavier NX p3668-0001 (emmc module) on custom board + sdk manager + “default sd conf” → not working, with the missing ‘msi-parent’ property error

I looked for this property in the DT sources. I noticed it appears only when the t23x family SoC kernel configuration is enabled. Before JP5.X we used the t19x family SoC with our custom board.

I tried to reconfigure the kernel to use the t19x family SoC, but the build failed.
My goal is to try to get rid of this property in order to isolate the problem.

Hi,

Actually, what I want you to try is to enable the UEFI debug build and see whether the NX devkit also shows the “ParseGicMsiBase: cannot retrieve property ‘msi-parent’: FDT_ERR_NOTFOUND” error or not.

I’ll check it now. Can I reflash only UEFI? I couldn’t find its partition in order to reflash it (I found only TBCFILE in the dev guide).

I checked, and the devkit log does contain the message:

boot.log
devkit_boot.log (214.8 KB)

Hi,

Just to confirm again: does your custom board always get stuck right after “ParseGicMsiBase: cannot retrieve property ‘msi-parent’: FDT_ERR_NOTFOUND”?

Could you boot it 10 times and confirm?
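One way to make the 10-boot check less error-prone is to capture the serial console output of each boot to a file and test whether each log ends on the message in question. The snippet below is only a rough sketch under that assumption: the helper names are made up for illustration, and it treats "stuck right after" as "the last non-empty line of the log contains the message". Note that a log which contains the message but keeps going afterwards (as the devkit log appears to) is not counted.

```python
# Prefix of the UEFI message discussed in this thread.
MARKER = "ParseGicMsiBase: cannot retrieve property"


def stuck_after_marker(log_text, marker=MARKER):
    """Return True if the last non-empty log line contains the marker."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    return bool(lines) and marker in lines[-1]


def count_stuck(logs):
    """Count how many captured logs (an iterable of strings) end on the marker."""
    return sum(stuck_after_marker(text) for text in logs)
```

To use it, read each captured log into a string and pass the list to count_stuck; a result of 10 out of 10 would support the claim that the hang always follows this message.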

Yes, I can confirm that.

Hi @BSP_User

To debug this issue, please disable every pcie@14xxxxxx node in your device tree and reflash it to your board.

Let’s see if disabling PCIe lanes could bypass this error or not first.

Please be aware that this is only for debugging. We are not asking you to give up PCIe permanently.

I can confirm that after disabling two pcie@14XXX nodes (the others were already disabled) the target booted up.
Separately from the PCIe devices, Ubuntu gets stuck during the initial system configuration setup (using the GUI), so I don’t know whether this behavior is related to disabling PCIe.

If I restart, the system configuration setup appears again and gets stuck again near its end.

I converted tegra194-p3668-0001-p3509-0000.dtb into a DTS, changed the status property of the pcie@14xx nodes to disabled, converted the DTS back to tegra194-p3668-0001-p3509-0000.dtb, and then reflashed.
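For anyone repeating this, the edit step can be scripted so that the same change is applied consistently, or restricted to a single node when isolating the problematic one. The following is a minimal sketch, not the exact procedure used in this thread: it assumes the DTB has already been decompiled to text (for example with dtc -I dtb -O dts), that each pcie@14... node carries its own status property, and that no braces appear inside string values. The function name and the `only` parameter are made up for illustration.

```python
import re

# Opening line of a pcie@14... node, with an optional label in front.
NODE_RE = re.compile(r"^\s*(?:[\w-]+:\s*)?(pcie@14[0-9a-fA-F]*)\s*\{")
STATUS_RE = re.compile(r'^(\s*)status\s*=\s*"[^"]*"\s*;')


def disable_pcie_nodes(dts_text, only=None):
    """Set status = "disabled" on pcie@14... nodes.

    only: optional set of node names (e.g. {"pcie@14100000"}) to restrict
    the change to. Returns (new_text, names_of_nodes_actually_changed).
    Nodes without a status property are left untouched.
    """
    out, changed = [], []
    depth = 0            # current brace depth
    target = None        # node currently being edited, if any
    target_depth = 0     # brace depth directly inside that node
    for line in dts_text.splitlines():
        m = NODE_RE.match(line)
        if target is None and m and (only is None or m.group(1) in only):
            target, target_depth = m.group(1), depth + 1
        elif target is not None and depth == target_depth:
            # Only touch the node's own status, not that of child nodes.
            s = STATUS_RE.match(line)
            if s:
                new_line = f'{s.group(1)}status = "disabled";'
                if new_line != line:
                    changed.append(target)
                line = new_line
        depth += line.count("{") - line.count("}")
        if target is not None and depth < target_depth:
            target = None
        out.append(line)
    return "\n".join(out) + "\n", changed
```

The returned text can then be compiled back with dtc -I dts -O dtb and flashed as before. Passing only={"pcie@14100000"} (or whichever node is under suspicion) changes just that node, which helps when testing one controller at a time.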

Please run l4t_create_default_user.sh on your host machine before flashing. It will skip the system configuration.

I’m now trying to disable only one of the pcie@14xxxx nodes and flash, to isolate the exact “problematic” one.
Do you want me to skip that?

It is okay to keep trying what you are doing now.

If I disable only one of them and use l4t_create_default_user.sh, it boots successfully.

Three questions:

  1. We bypassed the hang during the initial system configuration, but that doesn’t necessarily mean the issue no longer exists. How can I address that? (In your experience, can disabling a PCIe node in the DTB cause such behavior, or do we need to look elsewhere for this issue?)

  2. Is there anything important in the disabled PCIe DTB node (crucial hardware or settings for the system), or can I use the system normally without it?
    For example, I do have an internal NVMe drive connected through PCIe, and it is recognized by the system.

The disabled PCIe node:
disabled_pci (2.2 KB)

The whole DTS:
current dts (386.1 KB)

  3. Why am I using the kernel t23x SoC family configuration instead of just the t19x settings? (You told me in another post to leave the t23x SoC family setting on and not change it to the t19x SoC family.)
    I’m asking because it may interfere with my device tree and my custom board.

Thank you

Hi,

Sorry, I didn’t catch the point of this question. Which t23x SoC family are you talking about here?

Why I’m using kernel t23x soc family configuration instead of just t19x settings? (you told me in another post to leave the t23x soc family setting on and not change it to t19x soc family)

In the kernel configuration, via menuconfig, there is a setting: Tegra 18x/19x/21x/23x family SOC (downstream options).
Screenshot from 2022-09-13 13-10-27

By default it uses the 23x even though our hardware is 19x.
So I assume it can bring unwanted device tree settings into the resulting DTB. Is that right?

You told me before to leave that and not change it, so that’s why I’m asking.
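To see which Tegra SoC family options a given kernel build actually has enabled, without going through menuconfig, the generated .config file can be scanned. This is only an illustrative sketch: it assumes the relevant symbols follow a CONFIG_ARCH_TEGRA_..._SOC naming pattern, which is a guess based on the menuconfig entry above and should be checked against the real .config. The function name is made up.

```python
import re

# Matches "CONFIG_ARCH_TEGRA_<x>SOC=<value>" or "# CONFIG_ARCH_TEGRA_<x>SOC is not set".
# The symbol naming pattern is an assumption; adjust it to the actual .config.
SOC_RE = re.compile(
    r"^(?:(CONFIG_ARCH_TEGRA_\w*SOC)=(\S+)"
    r"|# (CONFIG_ARCH_TEGRA_\w*SOC) is not set)\s*$",
    re.MULTILINE,
)


def tegra_soc_options(config_text):
    """Map each matching symbol to its value, using "n" for "is not set"."""
    return {
        (m.group(1) or m.group(3)): (m.group(2) or "n")
        for m in SOC_RE.finditer(config_text)
    }
```

Running it on the .config from the default build and from the attempted t19x-only build would show exactly which family options differ between the two.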

In addition to the system configuration hang, I can’t run sudo:

nx1@nx1:~$ sudo parted /dev/nvme0n1 mklabel gpt
sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set

I read that it’s because something is messed up with the files, but this is a fresh install.
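The sudo error message names two conditions: the binary must be owned by uid 0 and must have the setuid bit set. A small check like the one below can report which of the two fails. It is a sketch with made-up helper names, and it can be pointed either at /usr/bin/sudo on the target or at the sudo binary inside the root filesystem directory on the flashing host. One possibility worth ruling out, not confirmed in this thread, is that the host-side root filesystem lost its ownership or permission bits before flashing.

```python
import os
import stat


def sudo_permission_problems(st_uid, st_mode):
    """Return a list describing which of sudo's two requirements are violated."""
    problems = []
    if st_uid != 0:
        problems.append(f"owner uid is {st_uid}, expected 0")
    if not st_mode & stat.S_ISUID:
        problems.append("setuid bit is not set")
    return problems


def check_sudo(path="/usr/bin/sudo"):
    """Stat the given file and report any ownership or setuid problems."""
    st = os.stat(path)
    return sudo_permission_problems(st.st_uid, st.st_mode)
```

An empty list from check_sudo means both requirements are met for that file; anything else shows which attribute was lost.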