PCIe C0 and C4 do not seem to work

Hello,
I have a custom carrier where I connected an i210 Ethernet controller to C0 and an M.2 module to C4.
I followed the online adaptation documentation, but it is quite different from what is actually in the BSP (I could not find any PCIe DTSI files).

The lspci output was empty.

I changed the device tree for pcie@14180000 and pcie@14160000 and added p2u_hsio_x (see the attached DTS), but I had to create these labels myself.

  • I changed the pinmux (made pex_rst an output and clk_request bidirectional).
  • I added the nvidia,disable-power-down property to the nodes.
  • I changed ODMDATA to ODMDATA="gbe-uphy-config-0,hsstp-lane-map-3,hsio-uphy-config-16,nvhs-uphy-config-0";

See the attached lspci output now… It seems weird, as the root port is not really listed. And I do not think that the listed one is C4…
I am not sure which kernel version the documentation was written for, but there seem to be some discrepancies. Could you help me pinpoint what I am doing wrong?

What is weird is that right after power-up, the LEDs on the RJ45 connector connected to the i210 controller light up, and I even get a link with an external PC (although no communication). The M.2 behaves similarly (it has an ACT LED that lights up at power-up). But after the NVIDIA logo and the message "Using DTB from configuration table", it is as if power is cut to everything. In the dmesg log there is a line "vdd-3v3-pcie: disabling". Could this be related as well?

Best regards,
C

orin-dmesg.log (62.5 KB)
DTS.txt (553.3 KB)
lspciorin.log (4.8 KB)

Hi,

I want to discuss your questions about C0 and C4 separately.

Let’s start from the easy part first.

C4 is already enabled in the default BSP for the devkit. How about flashing your module and putting it on an Orin AGX devkit first to validate whether C4 is enabled or not?

And I do not think that the listed one is C4…

The one listed in your lspci is C4. You added “nvidia,disable-power-down” so that it will appear in lspci even when the link is not up.
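To check which controller a listed device belongs to, one option is to group the sysfs PCI entries by domain. A minimal Python sketch, assuming the standard Linux layout /sys/bus/pci/devices/<domain>:<bus>:<device>.<function> with vendor and device files; the helper name list_pci_by_domain is hypothetical. It also assumes the domain number follows the controller index (for example 0004 for C4, set through linux,pci-domain); please check that in your own DTS before relying on it.

```python
from pathlib import Path


def list_pci_by_domain(root="/sys/bus/pci/devices"):
    """Group the PCI functions visible in sysfs by PCI domain.

    Entries are named "<domain>:<bus>:<device>.<function>", e.g. "0004:00:00.0",
    and each one holds `vendor` and `device` files with hex IDs such as "0x10de".
    """
    by_domain = {}
    for entry in sorted(Path(root).iterdir()):
        domain = entry.name.split(":", 1)[0]  # "0004:00:00.0" -> "0004"
        vendor = (entry / "vendor").read_text().strip()
        device = (entry / "device").read_text().strip()
        by_domain.setdefault(domain, []).append((entry.name, vendor, device))
    return by_domain
```

On the board, a root port kept visible by nvidia,disable-power-down should show up as an entry with the NVIDIA vendor ID (0x10de) under its domain even when nothing is enumerated behind it.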

And for your device tree, are you sure the device tree running on the board is really matching to the log you attached?

The device tree you shared is

nvidia,dtbbuildtime = “Aug 1 2023\n:44:17”;

And the device tree running on your board is

[ 0.004219] DTB Build time: unknown
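A quick way to compare the DTB you built against the one actually running is to read the property from the live device tree. A minimal Python sketch, assuming the kernel exposes properties as files under /proc/device-tree and that nvidia,dtbbuildtime sits in the root node, as in the shared DTS; read_dt_string is a hypothetical helper name.

```python
from pathlib import Path


def read_dt_string(prop, root="/proc/device-tree"):
    """Return a string property from the live device tree, or None if absent.

    Each property is exposed as a file; string values are stored
    NUL-terminated, so the trailing NUL bytes are stripped before decoding.
    """
    path = Path(root) / prop
    if not path.is_file():
        return None
    raw = path.read_bytes()
    return raw.rstrip(b"\0").decode("utf-8", errors="replace")
```

If read_dt_string("nvidia,dtbbuildtime") returns None, or a value that differs from the one in your DTS, the board is most likely not running the DTB you think it is.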

And the reason C0 is not working at all is that the PHYs are not recognized…

[ 6.326017] tegra194-pcie 14180000.pcie: Failed to get PHY: -19
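The -19 in that line is a negative errno: 19 is ENODEV ("No such device"), which in this context typically means the driver could not resolve a PHY from the phys/phy-names entries. A minimal Python sketch for pulling tegra194-pcie lines out of a dmesg log and decoding such codes; the regular expression is an assumption based on the log lines quoted in this thread, and explain_pcie_errors is a hypothetical name.

```python
import errno
import os
import re

# Assumed line shape, taken from the logs in this thread, e.g.
# "[ 6.326017] tegra194-pcie 14180000.pcie: Failed to get PHY: -19"
PCIE_LINE = re.compile(
    r"tegra194-pcie (?P<dev>\S+): (?P<msg>.*?)(?:: (?P<err>-\d+))?$"
)


def explain_pcie_errors(log_text):
    """Yield (device, message, errno name, description) per tegra194-pcie line.

    The last two fields are None when the line carries no trailing error code.
    """
    for line in log_text.splitlines():
        m = PCIE_LINE.search(line)
        if not m:
            continue
        if m.group("err") is None:
            yield m.group("dev"), m.group("msg"), None, None
        else:
            code = -int(m.group("err"))  # "-19" -> 19
            yield (m.group("dev"), m.group("msg"),
                   errno.errorcode.get(code, "UNKNOWN"), os.strerror(code))
```

Feeding it the attached dmesg.log text gives one tuple per controller message, so a PHY lookup failure (ENODEV) is easy to tell apart from a "Phy link never came up" line, which carries no error code.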

Hi!

I will do that and come back with the result.

In the meantime, see my comments on the other issues below:

I would say so… I copied and modified tegra234-p3701-0000-as-P3701-0004-p3737-0000.dtb and used it in a separate <board>.conf file. I also checked some properties in /proc/device-tree to confirm that it is really the one. Maybe I should have modified that key to keep track of which DTB is actually used at runtime.
But now that you mention it, I am not that sure anymore. In which folder does the .conf file look for DTB_FILE? I assumed it is ./kernel/dtb.

What phys should I use? Is there a mapping for this available?

I also changed /boot/dtb/kernel_<board>.dtb for device tree debugging purposes.
I used fdtput and dtc to make some changes. Could this be why the DTB build time is unknown…?
Should I reflash every time I make a change only in the DTB?

P.S. I use JetPack 5.1.2

Regards,
Codrin

Hi,

There are lots of mistakes here.

  1. Why are you using "tegra234-p3701-0000-as-P3701-0004-p3737-0000.dtb"? You should use the one that is used in your devkit case. Such “xxx-as-xxx” DTBs are just experimental ones. For example, they are for someone who does not have an Orin 32GB module and wants to use the Orin devkit module to evaluate the performance.

  2. Do you know that kernel source code is provided on the website and that you should build the device tree from the source?
    https://developer.nvidia.com/embedded/jetson-linux-r3541

  3. The phys should match the document here.

https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/HR/JetsonModuleAdaptationAndBringUp/JetsonAgxOrinSeries.html?highlight=adaptation#enable-pcie-in-a-customer-cvb-design

Please be aware that the document is based on the source code mentioned by (2).

I guess the issue here is that you don't really know how to build the device tree from our kernel source, so it is better to learn that first.

Hi,

Thanks for clarifying that.

Do you mean the $(KERNEL_TOP)/Documentation/devicetree/bindings/pci/nvidia,tegra194-pcie.txt?
I did not see any useful phy info in this document.
Is there information about phys in that? I am pretty sure I got the right source files.

Just for information, I also got a timeout from ./source_sync.sh.

Thanks

No, I am merely talking about the NVIDIA document… click the link I shared and scroll down the page…

There is nothing to study or understand. What you need to change are just fixed values; there is nothing variable to apply here.

OK, thanks.
The carrier I mentioned was designed following the development kit, so we tried to use the same DTB/DTSI file(s) and only changed what actually deviates (we worked successfully this way with Xavier in the past). Some other posts in this forum kind of suggested doing that. Is this approach wrong, and does it really make sense to build from the sources?

You mentioned that we should use the DT used for the devkit, and then that we need to build from sources. Could you please clarify this once more?

Thanks

Hi,

My point is no one should use “tegra234-p3701-0000-as-P3701-0004-p3737-0000.dtb” for production.
If you don’t know what I am talking about, flash your module on the devkit with SDK Manager and it will tell you which device tree is in use… It won’t be such an “xxx-as-xxx” DTB.

This dtb is just for emulation. You shouldn’t use that for a custom board…

Also, it does not really matter how you generate the device tree. If you were truly familiar with how device trees work, you would not need my guidance to fix the error you saw for C0…

However, as I don’t know you or your team, I can only guide you from the newbie level.

Hi,
OK, I understand. We should not use tegra234-p3701-0000-as-P3701-0004-p3737-0000.dtb. We changed to tegra234-p3701-0004-p3737-0000.dtb.
If that is what it takes for you to help us, it is OK to treat us as newbies. Whatever works for you :)

Back to the problem.
If it does not matter how the device tree is generated, then the one used for the devkit can be used successfully for our case once we make the necessary changes, as we did for Xavier with an older JetPack.

So far, we have done the following:
  • modified ODMDATA to ODMDATA="gbe-uphy-config-0,hsstp-lane-map-3,hsio-uphy-config-16,nvhs-uphy-config-0";
  • changed the pinmux (made pex_rst an output and clk_request bidirectional) and enabled the pins.
  • removed the associated entries with the rst and clkreq pins in the gpio dtsi.
  • started from tegra234-p3701-0004-p3737-0000.dtb
  • enabled pcie@14180000
  • as the PHY, added the phandle from p2u@03e00000 (0x361; also tried 0x362) and set “phy-names” to “p2u-0” (I think this is not OK… this is where the error "Failed to get PHY: -19" comes from).
    I am sure that the DT I created is actually the one running.
  • it seems that the nvidia,disable-power-down property does not work for C0.

Questions:

What is the correct phy to use with C0?
Is there anything related to the regulators?
As I already said, right after loading, the rst signals go low, effectively keeping both the i210 controller and the M.2 in reset. Is this configurable anywhere?
What are we missing here?

Thanks

What is the exact error log right now?

Everything you should modify is already listed on the document and lots of others have validated that before.

If you are still trying to modify the dtb -> dts directly, do as I said: modify it in the kernel source and do a full flash of the board.

changed the pinmux (made the pex_rst as output and clk_request bidirectional) and enabled the pins.

I don’t care much about the details. What you need to modify is already listed in the document. It is just copy and paste.

removed the associated entries with the rst and clkreq pins in the gpio dtsi.

Sounds great.

started from tegra234-p3701-0004-p3737-0000.dtb

Ok great.

enabled pcie@14180000

Ok.

as phy I added the phandle from p2u@03e00000 (0x361; also tried with 0x362) and name “phy-names” “p2u-0” (I think this is not OK… this is where the error "Failed to get PHY: -19" comes from)

No one can know what you are doing here unless you share the code you are modifying.

it seems that the nvidia,disable-power-down property does not work for C0.

There is no such thing. Something must be wrong in the modification.

What is the correct phy to use with C0?

Already listed in document. Why is this still a question? What is the exact error now?

Is there anything related to the regulators?

Depends on what kind of error it is now, but generally not related.

As I already said, right after loading, the rst signals go low, effectively keeping both the i210 controller and the m.2 in reset. Is this configurable anywhere?

It sounds like just the common case where the link does not come up.

What are we missing here?

Make sure your pinmux is really flashed to the board.
Make sure you really share the log file rather than just saying something like “it is not working”.

Hi,

I recompiled the kernel from the sources (35.4.1) with the modifications to enable the C0 node, and copied the device tree files from kernel_out/arch/arm64/boot/dts/nvidia/ to Linux_for_Tegra/kernel/dtb/.
Already removed the K0,K1 and L0,L1 from Linux_for_Tegra/bootloader/tegra234-mb1-bct-gpio-p3701-0000-a04.dtsi
Already modified the pex_l1_clkreq_n_pk2 and pex_l1_rst_n_pk3 in the Linux_for_Tegra/bootloader/t186ref/BCT

Please see attached the log.
The error is “Phy link never came up”.
In the flash log the correct pinmux file seems to be flashed.

dmesg.log (66.9 KB)

Already removed the K0,K1 and L0,L1 from Linux_for_Tegra/bootloader/tegra234-mb1-bct-gpio-p3701-0000-a04.dtsi
Already modified the pex_l1_clkreq_n_pk2 and pex_l1_rst_n_pk3 in the Linux_for_Tegra/bootloader/t186ref/BCT

May I ask what you are doing here? No one asked you to modify pex_l1_clkreq_n_pk2 and pex_l1_rst_n_pk3. Why are these two coming up?

And K0, K1 and L0, L1 should not be in the default pinmux at all.

I meant pex_l0*.
Does this have anything to do with the issue?

I don’t know whether the files you set are really correct or not, as I haven’t seen them at all.
If you are sure everything on the document is set and got flashed to the board, check the debug tips here.

https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/HR/JetsonModuleAdaptationAndBringUp/JetsonAgxOrinSeries.html?highlight=pcie#debug-pcie-link-up-failure

Also try another kind of PCIe device on your C0. Sometimes the problem is device specific. If any PCIe device can be detected on C0, then you don’t need to focus on the pinmux or the device tree anymore.

OK. See the attached files. Could you please comment on them?
tegra234-mb1-bct-pinmux-p3701-0000-a04.dtsi.txt (63.6 KB)
tegra234-gpio.h.txt (4.1 KB)
tegra234-mb1-bct-gpio-p3701-0000-a04.dtsi.txt (4.7 KB)

I will look into that.
Thanks

I was able to make C0 work after unbinding and binding again. But is it necessary to do this every time? How can I make it permanent, or do it at boot time?
C4 still does not work… see the dmesg output when trying to unbind and bind again. (please ignore the build time of the dtb, since I modify the disable-power-down property in the /boot/dtb/ after booting)
dmesg_c4_error.log (88.6 KB)

I already rebuilt UEFI once and flashed it. Could I do something in the edk2 source to debug this?
It is curious that UEFI recognizes the M.2 on C4…

Hi,

If unbinding/binding makes the device detectable on C0, then you don’t need to configure the C0 device tree anymore.
The detection issue comes from something else. For example, maybe your device needs a specific GPIO to be enabled first, but it is not enabled during boot, so you need to wait a while for that GPIO to be enabled.
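For the question of doing the unbind/bind at boot: as the reply points out, this only works around a timing problem rather than fixing it, but the sysfs sequence itself can be sketched in Python. This assumes the driver name tegra194-pcie and the device name 14180000.pcie from the dmesg lines in this thread, and that it runs as root (for example from a boot-time service, which is itself an assumption). rebind_platform_device and its delay_s parameter are hypothetical; delay_s stands in for the GPIO settle time mentioned above.

```python
import time
from pathlib import Path


def rebind_platform_device(device, driver="tegra194-pcie", delay_s=0.0,
                           sysfs="/sys/bus/platform/drivers"):
    """Unbind a platform device from its driver, wait, then bind it again.

    Writing the device name (e.g. "14180000.pcie") to the driver's `unbind`
    file detaches it; writing it to `bind` makes the driver probe it again.
    """
    drv = Path(sysfs) / driver
    (drv / "unbind").write_text(device)
    # Hypothetical settle time, e.g. for an enable GPIO to come up.
    time.sleep(delay_s)
    (drv / "bind").write_text(device)
```

For example, rebind_platform_device("14180000.pcie", delay_s=1.0) repeats the manual unbind/bind on C0 with a one-second pause; the right delay, if any, depends on the board.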

As for C4, I think the situation is the same, as the default device tree already enables C4. We have never heard of anyone needing to spend time on device tree changes to enable C4.

Could you also explain why, after removing nvidia,disable-power-down from C4, unbinding and binding C0 no longer works?

Could it be related to Wake signal?

As I already said… as soon as booting starts, the Orin module asserts the RESET signal. Why, and how can this be stopped?