Jetson Orin NX - PCIe TX Signal Strength Tuning

Hi,

We are performing environmental temperature testing on Jetson Orin NX with PCIe Gen4 NVMe storage at 70°C. During testing, we observed AER (Advanced Error Reporting) corrected errors on the PCIe link:

pcieport 0004:00:00.0: AER: Corrected error received: 0004:01:00.0
nvme 0004:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
nvme 0004:01:00.0: device [1987:5018] error status/mask=00000001/00006000
nvme 0004:01:00.0: [ 0] RxErr (First)
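
For readers decoding the log above: the `error status/mask` pair is the AER Correctable Error Status and Mask registers. The following is a minimal sketch, not an NVIDIA tool, that maps the set bits to the short names the Linux AER driver prints. The helper names `decode_correctable` and `parse_status_mask` are hypothetical and chosen only for this illustration.

```python
# Correctable Error Status bit positions in the PCIe AER capability,
# labelled with the short names printed by the Linux AER driver.
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_correctable(value):
    """Return the names of all correctable-error bits set in value."""
    return [name for bit, name in sorted(AER_CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def parse_status_mask(line):
    """Extract (status, mask) from an 'error status/mask=X/Y' log line."""
    field = line.split("status/mask=")[1].split()[0]
    status_hex, mask_hex = field.split("/")
    return int(status_hex, 16), int(mask_hex, 16)


# Log line copied from the post above.
line = "nvme 0004:01:00.0: device [1987:5018] error status/mask=00000001/00006000"
status, mask = parse_status_mask(line)
print("reported:", decode_correctable(status))
print("masked:  ", decode_correctable(mask))
```

For this log, only bit 0 (Receiver Error) is reported, and the mask value covers bits 13 and 14.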

The NVMe vendor analyzed the issue and indicated that the Orin NX’s PCIe TX signal amplitude appears weak.

For Jetson Xavier NX, we understand Nvidia does not support PCIe parameter tuning and only provides compliance/routing guidance.

Questions:

  1. Is the same policy applied to Jetson Orin NX regarding PCIe TX drive strength adjustment?
  2. Are there any register-level or device tree parameters available for PCIe PHY tuning on Orin NX?
  3. If tuning is not supported, are there updated PCIe routing guidelines in the Design Guide to address signal integrity concerns?

Thank you.

*** Please note that this reply is generated by LLM automatically ***

Based on the provided context, I will attempt to answer the question.

The question is about the Jetson Orin NX’s PCIe parameter tuning policy, specifically regarding TX drive strength adjustment. The user is seeing PCIe Bus Errors during environmental temperature testing with PCIe Gen4 NVMe storage at 70°C. The NVMe vendor has indicated that the Orin NX’s PCIe TX signal amplitude appears weak.

Comparing the situation to the Jetson Xavier NX, the user understands that Nvidia does not support PCIe parameter tuning and only provides compliance/routing guidance for the Xavier NX.

For the Jetson Orin NX, the policy regarding PCIe parameter tuning is not explicitly stated in the provided context. However, based on the information available, it appears that the Orin NX does have some level of PCIe tuning capabilities.

In the “Jetson Orin NX Series and Jetson Orin Nano Series Modules Tuning and Compliance Guide” (DA-11267-001_v1.2), there is a section on “PCIe compliance testing” which might imply that some level of tuning is possible. Additionally, in the “Jetson Orin NX Series and Jetson Orin Nano Series Modules Thermal Design Guide” (TDG-11127-001_v1.5), there is a mention of “PCIe Controller Configuration” which could potentially include tuning options.

However, without explicit information on the availability of register-level or device tree parameters for PCIe PHY tuning on Orin NX, it is difficult to confirm whether tuning is supported.

Regarding updated PCIe routing guidelines in the Design Guide to address signal integrity concerns, there is no explicit information available in the provided context.

In summary, while there is some indication that the Jetson Orin NX might have some level of PCIe tuning capabilities, the policy regarding PCIe parameter tuning is not explicitly stated. It is recommended to consult the official Nvidia documentation or contact Nvidia support for more information on this topic.

Comparison to the Jetson Xavier NX:

  • The Jetson Xavier NX does not support PCIe parameter tuning, and Nvidia only provides compliance/routing guidance.
  • The Jetson Orin NX might have some level of PCIe tuning capabilities, but the policy is not explicitly stated.

Recommendation:

  • Consult the official Nvidia documentation or contact Nvidia support for more information on PCIe parameter tuning for the Jetson Orin NX.
  • Review the “Jetson Orin NX Series and Jetson Orin Nano Series Modules Tuning and Compliance Guide” and “Jetson Orin NX Series and Jetson Orin Nano Series Modules Thermal Design Guide” for any potential tuning options or guidelines.


Additionally, here are our EOM-Y readings for PCIe0 lanes 0-3:
0x0000002e
0x0000001a
0x0000001b
0x00000032
What do these values indicate?

Could you confirm your design meets the design guide requirements for insertion loss, trace lengths/delays, impedance, trace spacing etc…?

For a description of the Eye Opening Monitor (EOM) and how to interpret its results, please refer to the Jetson AGX Orin Series Modules Tuning and Compliance Guide: https://developer.nvidia.com/downloads/assets/embedded/secure/jetson/agx_orin/jetson_agx_orin_series_tuning_compliance_guide/

Yes, we have already confirmed that impedance, insertion loss, trace lengths, and spacing meet the design guide requirements.

Could you help clarify “The NVMe vendor analyzed the issue and indicated that the Orin NX’s PCIe TX signal amplitude appears weak”? What is your insertion loss for this PCIe link?

The NVIDIA PDG is based on EM-370(Z) PCB material, for which the trace length should be under 185 mm (7.28 inches). Our project uses IT-170GRA1, whose Dk and Df are better than those of EM-370(Z), and our PCIe signal lengths are controlled to around 5 inches. So we think our signal length and insertion loss should be fine.
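
The length comparison above can be checked with a short calculation. This is only a sketch of the arithmetic using the two figures quoted in this post; it compares routed length against the quoted limit and does not account for vias, connectors, or the actual measured insertion loss.

```python
MM_PER_INCH = 25.4

PDG_MAX_LENGTH_MM = 185.0  # limit quoted above for EM-370(Z)
DESIGN_LENGTH_IN = 5.0     # approximate routed length quoted above

pdg_max_length_in = PDG_MAX_LENGTH_MM / MM_PER_INCH
design_length_mm = DESIGN_LENGTH_IN * MM_PER_INCH
headroom_mm = PDG_MAX_LENGTH_MM - design_length_mm

print(f"PDG limit:     {PDG_MAX_LENGTH_MM:.0f} mm = {pdg_max_length_in:.2f} in")
print(f"Design length: {design_length_mm:.0f} mm = {DESIGN_LENGTH_IN:.2f} in")
print(f"Headroom:      {headroom_mm:.0f} mm")
```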

We would like to try adjusting the TX signal strength to see whether the PCIe errors disappear.

We have reviewed the Tuning and Compliance Guide. Our EOM-Y measurements exceed the target value of 0xC for PCIe Gen4.
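
As a quick illustration of that comparison, the sketch below converts the four EOM-Y readings posted earlier to decimal and subtracts the 0xC target quoted here. The function name `eom_margins` is hypothetical, and the meaning of the units is whatever the Tuning and Compliance Guide defines; the code only does the subtraction.

```python
EOM_Y_TARGET_GEN4 = 0xC  # target value quoted in this thread for PCIe Gen4

# EOM-Y readings for PCIe0 lanes 0-3, as posted earlier in the thread.
readings = {0: 0x2E, 1: 0x1A, 2: 0x1B, 3: 0x32}


def eom_margins(values, target):
    """Return reading minus target for each lane."""
    return {lane: value - target for lane, value in values.items()}


margins = eom_margins(readings, EOM_Y_TARGET_GEN4)
for lane, margin in margins.items():
    value = readings[lane]
    print(f"lane {lane}: EOM-Y = 0x{value:02x} ({value}), "
          f"margin over target = {margin}")
```

All four lanes read above the target, with lanes 1 and 2 showing the smallest margin.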

Do these readings indicate that our board’s PCIe signal integrity meets the specification requirements?

Thank you.

Could you share the NVMe part number and the vendor? Is the NVMe soldered on the same carrier board PCB, or is there a way to break out the PCIe lanes for testing? Please provide the PCIe topology.

The PCIe topology is as below. The NVMe P/N is ASDEM1EII-4TBT0 (E18DI, provided by Phison).

Have you run PCIe compliance testing on the design, and does it pass and meet the requirements? Also, could you share the findings/report from the NVMe vendor explaining why they think the PCIe TX signal amplitude is low?

Also, we would like to get clarity on the points below:
Is the issue with the Orin NX or with the NVMe?
Do you see this issue with different parts (i.e. different Orin NX modules and NVMe drives), or only with one specific NVMe part?
Does limiting the link speed fix the issue?

We have run PCIe compliance tests (Gen3/4) for TX and RX, and all results passed. We did not get a log file from the NVMe vendor; we only received the information that the TX signal is weak. We tried different NVMe parts with the Orin NX, but the PCIe error still occurs. When we limit the PCIe link speed to Gen3, the issue does not occur.
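
When repeating the Gen3 versus Gen4 comparison, it can help to confirm what speed the link actually negotiated. The following is a minimal sketch that reads the standard Linux sysfs attributes `current_link_speed` and `max_link_speed` for the NVMe endpoint. The helper names are hypothetical, the device address is taken from the AER log at the top of the thread, and the exact text format of these attributes varies between kernel versions, so the parser only relies on the leading number.

```python
from pathlib import Path

# Per-lane transfer rate in GT/s mapped to PCIe generation.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def link_gen(speed_text):
    """Map a sysfs speed string such as '16.0 GT/s PCIe' to a generation.

    Returns None if the text is empty or not a known rate.
    """
    fields = speed_text.split()
    if not fields:
        return None
    try:
        return GEN_BY_GTS.get(float(fields[0]))
    except ValueError:
        return None


def read_link_gen(bdf, attr="current_link_speed"):
    """Read a link-speed attribute for the PCI device at address bdf."""
    path = Path("/sys/bus/pci/devices") / bdf / attr
    return link_gen(path.read_text())


bdf = "0004:01:00.0"  # NVMe endpoint address from the AER log in this thread
if (Path("/sys/bus/pci/devices") / bdf).exists():
    print("negotiated: Gen", read_link_gen(bdf))
    print("maximum:    Gen", read_link_gen(bdf, "max_link_speed"))
```

Running this once with the link limited and once without shows whether the limit took effect before the error counts are compared.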

Are the NVMe parts you have tried out from the same vendor or different vendors?

They all come from the same vendor.

Although the PCIe RX compliance test passed, the association’s definition requires only Lane 0 to be measured. So we also measured Lanes 1 to 3 and found that Lane 2 could not be measured, while Lanes 1 and 3 could be measured normally and both passed. Are there any settings we need to be aware of?

We have tested all the lanes internally and we have not seen any variation in the margins.

We tried two carrier boards and the symptoms are the same. PCIe Gen4 RX lanes 0, 1, and 3 can enter loopback mode, but lane 2 never can, so we cannot run the signal integrity test on PCIe Gen4 RX lane 2. We also tried downgrading the PCIe speed from Gen4 to Gen1, but RX lane 2 still cannot enter loopback mode. This makes us think it is not a hardware issue: Gen1 runs at a lower speed and does not need rigorous layout routing, yet RX lane 2 still cannot enter loopback.

May I know if there is any SW setting needed for RX lane 2 to enter loopback mode? Or does NVIDIA simply not support loopback mode on RX lane 2?

Regarding my questions above, is there any update from the NVIDIA side?

Hello, today we also tried to run the PCIe Gen4 RX lane 2 signal integrity test on the NVIDIA CRB, and the result was the same as on our boards: lane 2 cannot enter loopback mode. This seems to be a limitation of the NVIDIA module. Could you please explain why RX lane 2 cannot enter loopback mode?

Hello, I am still waiting to hear why PCIe RX lane 2 cannot enter loopback mode for signal integrity testing. Please help to confirm.