Devices behind a PCIe packet switch are sometimes not detected after the system boots or reboots

Hi,
We designed a carrier board for Xavier NX, Nano, and TX2 NX. It has one M.2 M key connector.
Here is the problem we are facing.
With an NVMe SSD in the M key slot, all three SOMs work fine.
With an M.2 module that carries a Pericom PI7C9X2G303EL packet switch and two LAN controllers behind the switch, Xavier NX sometimes fails to detect the two LAN controllers. Nano and TX2 NX work fine with the same module.

We found some discussions of similar problems:
PCIE-HUB chip not being detected on boot occasionally
Jetson xavier nx can't find PCIe Device sometime, can it support rescan?
Xavier NX hardware pcie connect pcie switch pm8561 problem
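The second thread above asks whether a rescan can recover devices that were missed at boot. A minimal sketch of trying that from user space, assuming root access and the standard Linux sysfs interface (writing 1 to /sys/bus/pci/rescan re-enumerates all PCI buses). The helper names list_pci_devices and rescan_and_diff are hypothetical, and whether a rescan actually brings back the ports behind the switch on Xavier NX is not confirmed here.

```python
from pathlib import Path

# Standard Linux sysfs directory for the PCI bus; overridable for testing.
SYSFS_PCI = Path("/sys/bus/pci")


def list_pci_devices(sysfs_pci: Path = SYSFS_PCI) -> list:
    """Return the sorted addresses (domain:bus:dev.fn) of all enumerated PCI devices."""
    return sorted(entry.name for entry in (sysfs_pci / "devices").iterdir())


def rescan_and_diff(sysfs_pci: Path = SYSFS_PCI) -> list:
    """Trigger a full PCI rescan (requires root) and return devices that newly appeared."""
    before = set(list_pci_devices(sysfs_pci))
    # Writing "1" to /sys/bus/pci/rescan asks the kernel to re-enumerate all buses.
    (sysfs_pci / "rescan").write_text("1")
    after = set(list_pci_devices(sysfs_pci))
    return sorted(after - before)
```

Calling rescan_and_diff() right after a failed boot, and checking whether the LAN controllers show up in its result, would indicate whether the failure is limited to the initial enumeration.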

The suggested solution is to lower the host PCIe max speed from Gen 4 to Gen 2 or Gen 1. We tried it, and setting Gen 2 seems to avoid the problem.
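To confirm what each link actually trained at after capping the host speed, the negotiated and maximum speeds can be read back from sysfs. A minimal sketch, assuming a Linux kernel that exposes current_link_speed and max_link_speed under /sys/bus/pci/devices (strings such as "8.0 GT/s PCIe"); the helper names and the rate-to-generation table are illustrative and not taken from the original posts.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

SYSFS_DEVICES = Path("/sys/bus/pci/devices")

# Per-lane transfer rate (GT/s) for each PCIe generation.
GT_PER_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def pcie_gen(speed: str) -> Optional[int]:
    """Convert a sysfs speed string such as "8.0 GT/s PCIe" to a generation number."""
    try:
        rate = float(speed.split()[0])
    except (IndexError, ValueError):
        return None  # e.g. an "Unknown" speed string
    return GT_PER_GEN.get(rate)


def link_speeds(devices_dir: Path = SYSFS_DEVICES) -> Dict[str, Tuple[str, str]]:
    """Map each PCIe device address to (current_link_speed, max_link_speed)."""
    speeds = {}
    for dev in sorted(devices_dir.iterdir()):
        cur, cap = dev / "current_link_speed", dev / "max_link_speed"
        # Conventional PCI devices have no link, so these attributes are absent.
        if cur.is_file() and cap.is_file():
            speeds[dev.name] = (cur.read_text().strip(), cap.read_text().strip())
    return speeds
```

On the target, iterating over link_speeds().items() and printing pcie_gen() of each value shows every link's current and maximum generation side by side; with the host capped at Gen 2, the root port in front of the M.2 slot would be expected to report 5.0 GT/s.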

Here are our questions:

  1. Has NVIDIA found the root cause of this problem? Lowering the max speed is just a workaround, not a solution.
  2. Our product supports M.2 M key PCIe Gen 3 SSDs, so we need to set the speed to Gen 3 or above for better performance. However, with the max speed set to Gen 3, we hit the problem described above: the devices behind the Pericom packet switch cannot be detected. Does NVIDIA have a solution for this?

Thanks,
Wayne

Hi,
It has been almost a week. Any comments on this issue?

Wayne

Hi,
Another week has passed. Any update?

Wayne

If lowering the speed solves the issue, it may mean your layout quality is not good enough. Please check your layout against the PCIe routing guidelines in the product design guide. The trace length and impedance should be within the ranges listed there.

Hi Trumany,
Sorry, but I don't think that is the case:

  1. The PI7C9X2G303EL packet switch is only a PCIe Gen 2 switch.
  2. A long-run test works fine for a full day. However, in a reboot test, the packet switch's downstream devices go missing after only a few reboots (the upstream link to the host is fine; we can see the PI7C9X2G303EL in lspci).
  3. Another NVMe module that supports Gen 3 works well at that speed (in both the long-run and reboot tests).
  4. We have checked the signal integrity of this port, and it passes the PCIe Gen 3 spec.
  5. The issue happens only when Gen 4 or Gen 3 is set at system boot.
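The reboot test in point 2 can be automated by counting the Ethernet controllers the kernel enumerated on each boot and appending a PASS/FAIL line to a log. A minimal sketch, assuming the standard sysfs class attribute (0x0200xx for an Ethernet controller), an expected count of 2 for the two LAN controllers on this module, and a log path chosen by the caller; the helper names are hypothetical, and the script would be run once per boot, for example from a boot-time service.

```python
from pathlib import Path

SYSFS_DEVICES = Path("/sys/bus/pci/devices")
# PCI base class 0x02 (network), subclass 0x00 (Ethernet controller).
ETHERNET_CLASS_PREFIX = "0x0200"


def count_ethernet(devices_dir: Path = SYSFS_DEVICES) -> int:
    """Count enumerated PCI functions whose class code is Ethernet controller."""
    count = 0
    for dev in devices_dir.iterdir():
        cls = dev / "class"
        if cls.is_file() and cls.read_text().strip().lower().startswith(ETHERNET_CLASS_PREFIX):
            count += 1
    return count


def record_boot(log: Path, expected: int, devices_dir: Path = SYSFS_DEVICES) -> bool:
    """Append one PASS/FAIL line for this boot and return whether it passed."""
    found = count_ethernet(devices_dir)
    ok = found == expected
    with log.open("a") as fh:
        fh.write(f"{'PASS' if ok else 'FAIL'} found={found} expected={expected}\n")
    return ok


def summarize(log: Path) -> tuple:
    """Return (failed_boots, total_boots) from the log."""
    lines = log.read_text().splitlines()
    return sum(1 for line in lines if line.startswith("FAIL")), len(lines)
```

After a series of reboots, summarize() on the chosen log file returns the failure count and total number of boots, which gives figures comparable to the "6 of 10" reported later in the thread.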

Like one of the posts I mentioned, I suspect there may be a bug in the Xavier NX speed negotiation process when max-speed=4 (or 3).

Could you please help find the root cause?
Thanks.

Wayne

I'm not sure about that; I can only comment from the hardware side. Have you seen the same issue on the devkit? If not, it may be a problem with the custom design.

Hi Trumany,
I've tried the devkit. It failed in 6 of 10 reboots.
Do you still think this is a design or signal quality issue?

Thanks.
Wayne

Hello,

Is it possible to reproduce the issue on the devkit? If this is a software issue, the devkit should reproduce it too.

Oh, sorry, I missed your comment.

Could you share the failure log (dmesg) and the output of lspci -vvv?

Hi WayneWWW,
Please check them.

Thanks.
Wayne

lspci.txt (7.2 KB)
dmesg.txt (61.1 KB)

Hi,

I just want to double-check how you connect the device.

Are you saying that you connect a PCIe switch (the Pericom packet switch) to the M.2 key M slot, with two LAN controllers on the downstream ports of this switch, and that they are sometimes not detected?

Also, when those two devices are detected, do you see this error in dmesg?

[ 108.916585] pcieport 0005:01:00.0: invalid short VPD tag 00 at offset 1

Hi WayneWWW,

  1. The PCIe switch is on the M.2 M key module, so the connection is Xavier NX -> M.2 connector -> PCIe switch -> 2x LAN controller.
  2. Yes, the two LAN controllers are on the downstream ports of this PCIe switch, and sometimes both of them are missing.
  3. The message "[ 108.916585] pcieport 0005:01:00.0: invalid short VPD tag 00 at offset 1" appears when I run lspci -vvv, and yes, it also appears when the two LAN controllers are detected.

Thanks.
Wayne

Hi WayneWWW,
Any update for us?

Thanks.
Wayne

Hi WayneWWW,
Two weeks have passed.
Any update?

Thanks.
Wayne

Hi WayneWWW,
We posted this issue almost two months ago, but there is still no useful information or solution.
Can this problem be solved?

Thanks.
Wayne

Hi WayneWWW,
Any update on this issue?
Thank you.

Wayne

Hi WayneWWW,
This problem was posted almost three months ago, and we still have no useful information to help us solve it.
Is this an Xavier NX problem that cannot be solved?

Thanks.
Wayne

Hi,

In the lspci output, I see that only the switch upstream port is detected. To enumerate the LAN endpoints, the switch downstream ports must also be detected. The link between the Tegra PCIe root port (RP) and the switch upstream port is fine, so it is now up to the switch to make its downstream ports and endpoints (EPs) available to Tegra. Jetson Nano and TX2 are Gen 2 capable, and NX also works when its max capability is set to Gen 2. Can you check whether this switch works with any other Gen 3 host?

Thanks,
Manikanta
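The check described above (switch upstream port present, downstream ports and LAN endpoints missing) can be done mechanically on a saved lspci -vvv log. A minimal sketch, assuming lspci prints the PCI Express capability as "Express (vN) <port type>" (for example "Express (v2) Upstream Port") and that each device block starts, unindented, with its address; port_types and diagnose are hypothetical helper names.

```python
import re
from collections import Counter

# lspci -vvv reports the PCIe port type in the Express capability line,
# e.g. "Capabilities: [70] Express (v2) Upstream Port, MSI 00".
PORT_TYPE_RE = re.compile(
    r"Express \(v\d+\) (Root Port|Upstream Port|Downstream Port|Legacy Endpoint|Endpoint)"
)
# Each device block begins, unindented, with its address: [domain:]bus:dev.fn
ADDR_RE = re.compile(r"^(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\b", re.IGNORECASE)


def port_types(lspci_vvv: str) -> dict:
    """Map each device address to the PCIe port type lspci reports for it."""
    types = {}
    addr = None
    for line in lspci_vvv.splitlines():
        header = ADDR_RE.match(line)
        if header:
            addr = header.group(0)
            continue
        found = PORT_TYPE_RE.search(line)
        if found and addr is not None:
            types.setdefault(addr, found.group(1))
    return types


def diagnose(lspci_vvv: str) -> str:
    """Classify how far enumeration got below the root port."""
    counts = Counter(port_types(lspci_vvv).values())
    if not counts["Upstream Port"]:
        return "no switch upstream port enumerated"
    if not counts["Downstream Port"]:
        return "switch upstream port only; downstream ports not enumerated"
    if not (counts["Endpoint"] or counts["Legacy Endpoint"]):
        return "downstream ports present but no endpoints behind them"
    return "switch ports and endpoints enumerated"
```

If the reading above is right, running diagnose() on the attached lspci.txt would be expected to return the "upstream port only" result, while a log from a good boot would report both switch ports and endpoints.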

Hi Manikanta,
Yes, we have tried this module on an Intel platform that supports PCIe Gen 3, and it works well.
This issue does not happen only with our module; other people on the forum have run into the same problem.

Thanks.
Wayne