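With the default configuration, PCIe Bus Errors cause a system hang; after lowering the link to PCIe Gen1 the system no longer hangs, but PCIe Bus Errors are still reported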

Software environment: 36.3
Hardware: custom carrier board, Jetson AGX Orin 32GB/64GB module
PCIe design: PCIe UPHY1 is connected to an I350-AM4 NIC
Board information is as follows:

root@ubuntu:~# cat /proc/version
Linux version 5.15.136-rt-tegra (dev@8e8fbb755e79) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2022.08) 11.3.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT_RT Fri May 9 06:12:11 UTC 2025

The UPHY configuration is the default:

ODMDATA="gbe-uphy-config-22,hsstp-lane-map-3,nvhs-uphy-config-0,hsio-uphy-config-0,gbe0-enable-10g"; CHIPID=0x23; ITS_FILE=; OVERLAY_DTB_FILE="L4TConfiguration.dtbo,tegra234-p3701-overlay.dtbo"; CMDLINE_ADD="mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0" target_board="generic";

lspci shows that the PCIe NIC has been enumerated:

root@ubuntu:~# lspci
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 229e (rev a1)
0005:00:00.0 PCI bridge: NVIDIA Corporation Device 229a (rev a1)
0005:01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
0005:01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
0005:01:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
0005:01:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

The link is currently running at Gen1 speed:

root@ubuntu:~# lspci -vvv | grep LnkSta
LnkSta: Speed 2.5GT/s (ok), Width x4 (downgraded)
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
LnkSta: Speed 2.5GT/s (downgraded), Width x4 (ok)
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
LnkSta: Speed 2.5GT/s (downgraded), Width x4 (ok)
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
LnkSta: Speed 2.5GT/s (downgraded), Width x4 (ok)
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
LnkSta: Speed 2.5GT/s (downgraded), Width x4 (ok)
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
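For comparing runs (for example Gen1 versus Gen2 boots), the LnkSta lines can be reduced to a small table. The following is only a sketch: the function name `parse_lnksta` is made up for illustration, and the sample text is copied from the output above. lspci prints "(downgraded)" when the current speed or width is lower than what the link capabilities advertise.

```python
import re

# First four lines of the `lspci -vvv | grep LnkSta` output shown above.
SAMPLE = """\
LnkSta: Speed 2.5GT/s (ok), Width x4 (downgraded)
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
LnkSta: Speed 2.5GT/s (downgraded), Width x4 (ok)
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
"""

# Matches "LnkSta:" only; "LnkSta2:" lines do not match because of the "2".
# The "(ok)" / "(downgraded)" annotations are optional, since older lspci
# versions do not print them.
LNKSTA_RE = re.compile(
    r"LnkSta:\s+Speed\s+(?P<speed>[\d.]+)GT/s(?:\s+\((?P<speed_state>\w+)\))?,"
    r"\s+Width\s+x(?P<width>\d+)(?:\s+\((?P<width_state>\w+)\))?"
)


def parse_lnksta(text):
    """Return one dict per LnkSta line found in `text` (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        m = LNKSTA_RE.search(line)
        if m:
            rows.append({
                "speed_gts": float(m.group("speed")),
                "speed_state": m.group("speed_state"),
                "width": int(m.group("width")),
                "width_state": m.group("width_state"),
            })
    return rows


if __name__ == "__main__":
    for row in parse_lnksta(SAMPLE):
        print(row)
```

Piping the full `lspci -vvv` text into the same function would give one row per link, which makes it easier to spot which end reports a reduced speed or width.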

The dts configuration is as follows, limiting the link to Gen1:

  pcie@141a0000{
  	vpcie3v3-supply = <&vdd_3v3_i350>;
  	/delete-property/ vpcie12v-supply;
  	max-link-speed = <0x1>;
  };
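As a quick cross-check between the device tree and lspci, the `max-link-speed` cell is a PCIe generation number, and each generation has a fixed per-lane signalling rate. The helper below is a hypothetical illustration of that mapping, not part of any NVIDIA tool.

```python
# PCIe generation number -> per-lane raw signalling rate in GT/s.
GEN_TO_GTS = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}


def link_speed_gts(max_link_speed):
    """Translate a `max-link-speed` cell value (e.g. 0x1) into GT/s."""
    return GEN_TO_GTS[max_link_speed]


if __name__ == "__main__":
    # max-link-speed = <0x1> corresponds to the 2.5GT/s reported by lspci.
    print(link_speed_gts(0x1))
```

With `max-link-speed = <0x1>` the expected rate is 2.5 GT/s, which agrees with the `Speed 2.5GT/s` values in the LnkSta output.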

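Detailed lspci -vvv output:
lspci-vvv-show-log.txt (21.8 KB)
Boot logs:
With the PCIe link speed set to Gen1, a small number of PCIe Bus Errors are reported. Log:
PCIE BUS Err Gen1.txt (184.8 KB)

With the PCIe link speed set to Gen2, a large number of PCIe Bus Errors are reported, which causes a hang and reboot before the login prompt is reached. Log:
sku2_pcie_bus_err_32g_pcle_gen2_core_dump.txt (261.5 KB)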

The hardware connection is as follows: SOM UPHY1 is connected to the I350-AM4.

The hardware SI test results are as follows; the tests pass.

Does the same PCIe card also hit this issue if testing on NV devkit?

We have not tested any PCIe device on the devkit, so we have not seen this issue there.

Then there is nothing we can check here.

Maybe you could get a pcie analyzer and dump some traces.

The reply above mentions a PCIe analyzer. Is that a software tool, or a hardware instrument similar to an oscilloscope?

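I added the pci=nomsi parameter in extlinux.conf, and the errors largely disappeared, but there are still other problems. What side effects does adding this parameter have?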

LABEL primary
MENU LABEL primary kernel
LINUX /boot/Image
INITRD /boot/initrd
FDT /boot/kernel_tegra234-p3737-0000+p3701-0005-robot.dtb
OVERLAYS /boot/tegra234-p3737-0000+robot-0000-camera.dtbo,/boot/tegra234-p3737-camera-isx031-overlay.dtbo
APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 nospectre_bhb video=efifb:off console=tty0 pci=nomsi
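To confirm which `pci=` options an extlinux entry actually passes, the APPEND line can be split into kernel parameters. This is a minimal sketch with made-up helper names (`kernel_params`, `pci_options`); the sample string is the APPEND line above. For context, the kernel's parameter documentation describes `pci=nomsi` as disabling MSI interrupts system-wide, so it affects every PCI device and not only the I350.

```python
# The APPEND line from the extlinux.conf entry shown above.
APPEND = (
    "APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 "
    "mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 "
    "firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0 "
    "nospectre_bhb video=efifb:off console=tty0 pci=nomsi"
)


def kernel_params(append_line):
    """Split an APPEND line into (key, value) pairs (hypothetical helper).

    A list is returned instead of a dict because keys such as `console`
    may legitimately appear more than once. Flags without "=" get None.
    """
    tokens = append_line.split()
    if tokens and tokens[0].upper() == "APPEND":
        tokens = tokens[1:]
    params = []
    for tok in tokens:
        key, sep, value = tok.partition("=")
        params.append((key, value if sep else None))
    return params


def pci_options(append_line):
    """Collect every comma-separated option passed through pci=..."""
    opts = []
    for key, value in kernel_params(append_line):
        if key == "pci" and value:
            opts.extend(value.split(","))
    return opts


if __name__ == "__main__":
    print(pci_options(APPEND))
```

On the running board, the same check can be applied to the contents of `/proc/cmdline` to verify that the parameter really reached the kernel.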
