Hi NVIDIA team,
PCIe on our NX reports errors and then the system reboots. The UART debug log is as follows:
[ 28.351995] igb 0004:07:00.0 eth1: PCIe link lost, device now detached
safereg_poll_timer_cb: poll interval 106 above target 100
safereg_poll_timer_cb: poll interval 446 above target 100
[ 30.022243] igb 0004:07:00.0 eth1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
safereg_poll_timer_cb: poll interval 103 above target 100
safereg_poll_timer_cb: poll interval 103 above target 100
[ 36.823048] bpmp: mrq 22 took 3996000 us
[ 36.825335] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Transmitter ID)
[ 36.828440] pcieport 0004:00:00.0: device [10de:1ad1] error status/mask=00009001/0000e000
[ 37.028126] igb 0004:07:00.2 eth3: PCIe link lost, device now detached
[ 37.079694] igb 0004:07:00.1 eth2: PCIe link lost, device now detached
[ 37.135140] pcieport 0004:00:00.0: [ 0] Receiver Error (First)
[ 37.137026] pcieport 0004:00:00.0: [12] Replay Timer Timeout
[ 37.138997] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 37.141196] pcieport 0004:00:00.0: device [10de:1ad1] error status/mask=00004020/00400000
[ 37.143122] pcieport 0004:00:00.0: [ 5] Surprise Down Error
[ 37.145038] pcieport 0004:00:00.0: [14] Completion Timeout (First)
[ 37.593029] igb 0004:07:00.3 eth4: PCIe link lost, device now detached
safereg_poll_timer_cb: poll interval 106 above target 100
safereg_poll_timer_cb: poll interval 113 above target 100
safereg_poll_timer_cb: poll interval 106 above target 100
[ 39.543876] igb 0004:07:00.2 eth3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
safereg_poll_timer_cb: poll interval 115 above target 100
[ 39.708664] igb 0004:07:00.1 eth2: malformed Tx packet detected and dropped, [...]f
safereg_poll_timer_cb: poll interval 292 above target 100
safereg_poll_timer_cb: poll interval 207 above target 100
safereg_poll_timer_cb: poll interval 119 above target 100
safereg_poll_timer_cb: poll interval 103 above target 100
safereg_poll_timer_cb: poll interval 105 above target 100
safereg_poll_timer_cb: poll interval 120 above target 100
safereg_poll_timer_cb: poll interval 106 above target 100
safereg_poll_timer_cb: poll interval 128 above target 100
[ 57.983221] INFO: rcu_preempt self-detected stall on CPU
[ 57.983232] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 57.983252] 4-...: (1 GPs behind) idle=8cd/140000000000001/0 softirq=5928/5931 fqs=313
[ 57.983255]
[ 57.990164] 4-...: (1 GPs behind) idle=8cd/140000000000001/0 softirq=5928/5931 fqs=315
[ 57.990173] (t=5266 jiffies g=553 c=552 q=10466)
[ 58.248160] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 58.248176] 4-...: (1 GPs behind) idle=8cd/140000000000001/0 softirq=5930/5931 fqs=269
safereg_poll_timer_cb: poll interval 112 above target [...]
safereg_poll_timer_cb: poll interval 196 above target 100
[ 58.248184] (detected by 0, t=5253 jiffies, g=111, c=110, q=59)
safereg_poll_timer_cb: poll interval 108 above target 100
safereg_poll_timer_cb: poll interval 106 above target 100
safereg_poll_timer_cb: poll interval 118 above target 100
safereg_poll_timer_cb: poll interval 109 above target 100
[ 71.088017] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Trans[...]uester ID)
safereg_poll_timer_cb: poll interval [...]
safereg_poll_timer_cb: poll interval 226 above target 100
[ 71.550316] pcieport 0004:00:00.0: device [10de:1ad1] error status/mask=00004020/00400000
[ 71.858412] pcieport 0004:00:00.0: [ 5] Surprise Down Error
safereg_poll_timer_cb: poll interval 112 above target 100
[ 72.115160] pcieport 0004:00:00.0: [14] Completion Timeout (First)
safereg_poll_timer_cb: poll interval 130 above target 100
safereg_poll_timer_cb: poll interval 116 above target 100
Could you tell me what I should do?
Thanks.