Cbb-fabric TIMEOUT_ERR at shutdown

Environment

L4T 35.4.1
Orin NX based CustomBoard

After updating from R35.3.1 to R35.4.1, the following error occurs during shutdown.

# shutdown -h now

Session terminated, killing shell...[  298.034503] CPU:0, Error: cbb-fabric@0x13a00000, irq=25
[  298.039895] **************************************
[  298.044829] CPU:0, Error:cbb-fabric, Errmon:2
[  298.049319]    Error Code            : TIMEOUT_ERR
[  298.053360]    Overflow              : Multiple TIMEOUT_ERR
[  298.058032]
[  298.059562]    Error Code            : TIMEOUT_ERR
[  298.063602]    MASTER_ID             : CCPLEX
[  298.067100]    Address               : 0x2a080082
[  298.070780]    Cache                 : 0x1 -- Bufferable
[  298.075086]    Protection            : 0x2 -- Unprivileged, Non-Secure, Data Access
[  298.082075]    Access_Type           : Read
[  298.085567]    Access_ID             : 0x17
[  298.085569]    Fabric                : cbb-fabric
[  298.092467]    Slave_Id              : 0x14
[  298.095692]    Burst_length          : 0x0
[  298.099188]    Burst_type            : 0x1
[  298.102507]    Beat_size             : 0x1
[  298.105734]    VQC                   : 0x0
[  298.108521]    GRPSEC                : 0x7e
[  298.111575]    FALCONSEC             : 0x0
[  298.114811]  **************************************
[  298.120029] WARNING: CPU: 0 PID: 101 at drivers/soc/tegra/cbb/tegra234-cbb.c:578 tegra234_cbb_isr+0x134/0x180
[  298.130628] ---[ end trace e109e362407d590f ]---
[  298.135446] CPU:0, Error: cbb-fabric@0x13a00000, irq=25
[  298.140810] **************************************
[  298.145735] CPU:0, Error:cbb-fabric, Errmon:2
[  298.150216]    Error Code            : TIMEOUT_ERR
[  298.154241]    Overflow              : Multiple TIMEOUT_ERR
[  298.158905]
[  298.160435]    Error Code            : TIMEOUT_ERR
[  298.164469]    MASTER_ID             : CCPLEX
[  298.167963]    Address               : 0x2a080082
[  298.171635]    Cache                 : 0x1 -- Bufferable
[  298.175930]    Protection            : 0x2 -- Unprivileged, Non-Secure, Data Access
[  298.182914]    Access_Type           : Read
[  298.186405]    Access_ID             : 0x14
[  298.186407]    Fabric                : cbb-fabric
[  298.193308]    Slave_Id              : 0x14
[  298.196529]    Burst_length          : 0x0
[  298.200019]    Burst_type            : 0x1
[  298.203335]    Beat_size             : 0x1
[  298.206560]    VQC                   : 0x0
[  298.209343]    GRPSEC                : 0x7e
[  298.212395]    FALCONSEC             : 0x0
[  298.215626]  **************************************
[  298.220774] WARNING: CPU: 0 PID: 101 at drivers/soc/tegra/cbb/tegra234-cbb.c:578 tegra234_cbb_isr+0x134/0x180
[  298.231278] ---[ end trace e109e362407d5910 ]---
[  298.669155] nvgpu: 17000000.ga10b      ga10b_intr_log_pending_intrs:306  [ERR]  Pending TOP[0]: 0x00000004, LEAF[4]: 0x11000000
[  298.682004] arm-smmu 8000000.iommu: disabling translation
[  298.687734] arm-smmu 10000000.iommu: disabling translation
[  298.693435] arm-smmu 12000000.iommu: disabling translation
[  298.729137] CPU1: shutdown
[  298.749035] CPU2: shutdown
[  298.768747] CPU3: shutdown
[  298.77籬3198] reboot: P瞋hutdown state requested 0
Shutting down syste
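For comparing the repeated error reports, the field lines of each CBB dump can be collected into a dictionary. This is only a minimal sketch: the function name `parse_cbb_errors` and the `SAMPLE` excerpt are illustrative, and it assumes each report starts at the "Errmon:" line and ends at the next row of asterisks, as in the log above.

```python
import re

# Matches field lines such as "[  298.063602]    MASTER_ID             : CCPLEX".
FIELD_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+([A-Za-z_]+(?: [A-Za-z_]+)*)\s*:\s*(.+?)\s*$"
)

# A few lines copied from the first report in the log above.
SAMPLE = """\
[  298.039895] **************************************
[  298.044829] CPU:0, Error:cbb-fabric, Errmon:2
[  298.049319]    Error Code            : TIMEOUT_ERR
[  298.063602]    MASTER_ID             : CCPLEX
[  298.067100]    Address               : 0x2a080082
[  298.082075]    Access_Type           : Read
[  298.092467]    Slave_Id              : 0x14
[  298.114811]  **************************************
"""


def parse_cbb_errors(log_text):
    """Return one dict of field name -> value per CBB error report."""
    reports = []
    current = None
    for line in log_text.splitlines():
        if "Errmon:" in line:
            # "CPU:0, Error:cbb-fabric, Errmon:2" opens a new report.
            current = {}
            reports.append(current)
        elif current is None:
            # Outside a report: ignore unrelated kernel messages.
            continue
        elif "*****" in line:
            # The closing row of asterisks ends the current report.
            current = None
        else:
            match = FIELD_RE.match(line)
            if match:
                key, value = match.groups()
                current[key] = value
    return reports


for report in parse_cbb_errors(SAMPLE):
    print(report["MASTER_ID"], report["Access_Type"], report["Address"])
```

Run against the full log, this makes it easy to see that the two reports differ only in Access_ID.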

Address 0x2a080082 maps to the following part of the address map:

PCIE_C7_32BIT_DMA 0x28040000 0x2807ffff SYSTEM
PCIE_C8_32BIT     0x2a000000 0x2bffffff SYSTEM
PCIE_C8_32BIT_EP  0x2a000000 0x2a001fff SYSTEM_CFG.PCIE_C8_CTL.PCIE_RP_APPL_DM_TYPE_0.DEVICE_TYPE.END_POINT
PCIE_C8_32BIT_RP  0x2a000000 0x2a001fff SYSTEM_CFG.PCIE_C8_CTL.PCIE_RP_APPL_DM_TYPE_0.DEVICE_TYPE.ROOT_PORT
PCIE_C8_32BIT_DMA 0x2a040000 0x2a07ffff SYSTEM
PCIE_C9_32BIT     0x2c000000 0x2dffffff SYSTEM

The problem appears to be related to the PCIe C8 bus.
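The lookup behind that conclusion can be sketched as a simple range check. The region names and bounds are copied from the address-map excerpt above; the helper name `regions_containing` is made up for illustration, and the end addresses are assumed to be inclusive.

```python
# (name, start, end) with inclusive end, taken from the address-map excerpt.
REGIONS = [
    ("PCIE_C7_32BIT_DMA", 0x28040000, 0x2807FFFF),
    ("PCIE_C8_32BIT",     0x2A000000, 0x2BFFFFFF),
    ("PCIE_C8_32BIT_EP",  0x2A000000, 0x2A001FFF),
    ("PCIE_C8_32BIT_RP",  0x2A000000, 0x2A001FFF),
    ("PCIE_C8_32BIT_DMA", 0x2A040000, 0x2A07FFFF),
    ("PCIE_C9_32BIT",     0x2C000000, 0x2DFFFFFF),
]


def regions_containing(address, regions=REGIONS):
    """Return the names of all regions whose range includes address."""
    return [name for name, start, end in regions if start <= address <= end]


# Address reported by the CBB error dump.
print(regions_containing(0x2A080082))  # ['PCIE_C8_32BIT']
```

The faulting address lies inside the C8 32-bit aperture, just past the end of the listed C8 DMA range (0x2a07ffff) and outside the EP/RP register range.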

# lspci
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 229e (rev a1)
0001:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0004:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0004:01:00.0 Non-Volatile memory controller: Device 1bc0:1002 (rev 01)
0007:00:00.0 PCI bridge: NVIDIA Corporation Device 229a (rev a1)
0007:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0008:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0008:01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
0009:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0009:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
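To see which device sits behind each controller, the lspci lines can be grouped by PCI domain. This is a rough sketch only: the helper name `devices_by_domain` is illustrative, and it assumes the domain number (the first field, e.g. 0008) corresponds to the controller index (C8), which is not confirmed anywhere in this output.

```python
import re
from collections import defaultdict

# "domain:bus:device.function description", as printed by lspci above.
LSPCI_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\s+(.*)$"
)

# Two lines copied from the lspci output above.
SAMPLE = """\
0008:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0008:01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
"""


def devices_by_domain(lspci_text):
    """Map PCI domain number -> list of device descriptions."""
    domains = defaultdict(list)
    for line in lspci_text.splitlines():
        match = LSPCI_RE.match(line.strip())
        if match:
            domain = int(match.group(1), 16)
            domains[domain].append(match.group(5))
    return dict(domains)


for description in devices_by_domain(SAMPLE).get(8, []):
    print(description)
```

Under that assumption, the only endpoint in domain 0008 is the Realtek Ethernet controller.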

To confirm, we disabled the PCIe C8 bus in the DTB, and the error no longer occurred.

	pcie@140a0000 {/* C8 */
		status = "disabled";
	};
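To check on the running system that the override took effect, the node's status property can be read back from /proc/device-tree, where Linux exposes nodes as directories and properties as files. A minimal sketch follows; the helper name `find_node_status` is illustrative, and since the node's exact location in the tree is an assumption, it searches the whole tree by node name.

```python
import os


def find_node_status(node_name, root="/proc/device-tree"):
    """Return (path, status) for every node directory named node_name.

    status is None when the node has no status property; the devicetree
    convention treats such a node as enabled.
    """
    results = []
    # /proc/device-tree is usually a symlink, so resolve it before walking.
    for dirpath, dirnames, _ in os.walk(os.path.realpath(root)):
        for dirname in dirnames:
            if dirname != node_name:
                continue
            node = os.path.join(dirpath, dirname)
            status_file = os.path.join(node, "status")
            status = None
            if os.path.isfile(status_file):
                with open(status_file, "rb") as f:
                    # String properties are NUL-terminated.
                    status = f.read().rstrip(b"\x00").decode("ascii", "replace")
            results.append((node, status))
    return results


if __name__ == "__main__":
    for path, status in find_node_status("pcie@140a0000"):
        print(path, status)
```

On a machine without /proc/device-tree the function simply returns an empty list.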

The following commit appears to be among the kernel changes between R35.3.1 and R35.4.1.

[drivers/pci/controller/dwc/pcie-tegra194.c]

commit d8913655d85710f9987bb47885f9fb2c14ccb12e
Author:     Manikanta Maddireddy <mmaddireddy@nvidia.com>
AuthorDate: Thu Feb 2 15:31:37 2023 +0530

    Revert "PCI: tegra194: Reduce AXI slave timeout value"

    This reverts commit 492afa913684df8f212b44e16b819afc57b2818d.
    CBB timeout for PCIe is reverted back to default value, 64 msec.
    So, there is no need for PCIe completion timeout to be low(<10msec).
    Revert the PCIe completion timeout back to default value.

    Bug 4017244

However, reverting this commit did not resolve the problem.
Could you please let me know if there is a workaround for this TIMEOUT error? Thanks.

20230824_shutdown_cbb_timeout_verbose_log.txt (117.1 KB)

Hi,

Can this issue be reproduced on a DevKit?
(Either the Xavier NX DevKit or the Orin Nano DevKit.)
The PCIe C8 bus comes directly from the SoM and should have little to do with the board design.

Hi,

We checked on the Orin Nano DevKit (L4T 35.4.1), but the problem did not reproduce.

Regarding the PCIe C8 bus, we have not changed anything in the DTB or the driver.

20203825_OrinNano_EVK_shutdown_verbose_log.txt (95.3 KB)

Hi,

Is it 100% reproducible on your custom board?
Or does it happen randomly?

So far, it is 100%.

Then can you please check your hardware design?
If it cannot be reproduced on DevKits, then it’s likely you did something wrong with your carrier board.

The PCIe C8 bus is connected internally to the RTL8168 Ethernet controller on the SoM.
We do not route the C8 bus out to the carrier board.

# lspci
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 229e (rev a1)
0001:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0004:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0004:01:00.0 Non-Volatile memory controller: Device 1bc0:1002 (rev 01)
0007:00:00.0 PCI bridge: NVIDIA Corporation Device 229a (rev a1)
0007:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0008:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0008:01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
0009:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0009:01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

Hi,

Please also try other modules to check whether this happens with every module or with only one.

Hi,

We have Orin NX 16GB/8GB and Orin Nano 8GB/4GB modules,
and the problem occurs on all of them.

Then I can only suggest reviewing your hardware design against the DevKit.

So far, this issue has been reported only on your board and by no one else.
