Jetson TX2 custom board PCIe problem

Hello,

We have designed a custom motherboard (board1) to test Jetson TX2 capabilities. The board has two COMe connectors, which let two COMe boards communicate over a single PCIe lane. There is no PCIe switch between them; it is a direct, crossed connection.

We have another custom board (board2), which provides power and a COMe connector for the Jetson TX2. We can use the Ethernet and UART of the Jetson TX2 on this board.

When we connect board2 to board1 to communicate with another COMe Linux platform, we see the error messages below on the console.

[   33.939811] pcieport 0000:00:01.0:    [ 0] Receiver Error         (First)
[   33.939813] pcieport 0000:00:01.0:    [ 8] RELAY_NUM Rollover
[   33.939815] pcieport 0000:00:01.0:    [12] Replay Timer Timeout
[   33.942361] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
** 78 printk messages dropped ** [   33.959548] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00000001/00002000
** 11 printk messages dropped ** [   33.971138] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00000001/00002000
** 20 printk messages dropped ** [   33.992886] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Transmitter ID)
** 26 printk messages dropped ** [   34.001711] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[   34.001713] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00000001/00002000
** 130 printk messages dropped ** [   34.068682] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[   34.068684] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00000081/00002000
[   34.068686] pcieport 0000:00:01.0:    [ 0] Receiver Error         (First)
** 119 printk messages dropped ** [   34.144307] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[   34.144309] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00000081/00002000
** 16 printk messages dropped ** [   34.154126] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[   34.155753] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00000001/00002000
[   34.155755] pcieport 0000:00:01.0:    [ 0] Receiver Error         (First)
[   34.165456] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[   34.165458] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00000001/00002000
[   34.166402] pcieport 0000:00:01.0:    [ 0] Receiver Error         (First)
[   34.167541] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[   34.168257] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00000001/00002000
[   34.168779] pcieport 0000:00:01.0:    [ 0] Receiver Error         (First)
[   34.170437] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[   34.170439] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00000001/00002000
** 33 printk messages dropped ** [   34.220100] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)

What could be the problem?
How can I force the Jetson TX2 to communicate in an x1 Gen1 configuration?

Thank you.
Fatih

Hi,

Are you using rel-28 or rel-32?

Hi,

We are using rel-32.

By the way, the PCIe lanes and traces of the custom board (board1) were validated by having two COMe TI DSP boards communicate on board1.

What is the idea here exactly? Jetson-TX2 has only PCIe root ports. They can’t work in endpoint mode. And you mentioned that there is also no PCIe switch in between to enable NTB mode and connect two root ports back to back.

Let me clarify our setup and goal.

The Jetson-TX2 (plugged into board2) is the root complex, and the other board, come_imx_board, is the endpoint. Both (the Jetson-TX2 on board2 and come_imx) are plugged into the custom motherboard (board1). The aim of this configuration is to validate the PCIe lane and trace connections of board2.

After validating board2, we are going to plug come_Jetson-TX2 (the Jetson-TX2 plugged into board2) into another complex board, which has a PCIe switch, an endpoint FPGA, and a TI DSP.

I hope I am clear.

By the way, in the DSP scenario one DSP is the root complex and the other one is the endpoint, as they must be.

OK, I think I understand it now. Basically, board-1 has PCIe lane routing and nothing else. You are connecting both the root port (TX2) and the endpoint to board-1.
All the errors mentioned in comment #1 are of type Physical Layer, which indicates that signal integrity is not good in this setup. If you are observing only physical layer errors and no other type, and you are OK living with these signal integrity issues, you can disable AER reporting to avoid polluting the log. The downside is that performance could suffer, because transactions would be retried over and over again on the poor-quality link.
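
For reference, the `error status/mask` values in the log can be decoded bit by bit. The following is a minimal sketch, assuming the standard PCIe AER Correctable Error Status register layout and the layer grouping that the Linux AER driver uses when it prints `type=`. The table and the function name `decode_correctable` are illustrative and not taken from the L4T sources.

```python
# Bit positions of the PCIe AER Correctable Error Status register.
# Assumption: standard register layout; the layer names follow the grouping
# Linux uses for the "type=" field (anything above the data link bits is
# reported as Transaction Layer).
CORRECTABLE_BITS = {
    0: ("Receiver Error", "Physical Layer"),
    6: ("Bad TLP", "Data Link Layer"),
    7: ("Bad DLLP", "Data Link Layer"),
    8: ("REPLAY_NUM Rollover", "Data Link Layer"),
    12: ("Replay Timer Timeout", "Data Link Layer"),
    13: ("Advisory Non-Fatal Error", "Transaction Layer"),
    14: ("Corrected Internal Error", "Transaction Layer"),
    15: ("Header Log Overflow", "Transaction Layer"),
}


def decode_correctable(status_hex, mask_hex="0"):
    """Return (bit, name, layer, masked) for every known bit set in status."""
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    decoded = []
    for bit, (name, layer) in sorted(CORRECTABLE_BITS.items()):
        if status & (1 << bit):
            decoded.append((bit, name, layer, bool(mask & (1 << bit))))
    return decoded


if __name__ == "__main__":
    # The two status/mask pairs that appear in the log above.
    for status, mask in [("00000001", "00002000"), ("00000081", "00002000")]:
        print(status, mask)
        for bit, name, layer, masked in decode_correctable(status, mask):
            print(f"  [{bit:2d}] {name} ({layer}), masked={masked}")
```

The log header reports a single layer per message, so decoding the full status word is a way to check whether bits outside the Physical Layer group are also set.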

All we observe is this message dump, printed constantly on the console. We see physical layer errors in the messages. Do you think there are other types of errors according to this log? We cannot use the console because of this message flood.

How can we disable AER reporting?
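
One option that exists in the mainline Linux kernel is the `pci=noaer` boot parameter, which disables the AER service as a whole. The sketch below only shows how that token could be added to the `APPEND` line of an extlinux-style boot configuration. It is an assumption that rel-32 on this board takes its kernel arguments from such a file and honors the parameter; the function name `add_noaer` and the sample text are hypothetical.

```python
def add_noaer(config_text):
    """Append 'pci=noaer' to every APPEND line that does not already have it.

    Assumption: the boot arguments live on APPEND lines of an extlinux-style
    config. This function only transforms text; it does not touch any file.
    """
    out = []
    for line in config_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].upper() == "APPEND" and "pci=noaer" not in tokens:
            line = line.rstrip() + " pci=noaer"
        out.append(line)
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    if config_text.endswith("\n"):
        result += "\n"
    return result


if __name__ == "__main__":
    # Hypothetical sample entry, not copied from a real board.
    sample = "LABEL primary\n      APPEND ${cbootargs} quiet\n"
    print(add_noaer(sample))
```

Note that this only silences the reporting. The link-level retries caused by the signal integrity problem would still occur.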