Orin NX network and NVMe performance issues

I’m running some network and NVMe benchmarks on an Orin NX system running JetPack 5.1.2.

The setup consists of a PCIe switch installed in the Gen4 x4 M.2 PCIe slot. A dual-port 25GbE Mellanox card is attached to one downstream port of the switch, and 6 NVMe SSDs are attached to the other downstream ports.

The NX system can receive data on the two 25GbE ports at an aggregate rate of around 6000 MB/s. It can also write to the NVMe SSDs at around 6000 MB/s. However, when I receive data over the 25GbE network interfaces at the same time as writing to the NVMe SSDs, the aggregate network receive rate drops to around 3500 MB/s. CPU utilization is only around 50%, so I’m not sure why the NVMe SSD write activity slows the network receive rate down so much. The NX system can send and receive over the two 25GbE interfaces at 6000 MB/s in both directions simultaneously, so it doesn’t appear to be a PCIe bus or memory bandwidth limitation.
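
For context, this kind of concurrent load can be reproduced with standard tools; the commands below are only an illustrative sketch (device names, ports, and flags are placeholders, not my exact setup):

# one iperf3 server per 25GbE interface; the senders run on the remote host
~$ iperf3 -s -p 5201 &
~$ iperf3 -s -p 5202 &
# sequential 1 MiB writes spread across the six NVMe SSDs
~$ sudo fio --name=seqwrite --rw=write --bs=1M --iodepth=32 --ioengine=libaio --direct=1 \
     --filename=/dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1 \
     --runtime=60 --time_based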

I don’t understand why the network rates slow down so much with NVMe activity when there appear to be plenty of free CPU resources available. Interrupts appear to be distributed fairly evenly across the 8 CPUs.

~$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7

311:          6          0          0          0          0          0          0          0       MSI 541065216 Edge      nvme0q0
312:          6          0          0          0          0          0          0          0       MSI 541589504 Edge      nvme1q0
313:          6          0          0          0          0          0          0          0       MSI 542638080 Edge      nvme3q0
314:     628763          0          0          0          0          0          0          0       MSI 541065217 Edge      nvme0q1
315:          6          0          0          0          0          0          0          0       MSI 543162368 Edge      nvme4q0
316:         43          0          0          0          0          0     628338          0       MSI 541589505 Edge      nvme1q1
317:          6          0          0          0          0          0          0          0       MSI 544735232 Edge      nvme5q0
322:         40          0          0       1511          0     626763          0          0       MSI 542638081 Edge      nvme3q1
323:         40          0     628135          0          0          0          0          0       MSI 543162369 Edge      nvme4q1
324:        391          0          0          0          0          0     628765          0       MSI 544735233 Edge      nvme5q1
329:          6          0          0          0          0          0          0          0       MSI 542113792 Edge      nvme2q0
330:         53          0          0          0     628198          0          0          0       MSI 542113793 Edge      nvme2q1

350:          0          0          5          0          0          0          0    7663880       MSI 539492362 Edge      mlx5_comp10@pci:0004:05:00.0

363:          0    7909327          3          0          0          0          0      31718       MSI 539494410 Edge      mlx5_comp10@pci:0004:05:00.1
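
As a sanity check, IRQ affinity can also be inspected and pinned by hand. A minimal sketch, assuming irqbalance is installed and using the Mellanox completion IRQ number from the output above:

# stop irqbalance so manual affinity settings are not overridden
~$ sudo systemctl stop irqbalance
# check which CPUs IRQ 350 (mlx5 completion queue) is allowed to run on
~$ cat /proc/irq/350/smp_affinity_list
# pin it to a core that is not handling NVMe completions, e.g. CPU 4
~$ echo 4 | sudo tee /proc/irq/350/smp_affinity_list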

I have already tried running jetson_clocks and setting nvpmodel to mode 0 for max power. Is there anything else I can do to improve performance?
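
For reference, those settings were applied with the standard JetPack tools:

~$ sudo nvpmodel -m 0      # MAXN / max power mode
~$ sudo jetson_clocks      # lock clocks to maximum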

Hi,
There are 6 NVMe SSDs + 1 Mellanox Ethernet card connected to the PCIe interface. Concurrent access through the PCIe switch looks to be the cause of the performance drop. Are you able to try the 6 NVMe SSDs on one PCIe interface and the Mellanox Ethernet card on the other?
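
To confirm which devices share the switch’s upstream link, and the negotiated link speed and width, you can check the PCIe topology with lspci (the bus address below is illustrative; use the one from your system):

# show the PCIe tree so the switch and its downstream devices are visible
~$ sudo lspci -tv
# check the negotiated link speed/width of the switch upstream port
~$ sudo lspci -s 0004:01:00.0 -vv | grep -i lnksta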

There is also a known issue with the PCIe switch:
[35.3.1] Intel I210 throughput issue on Jetson Orin Nano Devkit - #12 by sgidel
Please try the patch:
Boot stuck while enumerating NVMe via PFX switch, seems to be PCIe driver issue - #8 by WayneWWW
