PCIe IRQ latency: unbounded values

I’m using the Jetson AGX as a root port and a Xilinx dev board as a PCIe endpoint.
The PCIe link is well established and data is exchanged correctly.
For my application, about 20 kB of data must be read from the FPGA DDR to the Jetson every millisecond.

I configured a user IRQ, sent by the endpoint over PCIe, to inform the Jetson that data are available.
I’m using MSI-X interrupts.
By default, the IRQ affinity is set to CPU core 3 of the Jetson AGX.

Here is a view of the IRQs associated with my PCIe driver:

```
cat /proc/interrupts | grep xdma
820:      0  0  0  0  0  0  0  0   PCI-MSI 0 Edge   xdma
821:      0  0  0  0  0  0  0  0   PCI-MSI 1 Edge   xdma
822: 144097  0  0  0  0  0  0  0   PCI-MSI 2 Edge   xdma
```

IRQ #822 is the one raised when a user interrupt is sent over PCIe from the FPGA.
IRQ #821 fires when reading data from the FPGA DDR.

I’m using a PREEMPT_RT-patched kernel and I assigned the IRQ threads to CPU core #3 (using the “taskset” command).
I also set these threads’ priority to 80 (versus 50 by default for IRQ threads).
Lastly, I isolated CPU core #3 by adding isolcpus=3 to the APPEND line in /boot/extlinux/extlinux.conf.
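
For completeness, here is a minimal user-space sketch of the same pinning and priority change done programmatically (roughly what taskset plus a real-time priority change do); the PID argument is a placeholder for the xdma IRQ thread PID found with ps:

```c
/* Minimal sketch: pin a thread (e.g. an xdma IRQ thread whose PID was found
 * with ps) to CPU core #3 and raise it to SCHED_FIFO priority 80. The PID
 * passed on the command line is a placeholder; 0 means "this thread". */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    pid_t pid = (argc > 1) ? (pid_t)atoi(argv[1]) : 0;
    cpu_set_t set;
    struct sched_param sp = { .sched_priority = 80 };

    CPU_ZERO(&set);
    CPU_SET(3, &set);                            /* allow only CPU core #3 */
    if (sched_setaffinity(pid, sizeof(set), &set) != 0)
        perror("sched_setaffinity");

    if (sched_setscheduler(pid, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler");            /* needs root / CAP_SYS_NICE */

    return 0;
}
```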

Thus, I expect CPU core #3 to be dedicated solely to handling these PCIe interrupts (user and read).
However, I’m experiencing some jitter, and the most problematic part is the very high latencies.
Indeed, while the average value is about 100 µs, the maximum observed value reaches several milliseconds.
→ See the following latency histogram, which represents the transfer time of my data from the FPGA to the Jetson (about 20 kB every millisecond):

Such latencies are unacceptable for my application and I absolutely must bound them.

Do you have any suggestions that could help bound the transfer time?

Are you able to successfully get interrupts serviced only by CPU-3?

Can you try after setting nvpmodel -m 0 and running jetson_clocks so that the CPUs are at max clock?

Yes, the associated interrupts are serviced only by CPU #3.

With the “ps” command I’ve checked that only the interrupts related to my PCIe application are serviced on CPU 3 and that no other process is assigned to this CPU.

I forgot to mention it, but the tests are already run after setting nvpmodel -m 0 and running the jetson_clocks script.

Please try the steps below and share whether there is any improvement.

  1. Disable cpuidle states by writing ‘1’ to the ‘/sys/devices/system/cpu/cpu*/cpuidle/state*/disable’ sysfs nodes, or disable CPU idle (“CONFIG_CPU_IDLE”) in the kernel defconfig (see the sketch after this list).
  2. Pass “nohz=off” as a kernel boot argument to disable the tickless kernel/dyntick-idle mode.
  3. Set CONFIG_HZ_1000=y to get a near-realtime timer interrupt frequency.
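
For step 1, here is a small sketch (run as root) that walks the standard cpuidle sysfs nodes and writes ‘1’ to each ‘disable’ file, equivalent to doing it with echo in a shell loop:

```c
/* Sketch for step 1: disable every cpuidle state by writing '1' to each
 * per-state "disable" sysfs node (the same paths as in the echo command).
 * Must be run as root. */
#include <glob.h>
#include <stdio.h>

int main(void)
{
    glob_t g;
    size_t i;

    if (glob("/sys/devices/system/cpu/cpu*/cpuidle/state*/disable",
             0, NULL, &g) != 0)
        return 1;                        /* no cpuidle nodes found */

    for (i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "w");
        if (!f) {
            perror(g.gl_pathv[i]);
            continue;
        }
        fputs("1", f);                   /* 1 = disable this idle state */
        fclose(f);
    }

    globfree(&g);
    return 0;
}
```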

Hi,

I’ve tried the things you mentioned.
My kernel was already built with the CONFIG_HZ_1000=y option. I nevertheless recompiled it with CONFIG_CPU_IDLE disabled.

I also passed “nohz=off”.

Unfortunately, the results are very similar: the average latency value is approximately the same and high-latency occurrences are still observed.

I don’t understand how such latencies can occur, since these IRQs and their handling are restricted to CPU #3, which is isolated from the scheduler.

I’m wondering if the way I measure latencies could introduce such jitter.
For information, the latencies are simply measured in a thread (also executing on CPU #3) with calls to clock_gettime(CLOCK_MONOTONIC, …) placed just before and just after the pread() call (responsible for reading data over PCIe via DMA).
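
For reference, here is a minimal sketch of that measurement; the XDMA device node name and the 20 kB transfer size are assumptions for illustration:

```c
/* Minimal sketch of the measurement: timestamp just before and just after
 * the pread() that performs the DMA read from FPGA DDR. The device node
 * name and the transfer size are placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char buf[20 * 1024];                            /* ~20 kB per transfer */
    int fd = open("/dev/xdma0_c2h_0", O_RDONLY);    /* assumed device node */
    struct timespec t0, t1;

    if (fd < 0) {
        perror("open");
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pread(fd, buf, sizeof(buf), 0);                 /* DMA read over PCIe */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long us = (t1.tv_sec - t0.tv_sec) * 1000000L +
              (t1.tv_nsec - t0.tv_nsec) / 1000L;
    printf("transfer latency: %ld us\n", us);

    close(fd);
    return 0;
}
```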

Any idea?
Help would be appreciated…

Could you try the tests below:

  1. CLOCK_MONOTONIC → CLOCK_MONOTONIC_RAW
  2. Set the node below and check if any improvement is observed:
    “echo 0x8 > /sys/kernel/debug/tegra_mce/rt_safe_mask”
  3. Try using the ‘perf’ tool (Tutorial - Perf Wiki) to measure counters like ‘cpu-cycles’ instead of clock time (see the sketch after this list).
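
For item 3, here is a sketch of counting ‘cpu-cycles’ around the pread() with perf_event_open() instead of reading the clock; the device node and transfer size are the same placeholders as above:

```c
/* Sketch for item 3: count CPU cycles spent in the pread() using
 * perf_event_open(), as an alternative to wall-clock timestamps.
 * Device node and transfer size are placeholders. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    struct perf_event_attr attr;
    char buf[20 * 1024];
    uint64_t cycles = 0;
    int dev = open("/dev/xdma0_c2h_0", O_RDONLY);   /* assumed device node */

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;                              /* start disabled */

    /* pid = 0 (this thread), cpu = -1 (any CPU), no group, no flags */
    int pfd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    if (pfd < 0 || dev < 0) {
        perror("setup");
        return 1;
    }

    ioctl(pfd, PERF_EVENT_IOC_RESET, 0);
    ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0);
    pread(dev, buf, sizeof(buf), 0);                /* DMA read over PCIe */
    ioctl(pfd, PERF_EVENT_IOC_DISABLE, 0);
    read(pfd, &cycles, sizeof(cycles));

    printf("pread took %llu cpu-cycles\n", (unsigned long long)cycles);
    return 0;
}
```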

If there are still no hints, then please share ftrace logs so we can check.

The high latency values I observed are not linked to the way I measured them.

Indeed, I recompiled the Xilinx driver in “polling” mode, so that a kernel thread constantly checks whether the transfer is done.
In the previous version, the FPGA sent an MSI-X interrupt at the end of each transfer.

The results are far better: with this “polling” mode, the latency values stay within 165 µs.
(I modified the driver so that the kernel thread runs only on CPU #3; a rough sketch is shown below.)
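
For illustration, here is a heavily simplified sketch of that approach (not the actual Xilinx xdma code); transfer_done() and complete_transfer() are hypothetical placeholders for the driver’s status check and completion handling:

```c
/* Simplified sketch (not the real xdma driver code): a polling kernel
 * thread, bound to CPU 3, that checks transfer completion in a loop instead
 * of waiting for the MSI-X interrupt. transfer_done() and complete_transfer()
 * are hypothetical placeholders. */
#include <linux/delay.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

static struct task_struct *poll_task;

static int xfer_poll_fn(void *engine)
{
    while (!kthread_should_stop()) {
        if (transfer_done(engine))        /* hypothetical status-register check */
            complete_transfer(engine);    /* hypothetical completion handling */
        usleep_range(5, 10);              /* brief sleep so the core is not 100% busy */
    }
    return 0;
}

static int start_polling(void *engine)
{
    poll_task = kthread_create(xfer_poll_fn, engine, "xdma_poll");
    if (IS_ERR(poll_task))
        return PTR_ERR(poll_task);

    kthread_bind(poll_task, 3);           /* run only on the isolated CPU #3 */
    wake_up_process(poll_task);
    return 0;
}
```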

I will work with this option for now.

(However, I still do not understand such latencies in “interrupt” mode, since IRQ handling is pinned to CPU #3, which is isolated from the scheduler.)

By the way, thanks for your advice.
