Very high core-to-core CAS latency

While running the core-to-core latency benchmark, we found that the Jetson Xavier exhibits very high core-to-core latency (~1000 ns).

The test code is core-to-core-latency/src/bench/cas.rs at main · nviennot/core-to-core-latency · GitHub, where two threads essentially spin on Compare-and-Swap (CAS) in a loop until it succeeds and measure the time.
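For context, here is a minimal std-only sketch of that kind of CAS ping-pong measurement. It is not the repo’s actual cas.rs, and it leaves out the per-core thread pinning that the real benchmark does:

```rust
// Rough sketch of a CAS ping-pong latency measurement (not the exact cas.rs
// from the repo; thread-to-core pinning is omitted for brevity, while the
// real benchmark pins each thread to a specific core).
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Instant;

const PING: u32 = 0;
const PONG: u32 = 1;
const ROUND_TRIPS: u32 = 100_000;

fn main() {
    let flag = Arc::new(AtomicU32::new(PING));

    // "Pong" thread: waits until the other side has set PING, flips it to PONG.
    let pong_flag = Arc::clone(&flag);
    let pong = thread::spawn(move || {
        for _ in 0..ROUND_TRIPS {
            while pong_flag
                .compare_exchange(PING, PONG, Ordering::AcqRel, Ordering::Acquire)
                .is_err()
            {}
        }
    });

    // "Ping" side (main thread): waits for PONG, flips it back to PING.
    let start = Instant::now();
    for _ in 0..ROUND_TRIPS {
        while flag
            .compare_exchange(PONG, PING, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {}
    }
    let elapsed = start.elapsed();
    pong.join().unwrap();

    // Each round trip bounces the cache line twice, so halve it to get the
    // one-way core-to-core figure the benchmark reports.
    let one_way_ns = elapsed.as_nanos() as f64 / (2.0 * f64::from(ROUND_TRIPS));
    println!("~{one_way_ns:.0} ns per core-to-core handoff");
}
```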

This is running on NVIDIA’s devkit with vanilla JetPack and the MAXN power setting. We tried different JetPack versions (35.5.0, 35.4.1, 35.2.1) as well as different Jetson Xavier modules, but they all exhibit similar results, as shown below.

However, we don’t see these bad results on the Jetson Orin.
[image: core-to-core latency benchmark results]

Any idea why this could happen?

Hi,
Do you compare Xavier and Orin on the same JetPack 5.1.3? If the software version is identical, the deviation may come from hardware capability.

For Orin we only tried 5.1.2; since it gives good results, we didn’t try other versions. For Xavier we have tried 5.1.2, 5.1.3, and 5.1, and they all give bad results.

1000 ns+ still sounds like a very large number for a hardware limitation though, so I suspect this could be something software- or kernel-related.
Based on the results at GitHub - nviennot/core-to-core-latency: Measures the latency between CPU cores, even a CPU from 2003 (IBM PowerPC 970, 1.8 GHz, 2 cores, 2003-Q2) is 600 ns.

I am curious, are all cores doing this simultaneously? Or are you testing a pair of cores at any given time, and then moving on to other core pairs?

This is testing pair by pair, e.g. between cores 1 and 2, 1 and 3, …, 1 and 8, and then 2 and 1, 2 and 3, etc.

Do beware that the first CPU (I call it CPU0, but in your chart it is CPU 1) is used for hardware interrupts. The other cores use soft IRQs. Anything that loads down the first core has a chance of slowing all of the other cores. In this case I don’t think it is an issue because you are not accessing hardware drivers (the lock spin doesn’t need the disk, it doesn’t need Ethernet, and so on). However, if for some reason not presented here the first core is under load, then it could have an effect on your test by making hardware drivers wait as well.

This isn’t exactly what you are asking, and isn’t really part of your test, but you might want to quickly examine “/proc/interrupts”. This is a list of hardware IRQs. You’ll notice that almost everything requires CPU0. Each core has its own timers, but in an incorrect test setup another core can at times end up depending on a hardware driver serviced by CPU0 (I don’t think your test would have this issue because you only work a pair of cores at a time).
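In case it’s convenient, here is a rough Rust snippet (just a sketch, not part of the benchmark) that totals the per-CPU columns of /proc/interrupts, so you can see how much of the interrupt load lands on CPU0 versus the other cores; simply looking at the file directly shows the same thing:

```rust
// Sum the per-CPU interrupt counts from /proc/interrupts (rough sketch).
use std::fs;

fn main() {
    let text = fs::read_to_string("/proc/interrupts").expect("read /proc/interrupts");
    let mut lines = text.lines();

    // The header row lists the CPU columns, e.g. "CPU0  CPU1  ...".
    let cpus: Vec<&str> = match lines.next() {
        Some(header) => header.split_whitespace().collect(),
        None => return,
    };
    let mut totals = vec![0u64; cpus.len()];

    for line in lines {
        // Each IRQ row looks like "<irq>:  <count per CPU> ...  <controller> <name>";
        // trailing non-numeric fields simply fail to parse and are skipped.
        let fields: Vec<&str> = line.split_whitespace().collect();
        for (i, total) in totals.iter_mut().enumerate() {
            if let Some(count) = fields.get(i + 1).and_then(|f| f.parse::<u64>().ok()) {
                *total += count;
            }
        }
    }

    for (cpu, total) in cpus.iter().zip(&totals) {
        println!("{cpu}: {total} interrupts handled");
    }
}
```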

I also just tried the RT kernel on Xavier, which improves the latency significantly. This makes me think something related to the regular kernel is causing the high delay.

It is likely related to the kernel. I doubt the hardware design differs enough between the Xavier and Orin modules to have such an effect on one but not the other. Understand, though, that the RT kernel does not magically reduce all latency. There is also software configuration of cgroups that determines how RT scheduling modifies operation timing. I’m the wrong guy to do it, but if you can see what kind of configuration differences there are between the slow and fast cases, it might offer a clue. For example, the file “/proc/cgroups” might offer clues if you compare the two kernels, go through each line item one at a time, and look up what configuration might change that line item.
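If it helps to automate that comparison, here is a rough sketch (the file names are just placeholders) that diffs two saved copies of /proc/cgroups, one captured under each kernel, and prints the line items that differ:

```rust
// Compare two saved snapshots of /proc/cgroups and print differing entries.
use std::collections::{BTreeMap, BTreeSet};
use std::env;
use std::fs;

// Parse a snapshot into subsys_name -> "hierarchy num_cgroups enabled".
fn load(path: &str) -> BTreeMap<String, String> {
    let text = fs::read_to_string(path).expect("read cgroups snapshot");
    text.lines()
        .filter(|l| !l.starts_with('#')) // skip the "#subsys_name ..." header
        .filter_map(|l| {
            let mut it = l.split_whitespace();
            let name = it.next()?.to_string();
            Some((name, it.collect::<Vec<_>>().join(" ")))
        })
        .collect()
}

fn main() {
    // Usage: cgroups-diff <snapshot-from-generic-kernel> <snapshot-from-rt-kernel>
    let args: Vec<String> = env::args().collect();
    let (a, b) = (load(&args[1]), load(&args[2]));

    let names: BTreeSet<&str> = a.keys().chain(b.keys()).map(|s| s.as_str()).collect();
    for name in names {
        let (va, vb) = (a.get(name), b.get(name));
        if va != vb {
            println!("{name}: {va:?} vs {vb:?}");
        }
    }
}
```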
