Kernel-Level Timer Interrupt Latency Test (Cyclic-Test)

  1. I disabled the GUI to avoid interference and ran a kernel-level cyclic-test (to measure timer interrupt latency) on bare metal.

  2. The average latency is around 8.5 us, which is very high for bare metal. I also ran the user-level cyclic-test to cross-check my results, and it gave the same result.
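As a rough illustration of what a cyclic test measures, here is a minimal user-level sketch in Python. It is not the kernel-level test described above and not the real cyclictest: it only shows the principle of sleeping until an absolute deadline and recording how late the wake-up is. The function names and the default interval and loop count are assumptions made for this example, and interpreter overhead means the numbers it prints will be much larger than those from a C or kernel-level test.

```python
import time


def measure_wakeup_latency(interval_us=1000, loops=200):
    """Return the wake-up latency (in microseconds) of each periodic sleep."""
    interval_ns = interval_us * 1000
    latencies_us = []
    # First absolute deadline, one interval from now.
    next_wakeup = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining_ns = next_wakeup - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        now = time.monotonic_ns()
        # Latency = how long after the intended deadline we actually woke up.
        latencies_us.append((now - next_wakeup) / 1000)
        # Advance the deadline by a fixed step so errors do not accumulate.
        next_wakeup += interval_ns
    return latencies_us


def summarize(latencies_us):
    """Reduce a list of latencies to the Min/Avg/Max figures a cyclic test reports."""
    return {
        "min": min(latencies_us),
        "avg": sum(latencies_us) / len(latencies_us),
        "max": max(latencies_us),
    }


if __name__ == "__main__":
    stats = summarize(measure_wakeup_latency())
    print("Min: {min:.1f} us  Avg: {avg:.1f} us  Max: {max:.1f} us".format(**stats))
```

The average alone can hide rare long delays, so it is worth looking at the maximum as well when comparing runs.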

I am not sure why the latency is so high. I would really appreciate any insights or comments from you. Thank you so much in advance!

Hi Reswara,

  • Are you using an RT-Linux or a non-RT kernel?

  • The RT patches can be applied by running the script below:

    kernel/kernel-4.9$ ./scripts/rt-patch.sh apply-patches

  • Could you share the command used for the test, along with its output?

  • If you are running the ‘/home/ubuntu/cyclictest’ binary directly rather than the script ‘/home/username/cyclictest.sh’, please run ‘/home/username/jetson_clocks.sh’ before the test.

Hello sumitg,
Thank you for your help.

  • I was using the non-RT kernel. As you suggested, I applied the RT patches and ran jetson_clocks; the latency values on the host are now pretty decent.
  • The problem is that I can no longer boot my VMs; dmesg reports a kernel bug:

[ 1805.647429] ------------[ cut here ]------------
[ 1805.647436] kernel BUG at arch/arm64/kvm/…/…/…/arch/arm/kvm/arm.c:83!
[ 1805.647444] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1805.647501] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle zram ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter ip6_tables iptable_filter overlay cfg80211 bnep binfmt_misc btusb spidev btbcm btintel btrtl nvgpu bluedroid_pm ip_tables x_tables
[ 1805.647515] CPU: 0 PID: 9019 Comm: qemu-system-aar Tainted: G W 4.9.140-rt93-tegra-virt #2
[ 1805.647517] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[ 1805.647522] task: ffffffc1e10d6900 task.stack: ffffffc1b0e44000
[ 1805.647535] PC is at kvm_arm_get_running_vcpu+0x3c/0x40
[ 1805.647541] LR is at vgic_mmio_write_cactive+0x8c/0x178
[ 1805.647546] pc : [] lr : [] pstate: 40400149
[ 1805.647547] sp : ffffffc1b0e47b80
[ 1805.647584] x29: ffffffc1b0e47b80 x28: ffffffc1e10d6900
[ 1805.647590] x27: ffffffc1b0cd8000 x26: 0000000000000020
[ 1805.647595] x25: ffffffc1b0e47be8 x24: 0000000000000020
[ 1805.647602] x23: 0000000000000000 x22: ffffffc1b0cd8000
[ 1805.647607] x21: 0000000000000000 x20: ffffff8008faf798
[ 1805.647612] x19: ffffffc1b0cd8f68 x18: 0000007fce8f5f46
[ 1805.647617] x17: 0000007fb4c40b00 x16: ffffff8008297710
[ 1805.647622] x15: 0000000000000010 x14: 00000000000000eb
[ 1805.647626] x13: 0000000000000000 x12: 00000000000000f0
[ 1805.647631] x11: 0000000000000002 x10: 00000000000003ff
[ 1805.647637] x9 : 0000000000000100 x8 : 0000000000000380
[ 1805.647657] x7 : 0000000100000000 x6 : ffffffc1b0e47c90
[ 1805.647663] x5 : 00000000ffffffff x4 : 0000000000000000
[ 1805.647667] x3 : 0000000000000000 x2 : ffffffc1e10d6900
[ 1805.647672] x1 : 0000000000000000 x0 : 0000000000000140

[ 1805.647677] Process qemu-system-aar (pid: 9019, stack limit = 0xffffffc1b0e44000)
[ 1805.647679] Call trace:
[ 1805.647687] [] kvm_arm_get_running_vcpu+0x3c/0x40
[ 1805.647692] [] vgic_mmio_write_cactive+0x8c/0x178
[ 1805.647696] [] dispatch_mmio_write+0x148/0x168
[ 1805.647700] [] vgic_uaccess+0xb4/0xc0
[ 1805.647705] [] vgic_v2_dist_uaccess+0x60/0x70
[ 1805.647710] [] vgic_attr_regs_access_v2+0x1d0/0x1f8
[ 1805.647733] [] vgic_v2_set_attr+0xcc/0x148
[ 1805.647737] [] kvm_device_ioctl_attr+0x80/0x98
[ 1805.647741] [] kvm_device_ioctl+0x8c/0xe8
[ 1805.647749] [] do_vfs_ioctl+0xb0/0x8d8
[ 1805.647753] [] SyS_ioctl+0x8c/0xa8
[ 1805.647759] [] el0_svc_naked+0x34/0x38
[ 1805.813152] ---[ end trace 0000000000000003 ]---
[ 1807.518943] virbr0: port 2(vnet0) entered forwarding state
[ 1807.519179] virbr0: topology change detected, propagating
rose@rose-desktop:~$

Hello Sumitg,
These are the user-level cyclictest values.

With the RT patch and /usr/bin/jetson_clocks:

sudo ./cyclictest --smp -p95 -m -l100000

/dev/cpu_dma_latency set to 0us

policy: fifo: loadavg: 0.32 0.41 0.24 1/416 7715

T: 0 ( 7620) P:95 I:1000 C: 100000 Min: 8 Act: 27 Avg: 17 Max: 111
T: 1 ( 7621) P:95 I:1500 C: 66666 Min: 11 Act: 17 Avg: 17 Max: 103
T: 2 ( 7622) P:95 I:2000 C: 49995 Min: 11 Act: 16 Avg: 17 Max: 106
T: 3 ( 7623) P:95 I:2500 C: 39993 Min: 11 Act: 16 Avg: 18 Max: 61
T: 4 ( 7624) P:95 I:3000 C: 33325 Min: 11 Act: 17 Avg: 18 Max: 85
T: 5 ( 7625) P:95 I:3500 C: 28563 Min: 11 Act: 17 Avg: 17 Max: 60
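To compare runs like the one above without reading each line by eye, the per-thread lines can be parsed and reduced to a worst case. The following is a minimal sketch, assuming the output format shown in this post; the regular expression and the helper names `parse_cyclictest` and `worst_case` are made up for this example and are not part of cyclictest.

```python
import re

# Per-thread lines copied from the cyclictest run posted above.
SAMPLE = """\
T: 0 ( 7620) P:95 I:1000 C: 100000 Min: 8 Act: 27 Avg: 17 Max: 111
T: 1 ( 7621) P:95 I:1500 C: 66666 Min: 11 Act: 17 Avg: 17 Max: 103
T: 2 ( 7622) P:95 I:2000 C: 49995 Min: 11 Act: 16 Avg: 17 Max: 106
T: 3 ( 7623) P:95 I:2500 C: 39993 Min: 11 Act: 16 Avg: 18 Max: 61
T: 4 ( 7624) P:95 I:3000 C: 33325 Min: 11 Act: 17 Avg: 18 Max: 85
T: 5 ( 7625) P:95 I:3500 C: 28563 Min: 11 Act: 17 Avg: 17 Max: 60
"""

LINE_RE = re.compile(
    r"T:\s*(?P<thread>\d+)\s*\(\s*(?P<pid>\d+)\)\s*P:\s*(?P<prio>\d+)\s*"
    r"I:\s*(?P<interval>\d+)\s*C:\s*(?P<count>\d+)\s*"
    r"Min:\s*(?P<min>\d+)\s*Act:\s*(?P<act>\d+)\s*"
    r"Avg:\s*(?P<avg>\d+)\s*Max:\s*(?P<max>\d+)"
)


def parse_cyclictest(text):
    """Return one dict of integer fields per 'T: ...' line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            rows.append({key: int(value) for key, value in match.groupdict().items()})
    return rows


def worst_case(rows):
    """Return the row with the largest Max latency (all latencies are in microseconds)."""
    return max(rows, key=lambda row: row["max"])


if __name__ == "__main__":
    rows = parse_cyclictest(SAMPLE)
    worst = worst_case(rows)
    print("threads parsed:", len(rows))
    print("worst Max: {} us on thread {}".format(worst["max"], worst["thread"]))
```

Note that each thread runs at a different interval (the I: column), so the cycle counts (C:) differ between threads and the per-thread averages are not weighted equally.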

We ran our own kernel-level cyclic-test and got 5.45 us on bare metal.

The values from the user-level cyclic-test are very high by comparison. Please let me know if there are any optimizations I should apply.

Hello reswara1,

May I know where I can download the kernel-level cyclictest from? We run the user-level cyclictest during our QA to make sure we are seeing acceptable latencies. However, it would be interesting to see the latency with a kernel-level, community-accepted application.

Concurrent Real-Time develops a real-time OS for all the Jetson boards and guarantees a user-level cyclictest latency (Max:) below 50 us. The RTOS is called RedHawk: RedHawk on AGX Xavier. A new version of RedHawk for Xavier NX will be released soon.

Here are a couple of options you can try to get better latency:

  1. Boot the kernel with the isolcpus boot parameter, and run cyclictest pinned to the isolated core.
 sudo ./cyclictest -a 3 -m -p95
  2. Set nvpmodel to mode 0, and then run “jetson_clocks”.
 sudo nvpmodel -m 0
  3. Don’t run cyclictest with the --smp option, as that runs it on all available cores; instead, run it on a single dedicated core.
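Before pinning cyclictest to a core, it can help to confirm that the core really was isolated at boot. The sketch below parses an `isolcpus=` entry from a kernel command line and checks whether a given CPU is in it. The helper name `parse_isolcpus` is made up for this example, CPU 3 is taken from the `-a 3` command above, and the parser only handles plain CPU numbers and `N-M` ranges (any other tokens, such as flags on newer kernels, are ignored).

```python
def parse_isolcpus(cmdline):
    """Return the set of CPU numbers listed in an isolcpus= boot parameter."""
    isolated = set()
    for param in cmdline.split():
        if not param.startswith("isolcpus="):
            continue
        for item in param[len("isolcpus="):].split(","):
            if "-" in item:
                low, high = item.split("-", 1)
                if low.isdigit() and high.isdigit():
                    # Inclusive range such as 3-5.
                    isolated.update(range(int(low), int(high) + 1))
            elif item.isdigit():
                isolated.add(int(item))
            # Anything else (for example a flag name) is ignored in this sketch.
    return isolated


if __name__ == "__main__":
    target_cpu = 3  # the core passed to cyclictest with -a 3
    try:
        with open("/proc/cmdline") as handle:
            cmdline = handle.read()
    except OSError:
        cmdline = ""  # not on Linux, or /proc is unavailable
    isolated = parse_isolcpus(cmdline)
    if target_cpu in isolated:
        print("CPU {} is isolated".format(target_cpu))
    else:
        print("CPU {} is NOT isolated; isolated set: {}".format(target_cpu, sorted(isolated)))
```

If the target core is not in the isolated set, the scheduler may still place other tasks on it, which tends to show up as a larger Max value.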

Hope this helps with the numbers, though if you are looking for < 50 us latency then RedHawk RTOS can help you achieve that.

Thank you so much for your help.

Thank you :-) Appreciate it!