Kernel crash running IPSec tunnel via Strongswan on Jetson AGX

Hi NV,

We are running Strongswan 5.6.2 on Jetpack 4.6. After creating an IPsec tunnel, the kernel crashes as soon as we run an iperf3 stress test through it. We need your help.
We then tested Jetpack 4.6.2 and 5.0.1 on the NV AGX Developer Kit and still see the same issue.
We found a report of the same issue, with no answer, at this URL:

Kernel logs:
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.154964] BUG: scheduling while atomic: swapper/5/0/0x00000103
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155111] Modules linked in: veth authenc ipt_REJECT nf_reject_ipv4 xt_statistic xt_nat xt_tcpudp iptable_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables iptable_mangle xt_comment xt_mark ip_vs_rr ip_vs ip_set_hash_netportnet ip_set_hash_netnet ip_set_hash_net ip_set_hash_netiface ip_set_hash_mac ip_set_hash_ipport ip_set_hash_ipportip ip_set_hash_ipmark ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set_hash_netport ip_set_list_set ip_set_hash_ip ip_set xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat br_netfilter xfrm4_mode_tunnel scsi_transport_iscsi ipcomp deflate xt_CT nf_conntrack xfrm_ipcomp pcrypt xfrm6_mode_tunnel xfrm6_tunnel
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155221]  tunnel6 xfrm4_tunnel tunnel4 esp6 esp4 ah6 ah4 overlay userspace_alert bluedroid_pm nvgpu ip_tables x_tables
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155256] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.9.253-tegra #42
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155258] Hardware name: Jetson-AGX (DT)
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155261] Call trace:
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155273] [<ffffff800808ba40>] dump_backtrace+0x0/0x198
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155278] [<ffffff800808c004>] show_stack+0x24/0x30
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155285] [<ffffff8008f78d24>] dump_stack+0xa0/0xc4
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155291] [<ffffff80080e5744>] __schedule_bug+0x64/0x80
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155295] [<ffffff8008f7ffc4>] __schedule+0x67c/0x7d0
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155300] [<ffffff8008f80158>] schedule+0x40/0xa8
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155304] [<ffffff8008f80608>] schedule_preempt_disabled+0x28/0x40
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155309] [<ffffff80081144e0>] mutex_optimistic_spin+0x1a8/0x1f8
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155316] [<ffffff8008f81d00>] __mutex_lock_slowpath+0x38/0x190
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155321] [<ffffff8008f81ea4>] mutex_lock+0x4c/0x60
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155327] [<ffffff8008c02294>] tegra_se_aes_queue_req+0x2c/0x98
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155331] [<ffffff8008c02528>] tegra_se_aes_cbc_encrypt+0x38/0x48
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155338] [<ffffff800840f594>] skcipher_encrypt_ablkcipher+0x64/0x70
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155345] [<ffffff80012582dc>] crypto_authenc_extractkeys+0x2dc/0xb30 [authenc]
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155349] [<ffffff80084105b4>] echainiv_encrypt+0x14c/0x190
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155355] [<ffffff8001140c18>] esp_output+0x348/0x498 [esp4]
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155361] [<ffffff8008e801a0>] xfrm_output_resume+0x3a0/0x410
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155365] [<ffffff8008e80288>] xfrm_output+0x48/0x110
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155371] [<ffffff8008e73290>] xfrm4_output_finish+0x38/0x48
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155374] [<ffffff8008e730dc>] __xfrm4_output+0x4c/0x90
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155378] [<ffffff8008e732e0>] xfrm4_output+0x40/0xd0
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155384] [<ffffff8008e1ba48>] ip_local_out+0x58/0x68
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155389] [<ffffff8008e1bd34>] ip_queue_xmit+0x124/0x398
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155393] [<ffffff8008e37044>] __tcp_transmit_skb+0x59c/0x980
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155398] [<ffffff8008e38374>] tcp_write_xmit+0x304/0xf18
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155402] [<ffffff8008e38fe4>] __tcp_push_pending_frames+0x5c/0x100
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155406] [<ffffff8008e3371c>] tcp_rcv_established+0x57c/0x7c0
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155410] [<ffffff8008e3d208>] tcp_v4_do_rcv+0x108/0x248
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155413] [<ffffff8008e3fea4>] tcp_v4_rcv+0xaac/0xc00
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155417] [<ffffff8008e16378>] ip_local_deliver_finish+0x80/0x280
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155421] [<ffffff8008e16aac>] ip_local_deliver+0x54/0xf0
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155425] [<ffffff8008e16770>] ip_rcv_finish+0x1f8/0x380
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155429] [<ffffff8008e16dc8>] ip_rcv+0x280/0x388
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155436] [<ffffff8008dc49d8>] __netif_receive_skb_core+0x3b8/0xad8
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155440] [<ffffff8008dc7ee8>] __netif_receive_skb+0x28/0x78
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155444] [<ffffff8008dca4dc>] process_backlog+0x94/0x140
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155447] [<ffffff8008dca1e4>] net_rx_action+0xf4/0x358
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155451] [<ffffff8008081054>] __do_softirq+0x13c/0x3b0
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155457] [<ffffff80080ba090>] irq_exit+0xd0/0x118
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155462] [<ffffff8008120fe4>] __handle_domain_irq+0x6c/0xc0
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155465] [<ffffff8008080d44>] gic_handle_irq+0x5c/0xb0
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155469] [<ffffff8008082c28>] el1_irq+0xe8/0x194
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155474] [<ffffff8008bb6e48>] cpuidle_enter_state+0xb8/0x380
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155478] [<ffffff8008bb7184>] cpuidle_enter+0x34/0x48
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155482] [<ffffff800811139c>] call_cpuidle+0x44/0x70
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155485] [<ffffff8008111718>] cpu_startup_entry+0x1b0/0x200
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155490] [<ffffff8008091cf8>] secondary_start_kernel+0x190/0x1f8
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155493] [<0000000080f861a8>] 0x80f861a8
May 15 01:17:36 host-4d3ae6 kernel: [ 4472.155787] softirq: huh, entered softirq 3 NET_RX ffffff8008dca0f0 with preempt_count 00000101, exited with ffffffff?
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Booting Linux on physical CPU 0x0
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Linux version 4.9.253-tegra (ydc@ydc-pc) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05) ) #42 SMP PREEMPT Fri Jan 7 10:53:18 CST 2022
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Boot CPU: AArch64 Processor [4e0f0040]
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] OF: fdt:memory scan node memory, reg size 48,
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] OF: fdt: - 80000000 ,  2c000000
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] OF: fdt: - ac200000 ,  44800000
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] OF: fdt: - 100000000 ,  780000000
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] earlycon: tegra_comb_uart0 at MMIO32 0x000000000c168000 (options '')
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] bootconsole [tegra_comb_uart0] enabled
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] OF: reserved mem: initialized node ramoops_carveout, compatible id nvidia,ramoops
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] OF: reserved mem: initialized node vpr-carveout, compatible id nvidia,vpr-carveout
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] OF: reserved mem: initialized node grid-of-semaphores, compatible id nvidia,gosmem
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] cma: Reserved 64 MiB at 0x00000000c2000000
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] On node 0 totalpages: 8324608
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]   DMA zone: 7192 pages used for memmap
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]   DMA zone: 0 pages reserved
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]   DMA zone: 460288 pages, LIFO batch:31
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]   Normal zone: 122880 pages used for memmap
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]   Normal zone: 7864320 pages, LIFO batch:31
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] psci: probing for conduit method from DT.
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] psci: PSCIv1.0 detected in firmware.
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] psci: Using standard PSCI v0.2 function IDs
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] psci: MIGRATE_INFO_TYPE not supported.
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] psci: SMC Calling Convention v1.1
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] percpu: Embedded 24 pages/cpu s57624 r8192 d32488 u98304
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] pcpu-alloc: s57624 r8192 d32488 u98304 alloc=24*4096
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Speculative Store Bypass Disable mitigation not required
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 8194536
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Kernel command line: console=ttyTCU0,115200 video=tegrafb earlycon=tegra_comb_uart,mmio32,0x0c168000 gpt rootfs.slot_suffix= usbcore.old_scheme_first=1 tegraid=19.1.2.0.0 maxcpus=8 boot.slot_suffix= boot.ratchetvalues=0.4.2 vpr_resize sdhci_tegra.en_boot_part_access=1    quiet root=/dev/nvme0n1p1 rw rootwait rootfstype=ext4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 rootfstype=ext4 apparmor=0
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] log_buf_len individual max cpu contribution: 32768 bytes
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] log_buf_len total cpu_extra contributions: 229376 bytes
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] log_buf_len min size: 32768 bytes
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] log_buf_len: 262144 bytes
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] early log buf free: 30108(91%)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Dentry cache hash table entries: 4194304 (order: 13, 33554432 bytes)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Inode-cache hash table entries: 2097152 (order: 12, 16777216 bytes)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Memory: 31925228K/33298432K available (15422K kernel code, 2966K rwdata, 6784K rodata, 8704K init, 610K bss, 619540K reserved, 753664K cma-reserved)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Virtual kernel memory layout:
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]     modules : 0xffffff8000000000 - 0xffffff8008000000   (   128 MB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]     vmalloc : 0xffffff8008000000 - 0xffffffbebfff0000   (   250 GB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]       .text : 0xffffff8008080000 - 0xffffff8008f90000   ( 15424 KB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]     .rodata : 0xffffff8008f90000 - 0xffffff8009640000   (  6848 KB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]       .init : 0xffffff8009640000 - 0xffffff8009ec0000   (  8704 KB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]       .data : 0xffffff8009ec0000 - 0xffffff800a1a5808   (  2967 KB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]        .bss : 0xffffff800a1a5808 - 0xffffff800a23e2f4   (   611 KB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]     fixed   : 0xffffffbefe7fd000 - 0xffffffbefec00000   (  4108 KB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]     PCI I/O : 0xffffffbefee00000 - 0xffffffbeffe00000   (    16 MB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]     vmemmap : 0xffffffbf00000000 - 0xffffffc000000000   (     4 GB maximum)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]               0xffffffbf00000000 - 0xffffffbf20000000   (   512 MB actual)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000]     memory  : 0xffffffc000000000 - 0xffffffc800000000   ( 32768 MB)
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] Preemptible hierarchical RCU implementation.
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] 	Build-time adjustment of leaf fanout to 64.
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] 	RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8.
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=8
May 15 01:22:01 host-4d3ae6 kernel: [    0.000000] NR_IRQS:64 nr_irqs:64 0
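
One reading of the trace above: `mutex_lock`, which may sleep, is reached from `tegra_se_aes_queue_req` while CPU 5 is still inside `__do_softirq`. That would match the "scheduling while atomic" message. The sketch below only checks a saved log for that pattern. The function names `trace_symbols` and `sleeps_in_softirq` and the file name `kern.log` are assumptions made for illustration, not part of any NVIDIA or Strongswan tooling.

```shell
#!/bin/sh
# Print the symbol name of every "[<addr>] symbol+0xOFF/0xLEN" frame read from stdin.
# Frames without a symbol (a bare address) are skipped.
trace_symbols() {
    sed -n 's/.*\[<[0-9a-f]*>\] \([A-Za-z0-9_.]*\)+0x.*/\1/p'
}

# Succeed if the frames on stdin contain both mutex_lock and __do_softirq,
# i.e. a sleeping lock shows up in a log that also passes through softirq handling.
sleeps_in_softirq() {
    syms=$(trace_symbols)
    printf '%s\n' "$syms" | grep -qx 'mutex_lock' &&
        printf '%s\n' "$syms" | grep -qx '__do_softirq'
}
```

Possible usage, assuming the log was saved as `kern.log`: `sleeps_in_softirq < kern.log && echo "mutex_lock seen together with __do_softirq"`. The check is deliberately coarse: it does not verify that both frames belong to the same call trace.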

Test env and reproduce steps:
Prerequisites: a PC running Ubuntu 18.04 or 20.04, plus an NV AGX Developer Kit

I can’t find a way to upload my scripts and config files. How can I send them to NV?

  1. install strongswan 5.6.2
    Source code URL: https://download.strongswan.org/strongswan-5.6.2.tar.bz2
    1.1 install dependency
    apt install libgmp-dev
    1.2 compile strongswan
    ./configure
    make
    sudo make install

  2. run strongswan
    systemctl restart strongswan
    or
    /usr/local/sbin/ipsec start --nofork

  3. copy the config file to the swanctl dir
    PC:
    cp swanctl_pc.conf /usr/local/etc/swanctl/conf.d/swanctl.conf

    AGX:
    cp swanctl_box.conf /usr/local/etc/swanctl/conf.d/swanctl.conf

  4. create a virtual network on the AGX and the PC
    PC (create IP: 2.2.4.2):
    ./snet.sh 4
    or clear
    ./snet.sh 4 clear

    AGX (create IP: 2.2.3.2):
    ./snet.sh 3
    or clear
    ./snet.sh 3 clear

  5. run swanctl
    PC:
    swanctl --load-all
    ipsec up host-host

    AGX:
    swanctl --load-all
    ipsec up host-host
    
  6. ping AGX from PC
    ip netns exec ns_edge0 ping 2.2.3.1
    or ping directly
    ping 2.2.3.1

  7. run iperf3
    AGX:
    iperf3 -s

    PC:
    iperf3 -c 2.2.3.1 -t 0 -i 1
    
  8. Then the AGX will crash immediately
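
Since the trace goes through `tegra_se_aes_cbc_encrypt`, it may help to record which `cbc(aes)` implementations are registered on the AGX and at what priority, because the kernel crypto API normally selects the highest-priority one. A minimal sketch that parses the standard `/proc/crypto` layout follows. The helper name `crypto_drivers` is an assumption made for illustration.

```shell
#!/bin/sh
# List "priority driver" pairs for one algorithm, highest priority first.
# Reads /proc/crypto-formatted text on stdin.
# $1: algorithm name exactly as shown in the "name" field, e.g. cbc(aes)
crypto_drivers() {
    awk -v alg="$1" '
        $1 == "name"     { name = $3 }      # lines look like "name : cbc(aes)"
        $1 == "driver"   { driver = $3 }    # the driver line follows the name line
        $1 == "priority" { if (name == alg) print $3, driver }
    ' | sort -rn
}
```

Possible usage on the AGX: `crypto_drivers 'cbc(aes)' < /proc/crypto`. The first line of output is the implementation the crypto API would normally pick for new `cbc(aes)` requests.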

Hi,
We don’t have experience with this tool. We will need other users to check and suggest next steps.
