kyoon
August 1, 2023, 10:37am
A kernel crash occurred after installing the mlnx_ofed 23.04-1.1.3.0 driver.
The system log is as follows.
Aug 1 01:51:16 Qacloudhost06 kernel: [ 1105.061289] mlx5_core 0000:5e:00.1 enp94s0f1np1: Dropping C-tag vlan stripping offload due to S-tag vlan
Aug 1 01:51:16 Qacloudhost06 kernel: [ 1105.062171] mlx5_core 0000:5e:00.1 enp94s0f1np1: Disabling HW_VLAN CTAG FILTERING, not supported in switchdev mode
Hi kyoon
Regarding the log “Dropping C-tag vlan stripping offload due to S-tag vlan”:
C-tag VLAN stripping is enabled by default. The driver displays this log message when a VM or another application disables it, in which case the end application may not be able to read the packets. I think ethtool can change this setting.
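Since ethtool is suggested here, the following is a minimal Python sketch, not an NVIDIA tool, for checking and changing the VLAN offload state. It parses the text printed by `ethtool -k <iface>`, where C-tag RX stripping appears as `rx-vlan-offload` and C-tag hardware filtering as `rx-vlan-filter`, and it builds the `ethtool -K <iface> rxvlan on|off` command. The helper names `parse_ethtool_features` and `vlan_strip_cmd` are hypothetical. Whether the driver honors the request while an S-tag VLAN is configured is an assumption not confirmed in this thread; the log message itself suggests the driver drops the offload in that case.

```python
from typing import Dict, List, NamedTuple


class Feature(NamedTuple):
    enabled: bool
    fixed: bool  # "[fixed]" means the driver does not allow changing it


def parse_ethtool_features(text: str) -> Dict[str, Feature]:
    """Parse the output of `ethtool -k <iface>` into {feature name: Feature}."""
    features: Dict[str, Feature] = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        rest = rest.strip()
        if not sep or not rest:
            continue  # blank line or header such as "Features for enp94s0f1np1:"
        state = rest.split()[0]
        if state not in ("on", "off"):
            continue
        features[name] = Feature(enabled=(state == "on"), fixed="[fixed]" in rest)
    return features


def vlan_strip_cmd(iface: str, enable: bool) -> List[str]:
    """Build (but do not run) `ethtool -K <iface> rxvlan on|off`."""
    return ["ethtool", "-K", iface, "rxvlan", "on" if enable else "off"]
```

On a host, the input text could come from `subprocess.run(["ethtool", "-k", "enp94s0f1np1"], capture_output=True, text=True).stdout`, and the command list can be passed to `subprocess.run(..., check=True)`. A feature reported as `[fixed]` cannot be toggled from user space.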
Regarding the meaning of “Disabling HW_VLAN CTAG FILTERING, not supported in switchdev mode”, could you please open a case?
Please open a ticket for the kernel crash issue as well.
/HyungKwang
kyoon
August 2, 2023, 1:58am
Hi HyungKwang,
How do I open a case?
This configuration worked fine when using version 5.8-2.0.3.
The eswitch configuration is registered as a systemd service and is applied at boot time.
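The actual systemd service is not shown in this thread, so the following is only a minimal Python sketch of what such a boot step might do. It assumes the PF at PCI address 0000:5e:00.1 (taken from the log) is switched to switchdev mode with the iproute2 `devlink` tool, which matches the `E-Switch: Disable: mode(LEGACY)` followed by `E-Switch: Enable: mode(OFFLOADS)` lines in the log below. All helper names are hypothetical.

```python
import subprocess
from typing import List, Optional

VALID_MODES = ("legacy", "switchdev")


def eswitch_show_cmd(pci_addr: str) -> List[str]:
    """Command that prints the current eswitch mode of a PF."""
    return ["devlink", "dev", "eswitch", "show", f"pci/{pci_addr}"]


def eswitch_set_cmd(pci_addr: str, mode: str) -> List[str]:
    """Command that switches a PF's eswitch to `mode`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported eswitch mode: {mode}")
    return ["devlink", "dev", "eswitch", "set", f"pci/{pci_addr}", "mode", mode]


def parse_eswitch_mode(show_output: str) -> Optional[str]:
    """Return the word after the standalone 'mode' token, or None."""
    tokens = show_output.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok == "mode":
            return tokens[i + 1]
    return None


def ensure_switchdev(pci_addr: str) -> None:
    """Hypothetical boot step: switch to switchdev only if not already in it."""
    shown = subprocess.run(
        eswitch_show_cmd(pci_addr), capture_output=True, text=True, check=True
    )
    if parse_eswitch_mode(shown.stdout) != "switchdev":
        subprocess.run(eswitch_set_cmd(pci_addr, "switchdev"), check=True)
```

Checking the current mode first keeps the step idempotent, so re-running the service does not reset an eswitch that is already in switchdev mode.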
Below are additional logs showing the conflict between the driver and the kernel.
The kernel version is “Linux 5.15.0-60-generic”.
Aug 1 01:35:33 Qacloudhost06 kernel: [ 161.780927] Bridge firewalling registered
Aug 1 01:35:33 Qacloudhost06 kernel: [ 161.881923] Initializing XFRM netlink socket
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413237] BUG: unable to handle page fault for address: 0000000000080948
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413287] #PF: supervisor read access in kernel mode
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413315] #PF: error_code(0x0000) - not-present page
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413341] PGD 1d66c7067 P4D 1d66c7067 PUD 1d66c6067 PMD 0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413374] Oops: 0000 [#1] SMP NOPTI
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413395] CPU: 1 PID: 6593 Comm: node-exporter Tainted: G OE 5.15.0-60-generic #66-Ubuntu
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413442] Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.13.3 12/13/2021
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413480] RIP: 0010:mlx5e_hairpin_oob_cnt_get+0x1f/0x70 [mlx5_core]
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413621] Code: 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 10 09 00 00 48 89 f3 48 8b 80 e0 00 00 00 <4c> 8b a0 48 09 08 00 4d 85 e4 74 28 4d 8d 6c 24 30 4c 89 ef e8 e8
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413710] RSP: 0018:ffffaeb9ce8bfc40 EFLAGS: 00010246
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413737] RAX: 0000000000000000 RBX: ffffaeb9ce8bfc68 RCX: 0000000000000000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413772] RDX: ffffa0659248c000 RSI: ffffaeb9ce8bfc68 RDI: ffffa064e8080980
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413806] RBP: ffffaeb9ce8bfc58 R08: ffffa064e8080560 R09: ffffa07448837000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413843] R10: 000000000000000b R11: 0000000000000000 R12: ffffa064e8080a00
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413878] R13: ffffa07448837000 R14: ffffa064e8080980 R15: ffffa074487ff0f0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413913] FS: 000000c000544090(0000) GS:ffffa083ff600000(0000) knlGS:0000000000000000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.413984] CR2: 0000000000080948 CR3: 00000010b5a8c005 CR4: 00000000007706e0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414056] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414090] PKRU: 55555554
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414106] Call Trace:
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414122] <TASK>
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414138] hp_oob_cnt_show+0x48/0x90 [mlx5_core]
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414260] dev_attr_show+0x1a/0x50
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414283] sysfs_kf_seq_show+0xa2/0x100
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414308] kernfs_seq_show+0x24/0x30
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414331] seq_read_iter+0x121/0x4b0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414356] kernfs_fop_read_iter+0x30/0x40
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414380] new_sync_read+0x10a/0x190
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414404] vfs_read+0x103/0x1a0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414423] ksys_read+0x67/0xf0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414442] __x64_sys_read+0x19/0x20
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414462] do_syscall_64+0x59/0xc0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414484] ? exit_to_user_mode_prepare+0x37/0xb0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414504] ? syscall_exit_to_user_mode+0x27/0x50
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414522] ? do_syscall_64+0x69/0xc0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414535] ? syscall_exit_to_user_mode+0x27/0x50
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414551] ? do_syscall_64+0x69/0xc0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414564] ? do_syscall_64+0x69/0xc0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.414581] ? do_syscall_64+0x69/0xc0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.415536] ? do_syscall_64+0x69/0xc0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.416414] ? do_syscall_64+0x69/0xc0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.417235] entry_SYSCALL_64_after_hwframe+0x61/0xcb
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.417988] RIP: 0033:0x4bb57b
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.418535] Code: fb ff eb bd e8 66 21 fb ff e9 61 ff ff ff cc e8 db ef fa ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.419656] RSP: 002b:000000c000651400 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.420218] RAX: ffffffffffffffda RBX: 000000c000048000 RCX: 00000000004bb57b
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.420777] RDX: 0000000000000080 RSI: 000000c0006514a0 RDI: 0000000000000008
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.421329] RBP: 000000c000651450 R08: 0000000000b64a01 R09: fffffffffffffff8
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.421882] R10: 00007fcf6e293888 R11: 0000000000000202 R12: 00000000000000f2
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.422529] R13: 0000000000000000 R14: 0000000000cf88e6 R15: 0000000000000000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.423197] </TASK>
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.423881] Modules linked in: nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nft_counter nf_tables bridge cuse overlay act_mirred act_skbedit geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp stp llc bonding intel_rapl_msr intel_rapl_common isst_if_common skx_edac ipmi_ssif nfit x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate dell_smbios dcdbas dell_wmi_descriptor wmi_bmof intel_pch_thermal mei_me mei acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) binfmt_misc kvm_intel kvm sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua knem(OE) ramoops pstore_blk reed_solomon
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.423935] efi_pstore msr pstore_zone ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mgag200 crct10dif_pclmul mlx5_core(OE) crc32_pclmul i2c_algo_bit ghash_clmulni_intel drm_kms_helper aesni_intel mlxdevm(OE) mlxfw(OE) psample crypto_simd syscopyarea tls cryptd sysfillrect sysimgblt mlx_compat(OE) fb_sys_fops pci_hyperv_intf megaraid_sas cec rc_core i2c_i801 bnxt_en drm ahci tg3 i2c_smbus xhci_pci lpc_ich libahci xhci_pci_renesas wmi
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.434692] CR2: 0000000000080948
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.435642] ---[ end trace f1d8d819a6142ee7 ]---
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.483495] RIP: 0010:mlx5e_hairpin_oob_cnt_get+0x1f/0x70 [mlx5_core]
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.484573] Code: 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 10 09 00 00 48 89 f3 48 8b 80 e0 00 00 00 <4c> 8b a0 48 09 08 00 4d 85 e4 74 28 4d 8d 6c 24 30 4c 89 ef e8 e8
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.486610] RSP: 0018:ffffaeb9ce8bfc40 EFLAGS: 00010246
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.487645] RAX: 0000000000000000 RBX: ffffaeb9ce8bfc68 RCX: 0000000000000000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.488687] RDX: ffffa0659248c000 RSI: ffffaeb9ce8bfc68 RDI: ffffa064e8080980
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.489740] RBP: ffffaeb9ce8bfc58 R08: ffffa064e8080560 R09: ffffa07448837000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.490791] R10: 000000000000000b R11: 0000000000000000 R12: ffffa064e8080a00
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.491845] R13: ffffa07448837000 R14: ffffa064e8080980 R15: ffffa074487ff0f0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.492905] FS: 000000c000544090(0000) GS:ffffa083ff600000(0000) knlGS:0000000000000000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.493995] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.495088] CR2: 0000000000080948 CR3: 00000010b5a8c005 CR4: 00000000007706e0
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.496199] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.497308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 1 01:35:37 Qacloudhost06 kernel: [ 165.498409] PKRU: 55555554
Aug 1 01:35:52 Qacloudhost06 kernel: [ 181.099872] capability: warning: `privsep-helper' uses deprecated v2 capabilities in a way that may be insecure
Aug 1 01:51:13 Qacloudhost06 kernel: [ 1101.617896] mlx5_core 0000:5e:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), active vports(0)
Aug 1 01:51:15 Qacloudhost06 kernel: [ 1103.940109] mlx5_core 0000:5e:00.1: E-Switch: Supported tc chains and prios offload
Aug 1 01:51:15 Qacloudhost06 kernel: [ 1103.941926] mlx5_core 0000:5e:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
Aug 1 01:51:16 Qacloudhost06 kernel: [ 1104.826419] mlx5_core 0000:5e:00.1: MLX5E: StrdRq(1) RqSz(64) StrdSz(256) RxCqeCmprss(0)
Aug 1 01:51:16 Qacloudhost06 kernel: [ 1105.059092] mlx5_core 0000:5e:00.1 enp94s0f1np1: Link up
Aug 1 01:51:16 Qacloudhost06 kernel: [ 1105.061289] mlx5_core 0000:5e:00.1 enp94s0f1np1: Dropping C-tag vlan stripping offload due to S-tag vlan
Aug 1 01:51:16 Qacloudhost06 kernel: [ 1105.062171] mlx5_core 0000:5e:00.1 enp94s0f1np1: Disabling HW_VLAN CTAG FILTERING, not supported in switchdev mode
Aug 1 01:51:16 Qacloudhost06 kernel: [ 1105.119095] mlx5_core 0000:5e:00.1: E-Switch: Enable: mode(OFFLOADS), nvfs(0), active vports(1)
thanks
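When filing the ticket, it may help to summarize the oops. The following is a minimal Python sketch, not part of any NVIDIA or kernel tool, that extracts the faulting function (kernel-mode RIP), the task (`Comm:`), the fault address (`CR2:`), and the reliable call-trace frames from syslog text like the log above. Frames prefixed with `?` are skipped because the x86 stack dumper marks them as unreliable. The function and regex names are illustrative.

```python
import re
from typing import Dict, List

# A call-trace frame such as "] hp_oob_cnt_show+0x48/0x90 [mlx5_core]".
# The optional "? " group marks frames the unwinder considers unreliable.
FRAME_RE = re.compile(r"\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(log: str) -> Dict[str, object]:
    """Extract key fields from the first oops found in `log`."""
    rip = re.search(r"RIP: 0010:([\w.]+)\+", log)  # 0010 = kernel code segment
    comm = re.search(r"Comm: (\S+)", log)
    cr2 = re.search(r"CR2: ([0-9a-f]+)", log)
    frames: List[str] = []
    in_trace = False
    for line in log.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace and "</TASK>" in line:
            break  # only the first trace is summarized
        if in_trace:
            m = FRAME_RE.search(line)
            if m and not m.group(1):
                frames.append(m.group(2))
    return {
        "faulting_function": rip.group(1) if rip else None,
        "task": comm.group(1) if comm else None,
        "fault_address": int(cr2.group(1), 16) if cr2 else None,
        "frames": frames,
    }
```

For this log, the trace leads from a `read` system call through `sysfs_kf_seq_show` and `dev_attr_show` into `hp_oob_cnt_show`, so the crash appears to be triggered when node-exporter reads the sysfs attribute served by that handler. The bytes marked in the `Code:` line, `4c 8b a0 48 09 08 00`, decode to `mov r12, [rax+0x80948]`; with RAX = 0 in the register dump, the fault address is 0 + 0x80948 = 0x80948, which matches CR2 and points to a NULL pointer dereference inside `mlx5e_hairpin_oob_cnt_get`.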
kyoon:
How do I open a case?
To open a case, please contact your NVIDIA regional sales representative or NVIDIA partner and obtain a valid license for technical support for your purchased product. Then ask the partner or sales representative how to open a technical case.
system
Closed
August 16, 2023, 2:15am
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.