I have a problem with 25G NICs (driver: mlx5_core). We occasionally hit a TX timeout, and when it happens we can't access the NFS server. Can anyone help? The kernel log from the latest occurrence is below:
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.589754] ------------[ cut here ]------------
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.594978] NETDEV WATCHDOG: enp4s0f0np0 (mlx5_core): transmit queue 5 timed out
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.602987] WARNING: CPU: 102 PID: 0 at net/sched/sch_generic.c:466 dev_watchdog+0x2d0/0x2d8
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.611991] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag fuse nfsv3 nfs_acl nfs lockd grace fscache xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter tun bridge stp llc bonding rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) binfmt_misc rfkill sunrpc vfat fat aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif_ce ghash_ce sha2_ce ses enclosure sha256_arm64 sha1_ce ipmi_si mlx5_ib(OE) ipmi_devintf ib_uverbs(OE) ipmi_msghandler ib_core(OE) sch_fq_codel ip_tables xdmavg(OE) mlx5_core(OE) mlxfw(OE) tls psample mlxdevm(OE) auxiliary(OE) 8250_pci megaraid_sas
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.683633] mlx_compat(OE) ast devlink dm_mirror dm_region_hash dm_log
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.690831] CPU: 102 PID: 0 Comm: swapper/102 Kdump: loaded Tainted: G OE 4.19.90-25.17.v2101.hy2.01.ky10.aarch64 #1
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.703119] Hardware name: Lenovo Lenovo/BY680, BIOS KL4.26.LV.S.010.221207.D.Test 12/07/22 16:15:12
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.712813] pstate: 80000005 (Nzcv daif -PAN -UAO)
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.718190] pc : dev_watchdog+0x2d0/0x2d8
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.722787] lr : dev_watchdog+0x2d0/0x2d8
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.727384] sp : ffff933b7ff40070
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.731289] x29: ffff933b7ff40070 x28: 0000000000000002
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.737185] x27: ffff28acc78a0018 x26: 00000000ffffffff
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.743080] x25: 00000000000001c0 x24: ffff90f786ed0480
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.748973] x23: ffff90f786ed045c x22: ffff90f78f153100
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.754866] x21: ffff28acc7ca1000 x20: ffff90f786ed0000
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.760762] x19: 0000000000000005 x18: 0000000000000001
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.766654] x17: 0000000000000000 x16: 0000000000000000
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.772548] x15: ffffffffffffffff x14: ffff28ad47efbf27
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.778441] x13: ffff28acc7efbf3c x12: ffff28acc7ccd000
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.784334] x11: 0000000005f5e0ff x10: ffff933b7ff3fd20
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.790229] x9 : 00000000ffffffd0 x8 : 7420352065756575
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.796123] x7 : 712074696d736e61 x6 : ffff933b7feb13a8
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.802015] x5 : ffff933b7feb13a8 x4 : 0000004000000000
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.807909] x3 : 0000000000000080 x2 : 0000000000000103
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.813802] x1 : 0000000000000102 x0 : 0000000000000044
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.819695] Call trace:
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.822740] dev_watchdog+0x2d0/0x2d8
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.826993] call_timer_fn+0x34/0x1e8
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.831244] expire_timers+0xc4/0x160
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.835494] run_timer_softirq+0xd4/0x190
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.840094] __do_softirq+0x138/0x3c4
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.844347] irq_exit+0x11c/0x138
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.848254] __handle_domain_irq+0x90/0x100
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.853024] gic_handle_irq+0x80/0x18c
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.857364] el1_irq+0xb8/0x14c
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.861098] arch_cpu_idle+0x30/0x240
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.865353] default_idle_call+0x24/0x50
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.869866] do_idle+0x1a4/0x268
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.873685] cpu_startup_entry+0x2c/0x58
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.878198] secondary_start_kernel+0x184/0x1d0
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.883313] ---[ end trace 9e6d5b847e942721 ]---
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.888534] mlx5_core 0000:04:00.0 enp4s0f0np0: TX timeout detected
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.895419] mlx5_core 0000:04:00.0 enp4s0f0np0: TX timeout on queue: 5, SQ: 0x1222, CQ: 0x420, SQ Cons: 0xe58f SQ Prod: 0xe59f, usecs since last trans: 15630000
Nov 7 06:31:02 TD06-L-R04-13U-SVR kernel: [1122824.910345] mlx5_core 0000:04:00.0 enp4s0f0np0: EQ 0xc: Cons = 0x4271e14, irqn = 0xe7
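
To make the "TX timeout on queue" line above easier to read, here is a small Python sketch that pulls the fields out of it. The helper name `parse_tx_timeout` is my own, and two things are assumptions on my part rather than confirmed driver behavior: that the SQ consumer/producer counters are 16-bit and wrap around, and that their difference is the number of send-queue entries posted but not yet completed.

```python
import re

# Matches the mlx5_core "TX timeout on queue" line exactly as it appears in the log above.
TX_TIMEOUT_RE = re.compile(
    r"TX timeout on queue: (?P<queue>\d+), "
    r"SQ: (?P<sq>0x[0-9a-fA-F]+), "
    r"CQ: (?P<cq>0x[0-9a-fA-F]+), "
    r"SQ Cons: (?P<cons>0x[0-9a-fA-F]+) "
    r"SQ Prod: (?P<prod>0x[0-9a-fA-F]+), "
    r"usecs since last trans: (?P<usecs>\d+)"
)


def parse_tx_timeout(line):
    """Return the fields of a TX timeout line as a dict, or None if it does not match."""
    m = TX_TIMEOUT_RE.search(line)
    if m is None:
        return None
    cons = int(m.group("cons"), 16)
    prod = int(m.group("prod"), 16)
    return {
        "queue": int(m.group("queue")),
        "sq": int(m.group("sq"), 16),
        "cq": int(m.group("cq"), 16),
        # Assumption: 16-bit wrapping counters, so mask the difference.
        "outstanding": (prod - cons) & 0xFFFF,
        # Microseconds since the last transmit on this queue, converted to seconds.
        "stalled_seconds": int(m.group("usecs")) / 1_000_000,
    }


LINE = (
    "mlx5_core 0000:04:00.0 enp4s0f0np0: TX timeout on queue: 5, SQ: 0x1222, "
    "CQ: 0x420, SQ Cons: 0xe58f SQ Prod: 0xe59f, usecs since last trans: 15630000"
)

info = parse_tx_timeout(LINE)
print(info)
```

If I read the fields correctly, this says queue 5 had 16 entries posted without a completion and had not transmitted for about 15.6 seconds when the watchdog fired.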