Hi, we are running CentOS 7.6 with kernel 3.10.0-957.10.1.el7.x86_64 and the Mellanox driver package MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-ext, installed with kernel support.
Sometimes ib0 stops working, and dmesg shows output like this:
[Mon Jun 24 20:07:01 2019] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.0c 12/30/2014
[Mon Jun 24 20:07:01 2019] Call Trace:
[Mon Jun 24 20:07:01 2019] [] dump_stack+0x19/0x1b
[Mon Jun 24 20:07:01 2019] [] __warn+0xd8/0x100
[Mon Jun 24 20:07:01 2019] [] warn_slowpath_fmt+0x5f/0x80
[Mon Jun 24 20:07:01 2019] [] dev_watchdog+0x248/0x260
[Mon Jun 24 20:07:01 2019] [] ? dev_deactivate_queue.constprop.26+0x60/0x60
[Mon Jun 24 20:07:01 2019] [] call_timer_fn+0x38/0x110
[Mon Jun 24 20:07:01 2019] [] ? dev_deactivate_queue.constprop.26+0x60/0x60
[Mon Jun 24 20:07:01 2019] [] run_timer_softirq+0x24d/0x300
[Mon Jun 24 20:07:01 2019] [] __do_softirq+0xf5/0x280
[Mon Jun 24 20:07:01 2019] [] call_softirq+0x1c/0x30
[Mon Jun 24 20:07:01 2019] [] do_softirq+0x65/0xa0
[Mon Jun 24 20:07:01 2019] [] irq_exit+0x105/0x110
[Mon Jun 24 20:07:01 2019] [] smp_apic_timer_interrupt+0x48/0x60
[Mon Jun 24 20:07:01 2019] [] apic_timer_interrupt+0x162/0x170
[Mon Jun 24 20:07:01 2019] [] ? hrtimer_start_range_ns+0x1ed/0x3c0
[Mon Jun 24 20:07:01 2019] [] ? cpuidle_enter_state+0x57/0xd0
[Mon Jun 24 20:07:01 2019] [] ? cpuidle_enter_state+0x4d/0xd0
[Mon Jun 24 20:07:01 2019] [] cpuidle_idle_call+0xde/0x230
[Mon Jun 24 20:07:01 2019] [] arch_cpu_idle+0xe/0xc0
[Mon Jun 24 20:07:01 2019] [] cpu_startup_entry+0x14a/0x1e0
[Mon Jun 24 20:07:01 2019] [] rest_init+0x77/0x80
[Mon Jun 24 20:07:01 2019] [] start_kernel+0x44b/0x46c
[Mon Jun 24 20:07:01 2019] [] ? repair_env_string+0x5c/0x5c
[Mon Jun 24 20:07:01 2019] [] ? early_idt_handler_array+0x120/0x120
[Mon Jun 24 20:07:01 2019] [] x86_64_start_reservations+0x24/0x26
[Mon Jun 24 20:07:01 2019] [] x86_64_start_kernel+0x154/0x177
[Mon Jun 24 20:07:01 2019] [] start_cpu+0x5/0x14
[Mon Jun 24 20:07:01 2019] ---[ end trace d2a01428c663f75b ]---
[Mon Jun 24 20:07:01 2019] ib0: transmit timeout: latency 26 msecs
[Mon Jun 24 20:07:01 2019] ib0: queue (5) stopped, tx_head 179121235, tx_tail 179121170
[Mon Jun 24 20:07:11 2019] ib0: transmit timeout: latency 7 msecs
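To make the timeout lines easier to compare across incidents, here is a minimal Python sketch that pulls the latency values and the queue backlog (tx_head minus tx_tail) out of dmesg text. The function name `summarize` and the regular expressions are only illustrative and are matched against the message format shown above. The 32-bit wraparound mask is an assumption about how the driver stores these counters, not something confirmed from the driver source.

```python
import re

# Patterns modelled on the two message formats in the log above.
TIMEOUT_RE = re.compile(r"(\w+): transmit timeout: latency (\d+) msecs")
QUEUE_RE = re.compile(
    r"(\w+): queue \((\d+)\) stopped, tx_head (\d+), tx_tail (\d+)"
)


def summarize(lines):
    """Return (timeouts, queues) extracted from dmesg lines.

    timeouts: list of (interface, latency_msecs)
    queues:   list of (interface, queue_index, backlog)
    """
    timeouts = []
    queues = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts.append((m.group(1), int(m.group(2))))
            continue
        m = QUEUE_RE.search(line)
        if m:
            head, tail = int(m.group(3)), int(m.group(4))
            # Assumption: counters are 32-bit unsigned, so mask the
            # difference to stay correct if tx_head has wrapped around.
            backlog = (head - tail) & 0xFFFFFFFF
            queues.append((m.group(1), int(m.group(2)), backlog))
    return timeouts, queues


sample = [
    "[Mon Jun 24 20:07:01 2019] ib0: transmit timeout: latency 26 msecs",
    "[Mon Jun 24 20:07:01 2019] ib0: queue (5) stopped, "
    "tx_head 179121235, tx_tail 179121170",
    "[Mon Jun 24 20:07:11 2019] ib0: transmit timeout: latency 7 msecs",
]

timeouts, queues = summarize(sample)
print(timeouts)
print(queues)
```

For the lines above this gives a backlog of 65 on queue 5, meaning 65 sends had been posted but not yet completed when the queue was stopped.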
If we take ib0 down and try to bring it back up, the whole server freezes and has to be rebooted. Any ideas what could be causing this?
At the moment we can't use the built-in CentOS drivers because they don't deliver the performance we need.
Best regards,
Volker