Kernel panic while booting Linux

The kernel panics while booting Linux if the Mellanox card is connected to the network. It boots fine if I disconnect the card.

(After a successful boot I can connect the card to the network. However, it sometimes (not always) causes the host to hang when I run ping over the network; I don't have many details to post about that yet.)

Here are the details of the system:

uname -a

Linux 4.2.0-35-generic #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


Querying Mellanox devices firmware ...

Device #1:

    Device Type:      ConnectX3Pro
    Part Number:      MCX312B-XCC_Ax
    Description:      ConnectX-3 Pro EN network interface card; 10GigE; dual-port SFP+; PCIe3.0 x8 8GT/s; RoHS R6
    PSID:             MT_1200111023
    PCI Device Name:  0000:02:00.0
    Port1 MAC:        e41d2db25040
    Port2 MAC:        e41d2db25041
    Versions:         Current        Available
       FW             2.36.5000      2.36.5000
       PXE            3.4.0718       3.4.0718
    Status:           Up to date

Stack dump from crash (dmesg file is attached):

    KERNEL: /usr/lib/debug/boot/vmlinux-4.2.0-35-generic
    DUMPFILE: …/201607301001/dump.201607301001 [PARTIAL DUMP]
    CPUS: 8
    DATE: Sat Jul 30 10:01:52 2016
    UPTIME: 00:00:14
    LOAD AVERAGE: 1.19, 0.25, 0.08
    TASKS: 584
    NODENAME:
    RELEASE: 4.2.0-35-generic
    VERSION: #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016
    MACHINE: x86_64 (3409 Mhz)
    MEMORY: 16 GB
    PANIC: "BUG: unable to handle kernel paging request at 0000001100000002"
    PID: 1625
    COMMAND: "docker"
    TASK: ffff8803e1f5a940 [THREAD_INFO: ffff8803de0e8000]
    CPU: 4
    STATE: TASK_RUNNING (PANIC)

    crash> bt
    PID: 1625 TASK: ffff8803e1f5a940 CPU: 4 COMMAND: "docker"
     #0 [ffff88041ed033f0] machine_kexec at ffffffff8105913b
     #1 [ffff88041ed03460] crash_kexec at ffffffff81109bf2
     #2 [ffff88041ed03530] oops_end at ffffffff81018ead
     #3 [ffff88041ed03560] no_context at ffffffff810682a5
     #4 [ffff88041ed035d0] __bad_area_nosemaphore at ffffffff81068570
     #5 [ffff88041ed03620] bad_area_nosemaphore at ffffffff810686f3
     #6 [ffff88041ed03630] __do_page_fault at ffffffff810689d7
     #7 [ffff88041ed03690] do_page_fault at ffffffff81068d42
     #8 [ffff88041ed036b0] page_fault at ffffffff817fabc8
        [exception RIP: __netdev_pick_tx+102]
        RIP: ffffffff816e64e6 RSP: ffff88041ed03768 RFLAGS: 00010202
        RAX: ffff88040c2d97f0 RBX: 0000000000000000 RCX: ffffffff816e6480
        RDX: 000000000000000c RSI: ffff8803d4359b00 RDI: ffff8803fb440000
        RBP: ffff88041ed037a8 R8: ffff88041ed19b00 R9: ffff8803d4359b00
        R10: 0000000000000000 R11: 0000000000000150 R12: ffff8803fb440000
        R13: 0000000000000000 R14: 00000000ffffffff R15: 0000001100000002
        ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
     #9 [ffff88041ed037b0] mlx4_en_select_queue at ffffffffc0187a7f [mlx4_en]
    #10 [ffff88041ed037d0] netdev_pick_tx at ffffffff816edac1
    #11 [ffff88041ed03800] __dev_queue_xmit at ffffffff816edc07
    #12 [ffff88041ed03860] dev_queue_xmit_sk at ffffffff816ee0e3
    #13 [ffff88041ed03870] netdev_send at ffffffffc04de305 [openvswitch]
    #14 [ffff88041ed038b0] ovs_vport_send at ffffffffc04ddc28 [openvswitch]
    #15 [ffff88041ed038d0] do_output at ffffffffc04d0289 [openvswitch]
    #16 [ffff88041ed038f0] do_execute_actions at ffffffffc04d0874 [openvswitch]
    #17 [ffff88041ed039a0] ovs_execute_actions at ffffffffc04d177f [openvswitch]
    #18 [ffff88041ed039d0] ovs_dp_process_packet at ffffffffc04d4f04 [openvswitch]
    #19 [ffff88041ed03a60] ovs_vport_receive at ffffffffc04dd38b [openvswitch]
    #20 [ffff88041ed03c10] netdev_frame_hook at ffffffffc04de5d0 [openvswitch]
    #21 [ffff88041ed03c40] __netif_receive_skb_core at ffffffff816eb2d4
    #22 [ffff88041ed03ce0] __netif_receive_skb at ffffffff816eb988
    #23 [ffff88041ed03d00] netif_receive_skb_internal at ffffffff816eba02
    #24 [ffff88041ed03d40] napi_gro_frags at ffffffff816ec4a7
    #25 [ffff88041ed03d70] mlx4_en_process_rx_cq at ffffffffc0189870 [mlx4_en]
    #26 [ffff88041ed03e10] mlx4_en_poll_rx_cq at ffffffffc0189db6 [mlx4_en]
    #27 [ffff88041ed03e60] net_rx_action at ffffffff816ebf09
    #28 [ffff88041ed03ef0] __do_softirq at ffffffff81081131
    #29 [ffff88041ed03f60] irq_exit at ffffffff81081433
    #30 [ffff88041ed03f70] do_IRQ at ffffffff817fb878
    --- <IRQ stack> ---
    #31 [ffff8803de0ebf58] ret_from_intr at ffffffff817f97eb
        RIP: 000000000088d618 RSP: 000000c82024d118 RFLAGS: 00000202
        RAX: 0000000073f84770 RBX: 0000000000000400 RCX: 0000000054423aca
        RDX: 0000000089ecd45f RSI: 000000c820542940 RDI: 000000c820544000
        RBP: 00000000e4458357 R8: 000000008847594a R9: 0000000039eb6dc2
        R10: 00000000d57b5eff R11: 00000000fa36c492 R12: 0000000000000004
        R13: 0000000000dd5c19 R14: 0000000000000002 R15: 0000000000000008
        ORIG_RAX: ffffffffffffff3d CS: 0033 SS: 002b
    crash>

Has anyone seen a similar issue?

Check whether the issue can be reproduced with the latest Mellanox OFED. It would also be interesting to check whether the issue still happens when most services are disabled on the host during boot.

Also, try disabling LRO and GRO.
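For reference, a minimal sketch of how LRO and GRO could be turned off with `ethtool`. The interface name `eth0` is only a placeholder (the actual name of the ConnectX-3 Pro port is not given in this thread), and the `run` helper with its `DRY_RUN` switch is a hypothetical convenience added here so the commands can be reviewed before they are applied.

```shell
#!/bin/sh
# Interface name is a placeholder (assumption); use the name of the mlx4_en port.
IFACE="${IFACE:-eth0}"

# Hypothetical helper: with DRY_RUN=1 (the default here) a command is only
# printed; with DRY_RUN=0 it is executed (ethtool -K needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn off large receive offload and generic receive offload.
run ethtool -K "$IFACE" lro off gro off

# List the offload settings afterwards to confirm the change.
run ethtool -k "$IFACE"
```

Running it with `DRY_RUN=0 IFACE=<your interface>` as root applies the change. Settings made with `ethtool -K` do not survive a reboot, so for a panic that happens during boot they would have to be applied before the card is connected to the network.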