Kernel Tainted when I Debugging Broadcom Driver

Hello, everyone:
I am debugging a wifi6 NIC driver, on the surface he works perfectly fine, but when I leave it for a long time or use iperf3 to test the bandwidth, he prints as below, but in the meantime, wlan is still working. This is causing me problems, is this an issue that needs to be addressed?

mr123@mr123-desktop:~$ ipearf3 -s

Server listening on 5201

Accepted connection from 10.6.16.16, port 53269
[ 5] local 10.6.60.254 port 5201 connected to 10.6.16.16 port 53270
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-1.00 sec 5.41 MBytes 45.4 Mbits/sec
[ 5] 1.00-2.00 sec 2.03 MBytes 17.1 Mbits/sec
[ 5] 2.00-3.00 sec 1.39 MBytes 11.7 Mbits/sec
[ 5] 3.00-4.00 sec 1.03 MBytes 8.68 Mbits/sec
[ 5] 4.00-5.00 sec 1019 KBytes 8.35 Mbits/sec
[ 5] 5.00-6.00 sec 992 KBytes 8.13 Mbits/sec
[ 5] 6.00-7.00 sec 4.54 MBytes 38.1 Mbits/sec
[ 5] 7.00-8.00 sec 5.13 MBytes 43.0 Mbits/sec
[ 769.642582] BUG: scheduling while atomic: irq/454-dhdpcie/5977/0x00000002
[ 769.649372] Modules linked in: bcmdhd
[ 769.653057] CPU: 3 PID: 5977 Comm: irq/454-dhdpcie Tainted: G W 4.9.299-rt191+ #493
[ 769.661823] Hardware name: lanai-3636 (DT)
[ 769.665910] Call trace:
[ 769.668357] [] dump_backtrace+0x0/0x198
[ 769.673744] [] show_stack+0x24/0x30
[ 769.678788] [] dump_stack+0xa0/0xc4
[ 769.683828] [] __schedule_bug+0x64/0x80
[ 769.689216] [] __schedule+0x5ac/0x5f0
[ 769.694429] [] schedule+0x48/0xe0
[ 769.699297] [] rt_spin_lock_slowlock+0x22c/0x308
[ 769.705463] [] rt_spin_lock+0x50/0x60
[ 769.710678] [] queue_work_on+0x5c/0xf0
[ 769.716215] [] dhd_lb_rx_napi_dispatch+0x13c/0x2a0 [bcmdhd]
[ 769.723581] [] dhd_prot_process_msgbuf_rxcpl+0x56c/0x848 [bcmdhd]
[ 769.731431] [] dhdpci_bus_read_frames+0x1d8/0x3b0 [bcmdhd]
[ 769.738688] [] dhd_bus_dpc+0x2b4/0x4e8 [bcmdhd]
[ 769.744978] [] dhd_dpc+0x7c/0x128 [bcmdhd]
[ 769.750630] [] __tasklet_action.isra.5+0x7c/0x128
[ 769.756882] [] tasklet_action+0x44/0x50
[ 769.762270] [] do_current_softirqs+0x1cc/0x3c8
[ 769.768263] [] __local_bh_enable+0x5c/0x70
[ 769.773912] [] irq_forced_thread_fn+0x7c/0xa8
[ 769.779818] [] irq_thread+0x110/0x1b0
[ 769.785034] [] kthread+0xec/0xf0
[ 769.789814] [] ret_from_fork+0x10/0x30
[ 769.795193] softirq: huh, entered softirq 6 TASKLET ffffff80080b5968 with preempt_count 00000000, exited with ffffffff?
[ 769.806022] ------------[ cut here ]------------
[ 769.810636] WARNING: CPU: 3 PID: 5977 at kernel/sched/core.c:3540 migrate_enable+0x68/0x280
[ 769.818970] Modules linked in: bcmdhd
[ 769.822647]
[ 769.824135] CPU: 3 PID: 5977 Comm: irq/454-dhdpcie Tainted: G W 4.9.299-rt191+ #493
[ 769.832902] Hardware name: lanai-3636 (DT)
[ 769.836988] task: ffffffc0eba01e00 task.stack: ffffffc0cfecc000
[ 769.842895] PC is at migrate_enable+0x68/0x280
[ 769.847330] LR is at __local_bh_enable+0x48/0x70
[ 769.851937] pc : [] lr : [] pstate: 60400045
[ 769.859317] sp : ffffffc0cfecfd20
[ 769.862624] x29: ffffffc0cfecfd20 x28: 0000000000000000
[ 769.867947] x27: 0000000000000000 x26: 0000000000000000
[ 769.873269] x25: ffffff800a0a798c x24: ffffff800811c000
[ 769.878592] x23: ffffff800811c6e8 x22: ffffffc0d68e6400
[ 769.883913] x21: 0000000000000001 x20: ffffffc0d695ae00
[ 769.889234] x19: ffffffc0eba01e00 x18: 0000000000000010
[ 769.894555] x17: 0000000000000000 x16: ffffffc0eba01e00
[ 769.899876] x15: ffffffffffffffff x14: 7270206874697720
[ 769.905198] x13: 3836393562303830 x12: 0000000000000000
[ 769.910520] x11: 00000000000002f0 x10: 000000000003a000
[ 769.915841] x9 : 0000000000000000 x8 : 0000000000000000
[ 769.921162] x7 : 0000000000000002 x6 : 00000000000001e1
[ 769.926482] x5 : 0000000000000000 x4 : 0000000000000000
[ 769.931804] x3 : 00000040ece7c000 x2 : ffffffc0f66cba90
[ 769.937127] x1 : 0000000000000001 x0 : ffffff800a0a797b
[ 769.942448]
[ 769.943933] —[ end trace 0000000000000003 ]—
[ 769.948539] Call trace:
[ 769.950980] [] migrate_enable+0x68/0x280
[ 769.956455] [] __local_bh_enable+0x48/0x70
[ 769.962103] [] irq_forced_thread_fn+0x7c/0xa8
[ 769.968010] [] irq_thread+0x110/0x1b0
[ 769.973226] [] kthread+0xec/0xf0
[ 769.978005] [] ret_from_fork+0x10/0x30
[ 769.985038] ------------[ cut here ]------------
[ 769.989657] WARNING: CPU: 3 PID: 2094 at kernel/cpu.c:354 unpin_current_cpu+0x5c/0x68
[ 769.997471] Modules linked in: bcmdhd
[ 770.001147]
[ 770.002634] CPU: 3 PID: 2094 Comm: kworker/3:2 Tainted: G W 4.9.299-rt191+ #493
[ 770.011053] Hardware name: lanai-3636 (DT)
[ 770.015145] Workqueue: events rtcpu_trace_worker
[ 770.019763] task: ffffffc0eecf9e00 task.stack: ffffffc0ea578000
[ 770.025670] PC is at unpin_current_cpu+0x5c/0x68
[ 770.030278] LR is at migrate_enable+0x1dc/0x280
[ 770.034797] pc : [] lr : [] pstate: 60400045
[ 770.042175] sp : ffffffc0ea57bca0
[ 770.045481] x29: ffffffc0ea57bca0 x28: 0000000000000000
[ 770.050802] x27: 0000000000000000 x26: ffffffc0f66d4000
[ 770.056124] x25: ffffffc0eecf9e00 x24: 000000000000000d
[ 770.061444] x23: ffffffc0f0ddcc00 x22: 0000000000000040
[ 770.066765] x21: ffffffc0ed3b2088 x20: 0000000000000001
[ 770.072086] x19: ffffffc0eecf9e00 x18: 0000000000000010
[ 770.077408] x17: 0000000000000002 x16: 0000000000000000
[ 770.082728] x15: ffffffffffffffff x14: 0000000000000000
[ 770.088049] x13: 0000000000000000 x12: ffffff800bde1000
[ 770.093369] x11: 0000000000000001 x10: 0000000000000b10
[ 770.098689] x9 : ffffffc0ea57bbb0 x8 : ffffffc0eecfa970
[ 770.104010] x7 : ffffff8008f79798 x6 : 0000000005a3249d
[ 770.109332] x5 : 00000000000002f0 x4 : 0000000000000000
[ 770.114652] x3 : 00000040ece7c000 x2 : ffffffc0f66cba90
[ 770.119974] x1 : 0000000000000000 x0 : ffffff800984fa90
[ 770.125295]
[ 770.126780] —[ end trace 0000000000000004 ]—
[ 770.131387] Call trace:
[ 770.133828] [] unpin_current_cpu+0x5c/0x68
[ 770.139473] [] migrate_enable+0x1dc/0x280
[ 770.145034] [] queue_delayed_work_on+0xb4/0x100
[ 770.151113] [] rtcpu_trace_worker+0x3c/0x48
[ 770.156847] [] process_one_work+0x1cc/0x4b0
[ 770.162580] [] worker_thread+0x160/0x510
[ 770.168053] [] kthread+0xec/0xf0
[ 770.172834] [] ret_from_fork+0x10/0x30
[ 770.179490] ------------[ cut here ]------------
[ 770.184106] WARNING: CPU: 3 PID: 5977 at kernel/cpu.c:354 unpin_current_cpu+0x5c/0x68
[ 770.191918] Modules linked in: bcmdhd
[ 770.195593]
[ 770.197082] CPU: 3 PID: 5977 Comm: irq/454-dhdpcie Tainted: G W 4.9.299-rt191+ #493
[ 770.205848] Hardware name: lanai-3636 (DT)
[ 770.209935] task: ffffffc0eba01e00 task.stack: ffffffc0cfecc000
[ 770.215841] PC is at unpin_current_cpu+0x5c/0x68
[ 770.220447] LR is at migrate_enable+0x1dc/0x280
[ 770.224967] pc : [] lr : [] pstate: 60400045
[ 770.232346] sp : ffffffc0cfecfd10
[ 770.235650] x29: ffffffc0cfecfd10 x28: 0000000000000000
[ 770.240972] x27: 0000000000000000 x26: 0000000000000000
[ 770.246293] x25: ffffff800a0a798c x24: ffffff800811c000
[ 770.251615] x23: ffffff800811c6e8 x22: ffffffc0d68e6400
[ 770.256937] x21: 0000000000000001 x20: ffffffc0d695ae00
[ 770.262258] x19: ffffffc0eba01e00 x18: 0000000000000010
[ 770.267581] x17: 0000000000000000 x16: ffffffc0eba01e00
[ 770.272902] x15: ffffffffffffffff x14: 0000000000000000
[ 770.278222] x13: 0000000000000000 x12: 0000000000000013
[ 770.283545] x11: 000000000000003d x10: 00000000000078a7
[ 770.288866] x9 : 0000000000000040 x8 : 0000000000000000
[ 770.294187] x7 : ffffffc0ee4c4000 x6 : ffffffc0d695ae00
[ 770.299510] x5 : 0000000000000007 x4 : 0000000000000000
[ 770.304830] x3 : 00000040ece7c000 x2 : ffffffc0f66cba90
[ 770.310152] x1 : 0000000000000000 x0 : ffffff800984fa90
[ 770.315472]
[ 770.316957] —[ end trace 0000000000000005 ]—
[ 770.321563] Call trace:
[ 770.324003] [] unpin_current_cpu+0x5c/0x68
[ 770.329649] [] migrate_enable+0x1dc/0x280
[ 770.335210] [] __local_bh_enable+0x48/0x70
[ 770.340858] [] irq_forced_thread_fn+0x7c/0xa8
[ 770.346765] [] irq_thread+0x110/0x1b0
[ 770.351979] [] kthread+0xec/0xf0
[ 770.356759] [] ret_from_fork+0x10/0x30
[ 770.362071] ------------[ cut here ]------------
[ 770.366683] WARNING: CPU: 3 PID: 5977 at kernel/cpu.c:354 unpin_current_cpu+0x5c/0x68
[ 770.374494] Modules linked in: bcmdhd
[ 770.378169]
[ 770.379656] CPU: 3 PID: 5977 Comm: irq/454-dhdpcie Tainted: G W 4.9.299-rt191+ #493
[ 770.388421] Hardware name: lanai-3636 (DT)
[ 770.392508] task: ffffffc0eba01e00 task.stack: ffffffc0cfecc000
[ 770.398415] PC is at unpin_current_cpu+0x5c/0x68
[ 770.403022] LR is at migrate_enable+0x1dc/0x280
[ 770.407542] pc : [] lr : [] pstate: 60400045
[ 770.414920] sp : ffffffc0cfecfd50
[ 770.418226] x29: ffffffc0cfecfd50 x28: 0000000000000000
[ 770.423545] x27: 0000000000000000 x26: 0000000000000000
[ 770.428866] x25: ffffff800a0a798c x24: ffffff800811c000
[ 770.434188] x23: ffffff800811c6e8 x22: ffffffc0d68e6400
[ 770.439509] x21: ffffffc0d695ae00 x20: 0000000000000001
[ 770.444830] x19: ffffffc0eba01e00 x18: 0000000000000010
[ 770.450153] x17: 0000000000000000 x16: ffffffc0eba01e00
[ 770.455473] x15: ffffffffffffffff x14: 0000000000000000
[ 770.460795] x13: 0000000000000000 x12: 0000000000000013
[ 770.466115] x11: 000000000000003d x10: 00000000000078a7
[ 770.471438] x9 : 0000000000000040 x8 : 0000000000000000
[ 770.476759] x7 : ffffffc0ee4c4000 x6 : ffffffc0d695ae00
[ 770.482081] x5 : 0000000000000002 x4 : 0000000000000022
[ 770.487401] x3 : 00000040ece7c000 x2 : ffffffc0f66cba90
[ 770.492721] x1 : 0000000000000000 x0 : ffffff800984fa90 [ 5]
8.00-9.0[ 770.498752]
0 sec [ 770.500929] —[ end trace 0000000000000006 ]—
1.26 MB[ 770.506225] Call trace:
ytes 10[ 770.509355] [] unpin_current_cpu+0x5c/0x68
.6 Mbits[ 770.515690] [] migrate_enable+0x1dc/0x280
/sec [ 770.521938] [] irq_thread+0x134/0x1b0
[ 770.527841] [] kthread+0xec/0xf0
[ 770.533311] [] ret_from_fork+0x10/0x30

[ 770.540532] ------------[ cut here ]------------
[ 770.545146] WARNING: CPU: 3 PID: 5977 at kernel/cpu.c:354 unpin_current_cpu+0x5c/0x68
[ 770.552960] Modules linked in: bcmdhd
[ 770.556635]
[ 770.558123] CPU: 3 PID: 5977 Comm: irq/454-dhdpcie Tainted: G W 4.9.299-rt191+ #493
[ 770.566889] Hardware name: lanai-3636 (DT)
[ 770.570976] task: ffffffc0eba01e00 task.stack: ffffffc0cfecc000
[ 770.576883] PC is at unpin_current_cpu+0x5c/0x68
[ 770.581490] LR is at migrate_enable+0x1dc/0x280
[ 770.586009] pc : [] lr : [] pstate: 60400045
[ 770.593388] sp : ffffffc0cfecfd10
[ 770.596695] x29: ffffffc0cfecfd10 x28: 0000000000000000
[ 770.602016] x27: 0000000000000000 x26: 0000000000000000
[ 770.607333] x25: ffffff800a0a798c x24: ffffff800811c000
[ 770.612653] x23: ffffff800811c6e8 x22: ffffffc0d68e6400
[ 770.617971] x21: 0000000000000001 x20: ffffffc0d695ae00
[ 770.623292] x19: ffffffc0eba01e00 x18: 0000000000000010
[ 770.628614] x17: 0000000000000000 x16: ffffffc0eba01e00
[ 770.633935] x15: ffffffffffffffff x14: 0000000000000000
[ 770.639256] x13: 0000000000000000 x12: 0000000000000013
[ 770.644576] x11: 006000014fa01f4b x10: 00000000001b4096
[ 770.649897] x9 : 0000000000000040 x8 : 0000000000000000
[ 770.655217] x7 : ffffffc0ee4c4000 x6 : ffffffc0d695ae00
[ 770.660540] x5 : ffffffc0d695ae00 x4 : 0000000000000000
[ 770.665861] x3 : 00000040ece7c000 x2 : ffffffc0f66cba90
[ 770.671182] x1 : 0000000000000000 x0 : ffffff800984fa90
[ 770.676501]
[ 770.677985] —[ end trace 0000000000000007 ]—
[ 770.682592] Call trace:
[ 770.685032] [] unpin_current_cpu+0x5c/0x68
[ 770.690678] [] migrate_enable+0x1dc/0x280
[ 770.696238] [] __local_bh_enable+0x48/0x70
[ 770.701887] [] irq_forced_thread_fn+0x7c/0xa8
[ 770.707794] [] irq_thread+0x110/0x1b0
[ 770.713008] [] kthread+0xec/0xf0
[ 770.717788] [] ret_from_fork+0x10/0x30
[ 770.723112] ------------[ cut here ]------------
[ 770.727723] WARNING: CPU: 3 PID: 5977 at kernel/cpu.c:354 unpin_current_cpu+0x5c/0x68
[ 770.735536] Modules linked in: bcmdhd
[ 770.739210]
[ 770.740696] CPU: 3 PID: 5977 Comm: irq/454-dhdpcie Tainted: G W 4.9.299-rt191+ #493
[ 770.749461] Hardware name: lanai-3636 (DT)
[ 770.753548] task: ffffffc0eba01e00 task.stack: ffffffc0cfecc000
[ 770.759453] PC is at unpin_current_cpu+0x5c/0x68
[ 770.764059] LR is at migrate_enable+0x1dc/0x280
[ 770.768578] pc : [] lr : [] pstate: 60400045
[ 770.775957] sp : ffffffc0cfecfd50
[ 770.779263] x29: ffffffc0cfecfd50 x28: 0000000000000000
[ 770.784584] x27: 0000000000000000 x26: 0000000000000000
[ 770.789906] x25: ffffff800a0a798c x24: ffffff800811c000
[ 770.795228] x23: ffffff800811c6e8 x22: ffffffc0d68e6400
[ 770.800548] x21: ffffffc0d695ae00 x20: 0000000000000001
[ 770.805869] x19: ffffffc0eba01e00 x18: 00000000fffffffc
[ 770.811190] x17: 0000000000000000 x16: ffffffc0eba01e00
[ 770.816512] x15: 0000000000000007 x14: 0000000000000001
[ 770.821835] x13: 0000000000000019 x12: 0000000000000033
[ 770.827157] x11: 0000000000000039 x10: 00000000000000ac
[ 770.832479] x9 : 0000000000000000 x8 : 000000000000b215
[ 770.837801] x7 : 00000000000002f0 x6 : 0000000000000000
[ 770.843123] x5 : 0000000000000010 x4 : 00000040ece7c000
[ 770.848445] x3 : 00000040ece7c000 x2 : ffffffc0f66cba90
[ 770.853765] x1 : 0000000000000000 x0 : ffffff800984fa90
[ 770.859086]
[ 770.860572] —[ end trace 0000000000000008 ]—
[ 770.865178] Call trace:
[ 770.867618] [] unpin_current_cpu+0x5c/0x68
[ 770.873264] [] migrate_enable+0x1dc/0x280
[ 770.878823] [] irq_thread+0x134/0x1b0
[ 770.884037] [] kthread+0xec/0xf0
[ 770.888817] [] ret_from_fork+0x10/0x30
log (2).txt (14.2 MB)

FYI, a crash dump is important, but the “tainted” part just says there is some non-GPL licensed kernel code. This, in turn, can be important if debugging requires that code to be in place for debugging, and perhaps that code is not available. It is sort of a “note” before looking at the actual debug information.

The call stack frame tends to say there might be some sort of deadlock or issue with code that takes place in an “atomic” section (if “atomic” it means you won’t be context switching and exiting that code until that code completes, but if it is not responding in that code section, it might never be able to complete). This is quite possibly a design issue, but to clarify, the code could possibly fail regardless of being in an uninterruptible section or not, but as a result of being in a spinlock (which is uninterruptible) this is a special failure: It blocks other use of that CPU and basically cannot continue. The stack frame from the first stack frame has some notes:

  • I see this is in a soft IRQ. A software IRQ is one which does not need to talk directly to hardware. An example might be computation of a checksum. Restated, this is probably something working on data rather than hardware I/O.
  • I don’t know what dhd is, but code for the dhd (there are several functions naming dhd, and I have no idea what they do) is basically halting. That halted code is bound to a “realtime” (it is soft realtime, not hard realtime) spin lock is never completing since the work within that is failing to either stop or return.

There is some further stack frame content also suggesting “realtime” (the “rt_”) is in some way a problem. Being able to migrate cores should be able to occur for a software IRQ, but it seems migration might also be failing. Inside of a spinlock which is atomic an attempt to migrate is no different than the spinlock getting stuck with code that does not return. The code cannot migrate to a new core until that atomic section is completed.

Anything trying to run in “realtime” via the “rt_” code probably has requirements that it be interruptible in a more predictable and controlled timing. I know nothing about your code, but maybe without the realtime part the code would still work, but with the rt it might be responding too slowly. Generally speaking though, I have to think whatever runs in that spinlock has an actual bug and it is likely more than just being slow. I don’t know.

The “tainted” though is just a statement of the code not being GPL and thus perhaps not debuggable if there is no source code available.

1 Like

Hi,
Since Jetpack 4 is with Kernel 4.9, would check with the vendor whether the driver is supported ad validated on the kernel version.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.