NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out with Tegra R21

Hi All,

After upgrading to Linux for Tegra R21, I ran into a problem with the Ethernet. Please help!

[ 596.041039] ------------[ cut here ]------------
[ 596.041071] WARNING: at /dvs/git/dirty/git-master_linux/kernel/net/sched/sch_generic.c:255 dev_watchdog+0x260/0x280()
[ 596.041079] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 596.041085] Modules linked in: dm_crypt dm_mod joydev rfcomm bnep bluetooth rfkill nvhost_vi
[ 596.041121] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.40-g8c4516e #1
[ 596.041150] [] (unwind_backtrace+0x0/0x140) from [] (show_stack+0x18/0x1c)
[ 596.041167] [] (show_stack+0x18/0x1c) from [] (warn_slowpath_common+0x54/0x70)
[ 596.041181] [] (warn_slowpath_common+0x54/0x70) from [] (warn_slowpath_fmt+0x38/0x48)
[ 596.041192] [] (warn_slowpath_fmt+0x38/0x48) from [] (dev_watchdog+0x260/0x280)
[ 596.041207] [] (dev_watchdog+0x260/0x280) from [] (call_timer_fn+0x44/0x15c)
[ 596.041217] [] (call_timer_fn+0x44/0x15c) from [] (run_timer_softirq+0x218/0x2b8)
[ 596.041229] [] (run_timer_softirq+0x218/0x2b8) from [] (__do_softirq+0xf4/0x2a0)
[ 596.041239] [] (__do_softirq+0xf4/0x2a0) from [] (do_softirq+0x54/0x60)
[ 596.041248] [] (do_softirq+0x54/0x60) from [] (irq_exit+0x98/0xd0)
[ 596.041260] [] (irq_exit+0x98/0xd0) from [] (handle_IRQ+0x44/0x98)
[ 596.041271] [] (handle_IRQ+0x44/0x98) from [] (gic_handle_irq+0x40/0x160)
[ 596.041282] [] (gic_handle_irq+0x40/0x160) from [] (__irq_svc+0x40/0x70)
[ 596.041289] Exception stack(0xc0bafec8 to 0xc0baff10)
[ 596.041296] fec0: c0baff18 00000000 00000000 000f4240 00000246 00000000
[ 596.041303] fee0: 00000000 c1cc27c0 c0baff10 c0ce91e0 c1cc03e8 c0811520 3b9ac9ff c0baff10
[ 596.041309] ff00: c02bbb94 c003e0f8 200f0013 ffffffff
[ 596.041322] [] (__irq_svc+0x40/0x70) from [] (tegra_idle_enter_clock_gating+0x6c/0x7c)
[ 596.041339] [] (tegra_idle_enter_clock_gating+0x6c/0x7c) from [] (cpuidle_enter_state+0x48/0x104)
[ 596.041352] [] (cpuidle_enter_state+0x48/0x104) from [] (cpuidle_idle_call+0x158/0x298)
[ 596.041362] [] (cpuidle_idle_call+0x158/0x298) from [] (arch_cpu_idle+0x10/0x40)
[ 596.041376] [] (arch_cpu_idle+0x10/0x40) from [] (cpu_idle_loop+0x9c/0x23c)
[ 596.041394] [] (cpu_idle_loop+0x9c/0x23c) from [] (start_kernel+0x2c4/0x318)
[ 596.041400] ---[ end trace 3f2df31896b87288 ]---

Does it always do this, and does it do this immediately upon network use? What is your output (if you can get there) for “lsmod”? Also, what was your method of install…just a complete flash, or some form of “upgrade”? I haven’t personally installed R21.1 yet (hope to soon, working on other things first), so it’s difficult to debug (more information might help).

It happens when I download a large package from the internet with wget. It normally hangs within several minutes.

dmesg shows "transmit queue 0 timed out".

I suspect one of the known issues in the release notes applies. The R21.1 notes list “2.3 ONBOARD ETHERNET OCCASIONALLY REACHES TIMEOUT HIT NETDEV WATCHDOG TIMEOUT IN R8169 DRIVER”. They go on to say that this is a known Linux issue, that the NetDev watchdog timeout might be reached under heavy I/O, and that a workaround is to use a slower Ethernet hub, e.g., 100 Mbps. The implication is that this is purely a Linux kernel issue within the R8169 driver.

There may also already be a mainline kernel fix for this (or maybe not); if there is, it could be back-ported.

As far as the OOPS goes, I’m wondering whether any of this is an intended response from a watchdog…some watchdogs are configurable in how they respond, such as forcing a reboot or forcing an OOPS. For the moment I have to keep R19.3 on my system, so I do not know what is in the /proc/config.gz file. Maybe someone with this file can post it, since some kernel compile options relate to watchdog timers and how they behave. Disabling a watchdog timer could be a benefit if this is the case (it wouldn’t stop a network stall but would stop the watchdog response).
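
For anyone who does have R21 running, a quick way to look for watchdog-related options is to search the compressed config. This is only a sketch: the function name watchdog_opts is made up here, and it assumes the kernel was built with /proc/config.gz support. Note too that the warning in the trace comes from dev_watchdog() in net/sched/sch_generic.c, which is the network stack's transmit timeout rather than a hardware watchdog, so these options may well not affect it.

```shell
#!/bin/sh
# List watchdog-related options from a gzip-compressed kernel config.
# watchdog_opts is a hypothetical helper name; the default path assumes
# the running kernel exposes /proc/config.gz (CONFIG_IKCONFIG_PROC).
watchdog_opts() {
    cfg="${1:-/proc/config.gz}"
    # gzip -dc decompresses to stdout; grep -i matches "watchdog" in any case
    gzip -dc "$cfg" | grep -i watchdog
}

# Usage: watchdog_opts              # reads /proc/config.gz
#        watchdog_opts ./config.gz  # reads a saved copy
```

Lines of the form "# CONFIG_... is not set" are matched as well, which is useful here because it shows which options were left disabled.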

Seems this has been reported for i386 but I see no mention of ARMv7. Here’s a reference to this bug outside of L4T:

The report is quite old…unfortunately I do not see which kernel version managed to get around this issue. A lot has happened since the 3.10 days.

I had the same problem (hang during heavy network traffic).
After reducing the speed from 1000baseT to 100baseT, the board seems stable.

You can use mii-tool, no need for additional networking hardware.

Just add this line to /etc/rc.local:
sudo mii-tool -F 100baseTx-FD eth0
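
To confirm that the forced setting took effect, the negotiated speed can be read back from sysfs. A minimal sketch, assuming the kernel exposes /sys/class/net/<iface>/speed; the link_speed function and the SYSFS_NET override are names invented here for illustration, not part of any tool.

```shell
#!/bin/sh
# Print the current link speed in Mbps for a network interface, read from sysfs.
# link_speed and SYSFS_NET are hypothetical names; SYSFS_NET only exists so
# the function can be pointed at a different directory when testing.
link_speed() {
    iface="${1:-eth0}"
    base="${SYSFS_NET:-/sys/class/net}"
    cat "$base/$iface/speed"
}

# Usage: link_speed eth0
```

After the mii-tool line above has run, this should report 100 instead of 1000 on eth0. Running mii-tool eth0 with no other options should also print the current link status.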

Hi mfatica, it works! Thank you, and thanks to everyone.