Jetson TK1 stops responding with SSH connection

Hi,

I am connecting to my TK1 through SSH (no HDMI output), and running some commands causes the whole board to freeze or crash.

For instance, when I try to install or download software, the operation starts and then my SSH connection to the device stops working. When this happens I cannot even ping the device any more.

This also happens every time I try to make a second SSH connection to the device.

Does anybody have some advice regarding this?

Some more information:
Plugging in an HDMI cable resolves the issue. With the cable connected:

  • I can download the software that was giving the issue
  • I can make multiple SSH connections

Any help please,
I need to deploy 4 of these boards and will not be able to connect HDMI cables to each of them.

PS: I am using the latest developer packages for the TK1

Could you connect a serial cable to the UART-port and see if the kernel prints something when it hangs?

I am looking into connecting to the UART now…

In the meantime, I connected the HDMI cable and a keyboard again and did the following:

  • Restarted the SSH service, with no luck getting it up again
  • ifdown eth1 went through fine
  • ifup eth1 gave the following error:
RTNETLINK answers: Cannot allocate memory
Failed to bring up eth1
  • Second try of ifup eth1 gave:
RTNETLINK answers: File exists
Failed to bring up eth1

From this, my initial assumption that the board freezes seems to be wrong; it looks like the Ethernet connection breaks instead. Are there any log files I can look at if the UART does not show anything?

Connecting over UART does not show anything more than what I can see after connecting the HDMI cable again. There are no messages when it crashes, nothing.

Over the UART (serial port) you can watch logs before, during, and after the failure. If the various logs in /var/log/ do not have what you need, you might update the file “/etc/rsyslog.d/50-default.conf” by uncommenting the “catch-all” log files:

#
# Some "catch-all" log files.
#
*.=debug;\
        auth,authpriv.none;\
        news.none;mail.none     -/var/log/debug
*.=info;*.=notice;*.=warn;\
        auth,authpriv.none;\
        cron,daemon.none;\
        mail,news.none          -/var/log/messages

It would also be useful to know which version of L4T you have (it ships with R19.2).

Setting the dmesg console log level to 7 or higher will give logs on the UART itself:
dmesg -n 7 # run as root

Set this before reproducing the issue.

Let us know if you are getting any timeout message from the netdev watchdog for the network driver r8169:
dmesg -x | grep -ie r8169 # check after reproducing the issue

Thanks
Mohit Sharma

Ok, I have done as requested, and reproducing the issue gives me the following on the console a few seconds after the error occurred:

[ 3853.071292] ------------[ cut here ]------------
[ 3853.134811] WARNING: at /dvs/git/dirty/git-master_linux/kernel/net/sched/sch_generic.c:255 dev_watchdog+0x260/0x280()
[ 3853.265442] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 3853.330929] Modules linked in: dm_crypt dm_mod rfcomm bnep bluetooth rfkill nvhost_vi
[ 3853.360075] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.40-g8c4516e #1
[ 3853.376903] [<c00168e0>] (unwind_backtrace+0x0/0x140) from [<c0013234>] (show_stack+0x18/0x1c)
[ 3853.405116] [<c0013234>] (show_stack+0x18/0x1c) from [<c0067150>] (warn_slowpath_common+0x54/0x70)
[ 3853.433701] [<c0067150>] (warn_slowpath_common+0x54/0x70) from [<c0067218>] (warn_slowpath_fmt+0x38/0x48)
[ 3853.462880] [<c0067218>] (warn_slowpath_fmt+0x38/0x48) from [<c06d6090>] (dev_watchdog+0x260/0x280)
[ 3853.491548] [<c06d6090>] (dev_watchdog+0x260/0x280) from [<c0077984>] (call_timer_fn+0x44/0x15c)
[ 3853.519910] [<c0077984>] (call_timer_fn+0x44/0x15c) from [<c0077d7c>] (run_timer_softirq+0x218/0x2b8)
[ 3853.548759] [<c0077d7c>] (run_timer_softirq+0x218/0x2b8) from [<c0070130>] (__do_softirq+0xf4/0x2a0)
[ 3853.577492] [<c0070130>] (__do_softirq+0xf4/0x2a0) from [<c0070394>] (do_softirq+0x54/0x60)
[ 3853.605418] [<c0070394>] (do_softirq+0x54/0x60) from [<c0070644>] (irq_exit+0x98/0xd0)
[ 3853.632884] [<c0070644>] (irq_exit+0x98/0xd0) from [<c000fb78>] (handle_IRQ+0x44/0x98)
[ 3853.660342] [<c000fb78>] (handle_IRQ+0x44/0x98) from [<c00084f4>] (gic_handle_irq+0x40/0x160)
[ 3853.688468] [<c00084f4>] (gic_handle_irq+0x40/0x160) from [<c000ed40>] (__irq_svc+0x40/0x70)
[ 3853.716719] Exception stack(0xc0bafeb8 to 0xc0baff00)
[ 3853.731731] fea0:                                                       c0baff10 00000000
[ 3853.759206] fec0: 00000000 000f4240 c1cc27c0 c1cc03e8 c0d027fc 00000001 00000a2e 00000000
[ 3853.786511] fee0: c0baff08 c0811520 3b9ac9ff c0baff00 c02bbb94 c003df48 20070013 ffffffff
[ 3853.813938] [<c000ed40>] (__irq_svc+0x40/0x70) from [<c003df48>] (tegra_idle_enter_pd+0x11c/0x260)
[ 3853.842220] [<c003df48>] (tegra_idle_enter_pd+0x11c/0x260) from [<c05a827c>] (cpuidle_enter_state+0x48/0x104)
[ 3853.871476] [<c05a827c>] (cpuidle_enter_state+0x48/0x104) from [<c05a8490>] (cpuidle_idle_call+0x158/0x298)
[ 3853.900552] [<c05a8490>] (cpuidle_idle_call+0x158/0x298) from [<c0010118>] (arch_cpu_idle+0x10/0x40)
[ 3853.929013] [<c0010118>] (arch_cpu_idle+0x10/0x40) from [<c00b42c4>] (cpu_idle_loop+0x9c/0x23c)
[ 3853.956976] [<c00b42c4>] (cpu_idle_loop+0x9c/0x23c) from [<c0b2ea28>] (start_kernel+0x2c4/0x318)
[ 3853.984982] ---[ end trace 7979415387afd3f9 ]---
[ 3854.007192] r8169 0000:01:00.0 eth0: link up

Running dmesg -x | grep -ie r8169 gives me:

kern  :info  : [    4.438805] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
kern  :info  : [    4.448271] r8169 0000:01:00.0 eth0: RTL8168g/8111g at 0xf0000000, 00:04:4b:25:b7:c9, XID 0c000800 IRQ 641
kern  :info  : [    4.460478] r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
kern  :info  : [    9.762569] r8169 0000:01:00.0 eth0: link down
kern  :info  : [    9.762643] r8169 0000:01:00.0 eth0: link down
kern  :info  : [   12.766019] r8169 0000:01:00.0 eth0: link up
kern  :info  : [ 3853.265442] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
kern  :info  : [ 3854.007192] r8169 0000:01:00.0 eth0: link up

This specific time I was trying to install a new package through apt-get.
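For anyone who wants to check several boards for the same symptom, here is a minimal sketch that pulls the interface name out of every NETDEV WATCHDOG transmit timeout in the kernel log. The function name watchdog_ifaces is made up for illustration, and the pattern assumes the message format shown in the log above.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print the interface name for each
# "NETDEV WATCHDOG ... transmit queue N timed out" message found.
# watchdog_ifaces is a hypothetical helper name, not part of any package.
watchdog_ifaces() {
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) ([^)]*): transmit queue [0-9]* timed out.*/\1/p'
}

# Usage on the board (run as root if dmesg is restricted):
#   dmesg | watchdog_ifaces | sort | uniq -c
```

The output of the usage line is one count per interface, so an empty result suggests the watchdog has not fired since boot.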

How do I check which version of L4T is installed?

I am pretty certain it is the latest, R21.2, but would like to make sure.

I have this same issue with 21.2.

You can check the version of L4T via “head -n 1 /etc/nv_tegra_release”.
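If you want a short version string rather than the whole line, something like the sketch below might do. It assumes the first line looks roughly like "# R21 (release), REVISION: 2.0, ..." (that layout is my assumption, so check it against your own file), and the function name l4t_version is made up. If the line does not match, it prints nothing.

```shell
#!/bin/sh
# Turn the first line of /etc/nv_tegra_release (read on stdin) into a
# short string such as R21.2.0.
# Assumed line layout: "# R<major> (release), REVISION: <minor>, ..."
# l4t_version is a hypothetical helper name.
l4t_version() {
    head -n 1 | sed -n 's/^# R\([0-9]*\) (release), REVISION: \([0-9.]*\).*/R\1.\2/p'
}

# Usage on the board:
#   l4t_version < /etc/nv_tegra_release
```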

That “NETDEV WATCHDOG” is the known gigabit issue listed in the release notes. This is not specific to L4T, but is apparently related to a general driver issue. I keep wanting to figure out which kernel version fixed this issue, but have not found the answer. This is also why I’m currently still working with R19.3. The release notes are at:
http://developer.download.nvidia.com/mobile/tegra/l4t/r21.2.0/Tegra_Linux_Driver_Package_Release_Notes_R21.2.pdf

One of the solutions was to force the Ethernet link back to 100 Mbit via mii-tool.
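A minimal sketch of that workaround, assuming the interface is eth0 as in the kernel log above and that mii-tool is installed. The wrapper name force_100mbit and the MII_TOOL override are my own additions, there only so the command can be previewed without touching the link.

```shell
#!/bin/sh
# Force an interface to 100 Mbit/s full duplex using mii-tool's -F option,
# which disables autonegotiation and forces the given media type.
# force_100mbit and MII_TOOL are hypothetical names for illustration;
# set MII_TOOL=echo to print the arguments instead of running mii-tool.
force_100mbit() {
    iface="${1:-eth0}"   # eth0 is an assumption taken from the kernel log
    "${MII_TOOL:-mii-tool}" -F 100baseTx-FD "$iface"
}

# Usage on the board (needs root):
#   force_100mbit eth0
```

The forced setting is not persistent, so it would have to be reapplied after a reboot, for example from a boot script.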

Yes, it is the same issue as the one I raised, and there are other similar topics if you search:

https://devtalk.nvidia.com/default/topic/794758/embedded-systems/jetson-tk1-r8169-ethernet-cuts-on-high-load/

My desktop PC (Sabertooth 990FX R2.0) uses the same driver:

0a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09)
	Subsystem: ASUSTeK Computer Inc. P8H77-I Motherboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 97
	Region 0: I/O ports at b000 
	Region 2: Memory at d0004000 (64-bit, prefetchable) 
	Region 4: Memory at d0000000 (64-bit, prefetchable) 
	Capabilities: <access denied>
	Kernel driver in use: r8169
uname -r
3.13.0-40-generic

I haven't had any problems on the desktop so far.

I have also tried the 21.2 kernel with the watchdog timer, PCIe ASPM, CPU idle, cpuquiet, and power management disabled. The temperature was OK, but the issue still persists.

I recommend my gigabit ethernet solution:
https://devtalk.nvidia.com/default/topic/799075/embedded-systems/jetson-tk1-r8169-netdev-watchdog-timeout-solved-/