Netdev watchdog TX2 eth1

We use a TX2 with the BSP package L4T_r2810. The project details are as follows:
8G RAM
32G EMMC
Ubuntu 16.04 with ROS Kinetic

We also designed a base board to connect more USB devices (three USB 2.0 interfaces, one USB 3.0 interface, one USB Ethernet adapter, and one 4-port USB 3.0 hub).

We use an RTL8153 USB Ethernet chip (eth1), which is connected to the TX2 USB SS1 port.
The RTL8153 driver is from the official website. In our project, the NETDEV WATCHDOG sometimes fires on eth1, and then the whole USB host controller dies. The issue reproduces with both RTL8153 driver versions we tried (v2.10.00 (2018/03/16) and v2.12.0 (2019/04/29)). I can't find any USB error in the kernel log. Please give us some direction for investigating this issue. Thanks.

Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824523] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824548] ------------[ cut here ]------------
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824552] WARNING: at ffffffc0009b3d3c [verbose debug info unavailable]
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824555] Modules linked in: cp210x fuse rndis_host uvcvideo videobuf2_vmalloc bcmdhd pci_tegra bluedroid_pm
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824568]
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824573] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.38 #1
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824576] Hardware name: quill (DT)
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824578] task: ffffffc0011f4ec0 ti: ffffffc0011e4000 task.ti: ffffffc0011e4000
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824586] PC is at dev_watchdog+0x2ac/0x2bc
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824589] LR is at dev_watchdog+0x2ac/0x2bc
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824592] pc : [] lr : [] pstate: 60000045
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824594] sp : ffffffc0011e7b40
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824597] x29: ffffffc0011e7b40 x28: ffffffc06bd193b8
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824601] x27: ffffffc0011bbab8 x26: 0000000000000280
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824605] x25: 00000000ffffffff x24: 0000000000000000
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824608] x23: ffffffc06bd193a0 x22: ffffffc001366000
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824611] x21: ffffffc1e0fb2400 x20: ffffffc06bd19000
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824614] x19: ffffffc0011ea000 x18: 0000000000000a03
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824617] x17: 0000007f92133f68 x16: ffffffc0001d77c0
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824620] x15: 001dcd6500000000 x14: 0ffffffffffffffe
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824623] x13: 0000000000000028 x12: ffffffc001204000
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824626] x11: 0000000000000006 x10: 0000000000000000
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824629] x9 : 000000000000045d x8 : 756575712074696d
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824632] x7 : 0000000000000000 x6 : ffffffc0013a4438
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824634] x5 : 0000000000000000 x4 : 0000000000000000
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824637] x3 : 0000000000000000 x2 : 0000000000000102
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824640] x1 : ffffffc0011e4000 x0 : 0000000000000039
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824642]
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824645] ---[ end trace cd74197b3ae6af7a ]---
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.824648] Call trace:
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827092] [] dev_watchdog+0x2ac/0x2bc
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827101] [] call_timer_fn+0x50/0x1bc
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827105] [] run_timer_softirq+0x1ac/0x2a4
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827112] [] __do_softirq+0x10c/0x368
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827116] [] irq_exit+0x84/0xdc
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827122] [] __handle_domain_irq+0x6c/0xb4
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827126] [] gic_handle_irq+0x5c/0xb4
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827130] [] el1_irq+0x80/0xf8
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827137] [] cpuidle_enter+0x18/0x20
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827141] [] call_cpuidle+0x28/0x50
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827143] [] cpu_startup_entry+0x17c/0x340
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827150] [] rest_init+0x84/0x8c
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827157] [] start_kernel+0x39c/0x3b0
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827160] [<0000000080b18000>] 0x80b18000
Jul 30 22:06:41 tegra-ubuntu kernel: [26369.827176] r8152 2-3:1.0 eth1: Tx timeout
Jul 30 22:06:42 tegra-ubuntu kernel: [26370.960515] xhci-tegra 3530000.xhci: xHCI host not responding to stop endpoint command.
Jul 30 22:06:42 tegra-ubuntu kernel: [26370.960524] xhci-tegra 3530000.xhci: Assuming host is dying, halting host.

xiaoba-20190730.tar (4.46 MB)
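
When comparing logs like the one above across devices, it can help to reduce kern.log to just the USB-related events and their kernel timestamps. The following is only a sketch: the event names and the `usb_timeline` helper are made up for illustration, and the patterns are copied from messages quoted in this thread, so other kernels may word them differently.

```python
import re

# Patterns copied from messages quoted in this thread; the event
# names on the left are arbitrary labels chosen for this sketch.
PATTERNS = {
    "uvc_error": re.compile(r"uvcvideo: Non-zero status \(-?\d+\)"),
    "mbox_fw_hang": re.compile(r"tegra-xusb-mbox .*Controller firmware hang"),
    "netdev_watchdog": re.compile(r"NETDEV WATCHDOG: \w+ \(\w+\)"),
    "xhci_dying": re.compile(r"xhci-tegra .*Assuming host is dying"),
}

# Kernel timestamp such as "[26369.824523]" or "[  390.309568]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def usb_timeline(lines):
    """Return (kernel_time, event_name) pairs in the order they appear."""
    events = []
    for line in lines:
        ts = TIMESTAMP.search(line)
        if ts is None:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((float(ts.group(1)), name))
                break
    return events


if __name__ == "__main__":
    import sys

    # Usage: python usb_timeline.py /var/log/kern.log
    with open(sys.argv[1], errors="replace") as f:
        for when, name in usb_timeline(f):
            print(f"{when:14.6f}  {name}")
```

Events are kept in file order rather than sorted, because kernel timestamps restart from zero after every reboot.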

Would you mind reproducing this issue on the stock Ubuntu image from JetPack?

This is caused by the SR300 camera device. About 5 seconds after resetting the SR300 device, the USB host controller died. How can we debug this USB issue?

realsense2_cam.log (20.3 MB)

Is your entire environment purely from JetPack?

I mean, is ROS a necessary part here?

Yes, we use ROS for the robot.

Does ROS install any kernel patches, or is it just a user-space framework?

For us, it is better to debug on the devkit with a pure image from JetPack.

I mean this is a USB issue. Please help to check!

Hi,
We can see some errors printed:

Jul 29 09:52:07 tegra-ubuntu kernel: [  390.309568] uvcvideo: Non-zero status (-71) in video completion handler.
Jul 29 09:52:07 tegra-ubuntu kernel: [  390.326735] uvcvideo: Non-zero status (-71) in video completion handler.
Jul 29 09:52:07 tegra-ubuntu kernel: [  390.354676] xhci-tegra 3530000.xhci: tegra_xhci_mbox_work mailbox command 6

But it is not clear why they are triggered. We will check with the other team on how to debug further.
To double-check: L4T_2810 means r28.1, right?
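
For reference, the status in those uvcvideo lines is a negative Linux errno, and -71 is -EPROTO, which the kernel's USB error-code documentation describes as a bit-stuffing error, no response from the device, or an unknown USB error. A small sketch for decoding such lines follows; the table is hand-copied, deliberately incomplete, and `describe_urb_status` is just a name made up for this example.

```python
import re

# Hand-copied subset of Linux errno values that show up as URB
# completion statuses. Not exhaustive; anything else is "unknown".
URB_STATUS = {
    -71: ("EPROTO", "bit-stuffing error, no response, or unknown USB error"),
    -84: ("EILSEQ", "CRC mismatch, no response, or unknown USB error"),
    -32: ("EPIPE", "endpoint stalled"),
    -19: ("ENODEV", "device was removed"),
    -108: ("ESHUTDOWN", "device or host controller has been disabled"),
}

STATUS_RE = re.compile(r"Non-zero status \((-?\d+)\)")


def describe_urb_status(line):
    """Return (code, errno_name, meaning) for a uvcvideo status line, else None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name, meaning = URB_STATUS.get(code, ("unknown", "not in this table"))
    return code, name, meaning


line = ("Jul 29 09:52:07 tegra-ubuntu kernel: [  390.309568] "
        "uvcvideo: Non-zero status (-71) in video completion handler.")
print(describe_urb_status(line))
```

A protocol-level error like this says the transfer failed on the bus, but not whether the device, the cable or board routing, or the host controller was at fault.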

Yes, the BSP package is L4T_r2810.

On another device the issue reproduced, and we can see USB errors such as:

Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.155192] tegra-xusb-mbox 3538000.mailbox: Controller firmware hang
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.161629] tegra-xusb-mbox 3538000.mailbox: XUSB_CFG_ARU_MBOX_OWNER 0x0
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.168318] tegra-xusb-mbox 3538000.mailbox: XUSB_CFG_ARU_MBOX_CMD 0x80000000
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.175443] tegra-xusb-mbox 3538000.mailbox: XUSB_CFG_ARU_MBOX_DATA_IN 0x0
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.182305] tegra-xusb-mbox 3538000.mailbox: XUSB_CFG_ARU_MBOX_DATA_OUT 0x6000291
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.848254] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.848279] ------------[ cut here ]------------
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.848283] WARNING: at ffffffc0009b3eec [verbose debug info unavailable]
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.848288] Modules linked in: rndis_host cp210x fuse uvcvideo videobuf2_vmalloc bcmdhd pci_tegra bluedroid_pm
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.848304]
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.848310] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.38 #1
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.848313] Hardware name: quill (DT)
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.848316] task: ffffffc0011f4ec0 ti: ffffffc0011e4000 task.ti: ffffffc0011e4000
Aug 12 13:14:11 tegra-ubuntu kernel: [ 8760.848324] PC is at dev_watchdog+0x2ac/0x2bc

Please help to check! Thanks

kern.log (3.47 MB)
20190812-gingerlog.tar (29.2 MB)
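
The XUSB_CFG_ARU_MBOX_DATA_OUT value in that dump can be split into a command type and a payload. The sketch below assumes the field layout used by the mainline xhci-tegra driver (type in bits 31:24, data in bits 23:0); whether the r28.1 kernel uses exactly the same layout is an assumption, and `split_mbox_word` is a name made up for this example.

```python
# Assumed layout, taken from the mainline xhci-tegra driver:
# bits 31:24 = command type, bits 23:0 = command data.
CMD_TYPE_SHIFT = 24
CMD_TYPE_MASK = 0xFF
CMD_DATA_MASK = 0xFFFFFF


def split_mbox_word(word):
    """Split a 32-bit XUSB mailbox word into (command_type, data)."""
    cmd_type = (word >> CMD_TYPE_SHIFT) & CMD_TYPE_MASK
    data = word & CMD_DATA_MASK
    return cmd_type, data


cmd_type, data = split_mbox_word(0x6000291)  # DATA_OUT value from the log
print(f"type={cmd_type} data={data:#x}")
```

Under that assumption the word decodes to command type 6, which matches the "mailbox command 6" message seen earlier in the thread. In the mainline driver type 6 is the bandwidth request (MBOX_CMD_SET_BW), but that numbering has not been checked against the r28.1 sources.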

Hi,

According to your comments, there seem to be two issues with your customized carrier board.

Issue 1. NETDEV WATCHDOG timeout, after which the host controller dies.
Issue 2. The Intel RealSense SR300 camera stops working.

Some questions:

1. Do issue 1 and issue 2 also happen on the devkit (Jetson TX2)?
   If so, we may need to get the same device to debug with a bus trace and firmware log.

2. Do other USB devices work normally on the customized carrier board?
   You can check by recording continuously with another webcam.

3. Did you try a USB signal test, or test with all wireless devices on the board disabled?