4 Devices on USB via 4-Port HUB: I2C Error and System Freeze

We plugged 4 USB devices into a 4-port USB hub. After booting, sometimes shortly after all 4 devices are recognized by lsusb, we get the following log and the Tegra CPU freezes completely:

tegra-i2c tegra12-i2c.4: --- register dump for debugging ----
[ 249.971601] tegra-i2c tegra12-i2c.4: I2C_CNFG - 0x2c00
[ 249.977195] tegra-i2c tegra12-i2c.4: I2C_PACKET_TRANSFER_STATUS - 0x1ff0001
[ 249.984652] tegra-i2c tegra12-i2c.4: I2C_FIFO_CONTROL - 0xe0
[ 249.990721] tegra-i2c tegra12-i2c.4: I2C_FIFO_STATUS - 0x800080
[ 249.997063] tegra-i2c tegra12-i2c.4: I2C_INT_MASK - 0xec
[ 250.003120] tegra-i2c tegra12-i2c.4: I2C_INT_STATUS - 0xc2
[ 250.009020] tegra-i2c tegra12-i2c.4: msg->len - 2
[ 250.014080] tegra-i2c tegra12-i2c.4: is_msg_write - 1
[ 250.019506] tegra-i2c tegra12-i2c.4: buf_remaining - 0
[ 250.025035] tegra-i2c tegra12-i2c.4: i2c transfer timed out, addr 0x0040, data 0x01
[ 250.033480] Failed to set dvfs regulator vdd_core
[ 251.010206] tegra-i2c tegra12-i2c.4: --- register dump for debugging ----
[ 251.017840] tegra-i2c tegra12-i2c.4: I2C_CNFG - 0x2c00
[ 251.023594] tegra-i2c tegra12-i2c.4: I2C_PACKET_TRANSFER_STATUS - 0x1ff0001
[ 251.031274] tegra-i2c tegra12-i2c.4: I2C_FIFO_CONTROL - 0xe0
[ 251.037535] tegra-i2c tegra12-i2c.4: I2C_FIFO_STATUS - 0x800080
[ 251.044305] tegra-i2c tegra12-i2c.4: I2C_INT_MASK - 0xec
[ 251.050207] tegra-i2c tegra12-i2c.4: I2C_INT_STATUS - 0xc2
[ 251.056280] tegra-i2c tegra12-i2c.4: msg->len - 2
[ 251.061526] tegra-i2c tegra12-i2c.4: is_msg_write - 1
[ 251.067171] tegra-i2c tegra12-i2c.4: buf_remaining - 0
[ 251.073022] tegra-i2c tegra12-i2c.4: i2c transfer timed out, addr 0x0040, data 0x01
[ 251.081644] Failed to set dvfs regulator vdd_core
[ 252.097205] tegra-i2c tegra12-i2c.4: --- register dump for debugging ----
[ 252.110497] tegra-i2c tegra12-i2c.4: I2C_CNFG - 0x2c00
[ 252.121952] tegra-i2c tegra12-i2c.4: I2C_PACKET_TRANSFER_STATUS - 0x1ff0001
[ 252.135169] tegra-i2c tegra12-i2c.4: I2C_FIFO_CONTROL - 0xe0
[ 252.146922] tegra-i2c tegra12-i2c.4: I2C_FIFO_STATUS - 0x800080
[ 252.158690] tegra-i2c tegra12-i2c.4: I2C_INT_MASK - 0xec
[ 252.169765] tegra-i2c tegra12-i2c.4: I2C_INT_STATUS - 0xc2
[ 252.180970] tegra-i2c tegra12-i2c.4: msg->len - 2
[ 252.191329] tegra-i2c tegra12-i2c.4: is_msg_write - 1
[ 252.202066] tegra-i2c tegra12-i2c.4: buf_remaining - 0
[ 252.212864] tegra-i2c tegra12-i2c.4: i2c transfer timed out, addr 0x0040, data 0x01
[ 252.232179] Failed to set dvfs regulator vdd_core
[ 253.224223] tegra-i2c tegra12-i2c.4: --- register dump for debugging ----
[ 253.237505] tegra-i2c tegra12-i2c.4: I2C_CNFG - 0x2c00
[ 253.248759] tegra-i2c tegra12-i2c.4: I2C_PACKET_TRANSFER_STATUS - 0x1ff0001
[ 253.262080] tegra-i2c tegra12-i2c.4: I2C_FIFO_CONTROL - 0xe0
[ 253.273676] tegra-i2c tegra12-i2c.4: I2C_FIFO_STATUS - 0x800080
[ 253.285467] tegra-i2c tegra12-i2c.4: I2C_INT_MASK - 0xec
[ 253.296705] tegra-i2c tegra12-i2c.4: I2C_INT_STATUS - 0xc2
[ 253.307891] tegra-i2c tegra12-i2c.4: msg->len - 2
[ 253.318363] tegra-i2c tegra12-i2c.4: is_msg_write - 1
[ 253.329131] tegra-i2c tegra12-i2c.4: buf_remaining - 0
[ 253.339975] tegra-i2c tegra12-i2c.4: i2c transfer timed out, addr 0x0040, data 0x01
[ 253.359407] Failed to set dvfs regulator vdd_core
[ 253.370263] Failed to set regulator vdd_core for clock sdmmc4 to 1150 mV
[ 253.383631] sdhci-tegra sdhci-tegra.3: clock enable is failed, ret: -110

Any ideas?

Power consumption is always in question. Unless the HUB is a powered HUB with its own power source, you can’t really be sure of what is going on (especially with 4 devices). Can you try with a powered HUB?

We are using a powered USB hub (supplied by a 5 V, 3 A lab power supply), and this configuration works nicely on x86 and on an i.MX6 ARM CPU.

It really is an issue in the EHCI driver, and maybe in other parts of the software.

FYI, R21.4 is out and may solve some problems, it would probably be worthwhile to flash to this and test again.

Is the USB port being used the full-size port or the micro port?

Can you describe the devices connected to the HUB? I'm primarily interested in knowing whether they are common human interface devices ("HID": keyboard, mouse, etc.) or something else. USB itself is a hotplug layer that hands off to other drivers once the device is recognized. If they are HID, then that code is very well tested and should work correctly for the basics; there are known failures in "add-on" features of common HID devices, e.g., a Bluetooth keyboard may work as a keyboard but then fail because of bugs in Bluetooth connect/disconnect. It would be good to narrow down which part of USB fails.

We use the full-size port.
The USB devices are smartphones exposing an RNDIS Ethernet link over USB, handled by rndis_host.
Further investigation on our side showed that this particular driver, with its use of a workqueue, is not stable (system freeze).
Since this driver is stable on other ARM platforms and on x86, we suspect the Tegra xHCI driver is buggy…

I'm assuming this is the latest R21.4 release; please verify that this is correct.

Unfortunately, this is one of those cases where having the actual device is needed for any reliable testing. What comes to mind is removing everything you can from the HUB except for the smartphone, e.g., remove the mouse, use only the serial console instead of a keyboard, and ignore X11. If you have a micro-A USB cable (the supplied micro cable is micro-B), you can use the other USB port with a HUB.

Is there anything unusual about this install, e.g., is it a normal eMMC install, or is it running off an SD card, SATA, etc.? After a freeze, has the file system been checked? One suggestion is to add a SATA entry to /boot/extlinux/extlinux.conf so you can boot to an alternate rootfs and do any repair necessary after a freeze.
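Rather than typing a SATA entry from scratch, one option is to clone the existing primary entry and change only its root= argument. This is a sketch under assumptions: the stock L4T file has a "LABEL primary" stanza, and the SATA rootfs lives on /dev/sda1. The function name make_alt_entry and the label "sata" are hypothetical, not part of L4T.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a copy of the "primary" boot entry from an
# extlinux.conf, renamed to "sata" and with root= pointing at another partition.
# It only writes to stdout; review the output before appending it to the file.
make_alt_entry() {
    local conf="${1:-/boot/extlinux/extlinux.conf}"
    local newroot="${2:-/dev/sda1}"   # assumption: SATA rootfs on the first partition
    awk -v newroot="$newroot" '
        # A stanza starts at a line beginning with LABEL; remember if it is "primary".
        /^LABEL[ \t]/ { inblk = ($2 == "primary") }
        inblk {
            line = $0
            if (line ~ /^LABEL[ \t]/) {
                line = "LABEL sata"
            } else if (line ~ /MENU LABEL/) {
                sub(/MENU LABEL.*/, "MENU LABEL SATA rootfs", line)
            }
            # Swap whatever root= device the primary entry uses for the new one.
            gsub(/root=[^ \t]+/, "root=" newroot, line)
            print line
        }
    ' "$conf"
}
```

Usage: after sourcing the script, run "make_alt_entry > /tmp/sata-entry.txt", check it, then append it as root with "cat /tmp/sata-entry.txt >> /boot/extlinux/extlinux.conf". DEFAULT still points at primary, so the SATA entry is only used when picked from the boot menu. Copying the stanza keeps the kernel, FDT and other APPEND arguments identical, which avoids guessing them.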

How consistently can you re-create this issue, especially when other USB devices are removed from that particular HUB? Does this device require USB3 (I see you’ve enabled USB3, otherwise tegra-xhci would not show up)? If the device can function under USB2, are you able to trigger this as USB2?
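To see whether each phone actually enumerated as USB2 or USB3, the speed negotiated by every device can be read from sysfs (lsusb -t shows the same information as a tree). A minimal sketch; the function name usb_speeds is hypothetical, and the root path is a parameter only so it can be tried against a fake directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the link speed negotiated by each USB device,
# read from the "speed" attribute in sysfs (value in Mbit/s: 12 = full
# speed, 480 = USB2 high speed, 5000 = USB3 SuperSpeed).
usb_speeds() {
    local root="${1:-/sys/bus/usb/devices}" dev product
    for dev in "$root"/*; do
        # Interface directories such as 1-1:1.0 have no speed attribute.
        [[ -r "$dev/speed" ]] || continue
        product="unknown"
        [[ -r "$dev/product" ]] && product=$(<"$dev/product")
        printf '%s %s Mbit/s %s\n' "${dev##*/}" "$(<"$dev/speed")" "$product"
    done
}

usb_speeds "$@"
```

If the phones show up at 5000 Mbit/s, forcing them onto USB2 (for example through a USB2-only hub) is one way to check whether the freeze follows the xHCI SuperSpeed path.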

Is there an OOPS message on the serial console? Does sysrq respond (e.g., alt-sysrq-s should show "Emergency Sync complete"; just make sure the keyboard is plugged directly into the Jetson)? If you run "tail -f /var/log/syslog" on the serial console, you should be able to see an OOPS-style stack backtrace deliberately generated with "alt-sysrq-l". In case of a freeze, it is better to shut down via sysrq: first "alt-sysrq-s" (sync), then "alt-sysrq-u" (remount read-only), then "alt-sysrq-b" (immediate reboot).
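Before a freeze happens, it may help to confirm from a root shell (serial console or ssh) that SysRq is enabled and what the "l" backtrace looks like in syslog. Writing a command letter to /proc/sysrq-trigger has the same effect as the alt-sysrq key combination; note that the remount-read-only letter is "u" ("r" only takes the keyboard out of raw mode). A minimal sketch; the helper names are hypothetical, and the paths are variables only so the functions can be tried against ordinary files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: send magic SysRq commands from a root shell by
# writing the command letter to /proc/sysrq-trigger, the equivalent of
# pressing alt-sysrq-<letter>. Paths are overridable for testing only.
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Allow all SysRq functions (some distributions restrict them by default).
sysrq_enable() {
    echo 1 > "$SYSRQ_ENABLE"
}

# Send one SysRq command letter, e.g. "l" for a backtrace of all active CPUs.
sysrq() {
    echo "$1" > "$SYSRQ_TRIGGER"
}

# Same order as the keyboard sequence: sync, remount read-only, reboot.
sysrq_safe_reboot() {
    sysrq s; sleep 1
    sysrq u; sleep 1
    sysrq b
}
```

Usage: "sudo bash -c '. ./sysrq.sh; sysrq_enable; sysrq l'" and then look for the backtrace in /var/log/syslog. On a hard freeze the shell itself will likely not respond, so the keyboard or serial-console break remains the fallback.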

The package "read-edid" can be installed to provide some monitor-related i2c queries. Since the errors show i2c, it may still be that the problem isn't USB but an interaction with i2c. The command "get-edid | parse-edid" is designed to read the i2c data from a monitor's DDC/EDID channel, but it queries all i2c buses while doing so. As a poor man's i2c stress test, this could be run in a loop:

#!/bin/bash

while (( 1 == 1 )); do
  get-edid | parse-edid 1>/dev/null
done

…this could be run on several ssh console logins at the same time, trying to trigger failure while carefully avoiding the smartphone functions which previously triggered (I’m not sure exactly which smartphone activity triggered the failure).
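Instead of opening several ssh logins, the same loop can be started several times in parallel from one shell. A minimal sketch; stress_parallel is a hypothetical name, and the default probe assumes read-edid is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run N copies of a probe command in parallel loops for
# roughly the given number of seconds (bash SECONDS has 1 s resolution).
# The default probe is the read-edid query from above (requires read-edid).
stress_parallel() {
    local workers="$1" seconds="$2"
    shift 2
    local probe="${*:-get-edid 2>/dev/null | parse-edid >/dev/null 2>&1}"
    local pids=() i
    for ((i = 0; i < workers; i++)); do
        # Each worker is a background subshell with its own deadline.
        (
            end=$((SECONDS + seconds))
            while ((SECONDS < end)); do
                eval "$probe"
            done
        ) &
        pids+=("$!")
    done
    # Wait until every worker has reached its deadline.
    wait "${pids[@]}"
}
```

Usage: "stress_parallel 4 600" runs four get-edid loops for about ten minutes. Any other command string can be passed after the two numbers, which makes it easy to mix i2c traffic with network traffic over the phones.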

The bottom line is that the available information does not yet show whether the final USB message is a side effect of other failures or a symptom of USB causing a failure. Without an actual device to test with, the activity causing the failure needs further isolation.

One more question is how the RNDIS drivers are supplied. If kernel modules are involved, they could be temporarily blacklisted so they do not load, and the other USB functions of connecting to and querying the smartphone would still work. Knowing whether or not the failure can be triggered without RNDIS would be useful.
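Before blacklisting, it helps to know whether rndis_host is a module at all. A sketch under these assumptions: the kernel exposes its configuration as /proc/config.gz (only present with CONFIG_IKCONFIG_PROC), the relevant option is CONFIG_USB_NET_RNDIS_HOST, and the helper names and the file name blacklist-rndis_host.conf are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how a kernel option is configured:
# "y" (built in), "m" (module) or "n" (not set). Reads /proc/config.gz by
# default; zcat -f also accepts an uncompressed config file.
kconfig_value() {
    local sym="$1" cfg="${2:-/proc/config.gz}" line
    line=$(zcat -f "$cfg" | grep -E "^(# )?${sym}[= ]" | head -n 1)
    case "$line" in
        "${sym}="*) echo "${line#"${sym}="}" ;;
        *) echo n ;;
    esac
}

# Hypothetical helper: write a modprobe.d entry so the module is not
# auto-loaded when the phone is plugged in ("modprobe <name>" still works).
blacklist_module() {
    local mod="$1" dir="${2:-/etc/modprobe.d}"
    echo "blacklist $mod" > "$dir/blacklist-$mod.conf"
}
```

Usage: "kconfig_value CONFIG_USB_NET_RNDIS_HOST" prints m if rndis_host is a module; in that case run "blacklist_module rndis_host" as root and reboot. If it prints y, the driver is built into the zImage and blacklisting has no effect.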

There are also some more or less mysterious failures which occur when a kernel module is loaded across certain memory boundaries, so testing with RNDIS integrated into the kernel rather than as a kernel module would be very useful. If the problem simply goes away when the related drivers are integrated instead of loaded as modules, then it is just a limitation of the way modules work (see the mutually exclusive kernel configs CONFIG_TASK_SIZE_3G_LESS_16M and CONFIG_TASK_SIZE_3G_LESS_24M; the default maximum combined module size is 16MB on ARMv7). This would be fortunate, because there would be no bug and the solution would simply be to integrate the drivers into the zImage.
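To see how close the loaded modules come to that limit, the sizes in /proc/modules (second column, in bytes) can be summed. A rough sketch, assuming the 16MB figure above applies to your kernel; the total is only a lower bound, since module memory is allocated in whole pages. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the sizes (column 2, bytes) of all loaded modules.
module_total_bytes() {
    awk '{ total += $2 } END { printf "%d\n", total }' "${1:-/proc/modules}"
}

# Hypothetical helper: report the total against a budget and return
# non-zero if it is exceeded. The 16 MiB default is an assumption.
check_module_budget() {
    local total budget="${2:-$((16 * 1024 * 1024))}"
    total=$(module_total_bytes "${1:-/proc/modules}")
    printf 'loaded modules: %d bytes (%d KiB) of %d KiB budget\n' \
        "$total" $((total / 1024)) $((budget / 1024))
    ((total < budget))
}
```

Usage: run "check_module_budget" with the phones connected and RNDIS active; a total well below the budget makes the module-placement explanation less likely, though it does not rule it out.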