We’re seeing some occasional WiFi disconnects on our TX2 boards (emphasis mine):
[12271.511539] <b>dhd_bus_rxctl: resumed on timeout, INT status=0x20800040</b>
[12271.518373] <b>dhd_bus_rxctl: rxcnt_timeout=1, rxlen=0</b>
[12271.523257] <b>dhd_check_hang: Event HANG send up due to re=1 te=0 e=-110 s=2</b>
[12271.528946] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[12271.537268] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.543196] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.543203] dhd_check_hang: Event HANG send up due to re=1 te=0 e=-110 s=2
[12271.556079] CFG80211-ERROR) wl_cfg80211_get_station :
[12271.556079] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.567127] NOT assoc, error -1
[12271.570283] CFG80211-ERROR) wl_cfg80211_disconnect : Reason 3
[12271.576041] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.581970] CFG80211-ERROR) wl_cfg80211_disconnect : error (-1)
[12271.709131] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[12271.737499] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.743422] CFGP2P-ERROR) wl_cfgp2p_bss_isup : 'cfg bss -C 0' failed: -1
[12271.750131] CFGP2P-ERROR) wl_cfgp2p_bss_isup : NOTE: this ioctl error is normal when the BSS has not been created yet.
[12271.760833] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.766750] CFG80211-ERROR) wl_notifier_change_state : wlan0:error(-1)
[12271.773285] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.779233] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.796446] CFGP2P-ERROR) wl_cfgp2p_set_management_ie : vndr ie set error : -1
[12271.803736] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.809659] CFG80211-ERROR) wl_dongle_down : WLC_DOWN error (-1)
[12271.895207] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[12271.953571] wl_android_wifi_off in
[12271.956980] tegra_sysfs_off
[12271.959774] tegra_sysfs_rf_test_disable
[12271.963615] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.969532] dhd_prot_ioctl : bus is down. we have nothing to do
[12271.975465] dhd_wl_ioctl_get_intiovar: get int iovar ampdu_hostreorder failed, ERR -1
[12271.995329] dhd_prot_ioctl : bus is down. we have nothing to do
[12272.001260] dhd_wl_ioctl_set_intiovar: set int iovar tlv failed, ERR -1
[12272.007949] Disabling wake69
[12272.008072] sdhci-tegra 3440000.sdhci: Tuning already done, restoring the best tap value : 20
[12272.021466] wifi_platform_set_power = 0
[12272.080282] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[12272.225938] <b>CFG80211-ERROR) wl_cfg80211_hang : In : chip crash eventing</b>
[12272.246622] cfg80211: World regulatory domain updated:
[12272.251768] cfg80211: DFS Master region: unset
[12272.256129] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[12272.265347] serial-tegra 3110000.serial: RxData DMA copy to tty layer failed
[12272.272922] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[12272.280928] cfg80211: (2457000 KHz - 2482000 KHz @ 20000 KHz, 92000 KHz AUTO), (N/A, 2000 mBm), (N/A)
[12272.290327] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[12272.298344] cfg80211: (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)
[12272.307830] cfg80211: (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)
[12272.317312] cfg80211: (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[12272.325404] cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[12272.333408] cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
We’re running a 4.4 kernel with the RT patch applied, plus a few extra patches (listed at the end of the post). The WiFi firmware version is:
Firmware version = wl0: Dec 12 2017 15:09:35 version 7.35.221.34 (r679642) FWID 01-e35dbe99
We’ve captured this exact log on a TX2 only once, but we’ve seen a couple of other cases with the same symptoms where no logs were available.
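Since the other occurrences left no logs, it may help to pick the HANG line out of any future captures automatically. Below is a minimal Python sketch, not part of our setup, that extracts the fields from a `dhd_check_hang` line as it appears in the log above. The names `HangEvent`, `parse_hang` and `looks_like_timeout` are hypothetical. Reading `e` as a negated Linux errno (110 is `ETIMEDOUT` on Linux) is an assumption on my part; the other fields are kept as raw integers without interpretation.

```python
import re
from typing import NamedTuple, Optional

# Numeric value of ETIMEDOUT on Linux. Hard-coded because errno values are
# platform-specific and this script may not run on the TX2 itself.
LINUX_ETIMEDOUT = 110


class HangEvent(NamedTuple):
    timestamp: float  # kernel log timestamp in seconds
    re_val: int       # raw "re=" field, not interpreted here
    te_val: int       # raw "te=" field, not interpreted here
    e_val: int        # raw "e=" field; ASSUMED to be a negated errno
    s_val: int        # raw "s=" field, not interpreted here


# Tolerates anything (e.g. forum markup) between the timestamp and the message.
_HANG = re.compile(
    r"\[\s*(\d+\.\d+)\].*?dhd_check_hang: Event HANG send up due to "
    r"re=(-?\d+) te=(-?\d+) e=(-?\d+) s=(-?\d+)"
)


def parse_hang(line: str) -> Optional[HangEvent]:
    """Return the parsed HANG event, or None if the line is not a HANG line."""
    m = _HANG.search(line)
    if m is None:
        return None
    ts, re_val, te_val, e_val, s_val = m.groups()
    return HangEvent(float(ts), int(re_val), int(te_val), int(e_val), int(s_val))


def looks_like_timeout(event: HangEvent) -> bool:
    """True if e equals -ETIMEDOUT under the negated-errno assumption."""
    return event.e_val == -LINUX_ETIMEDOUT
```

Run over the log above, this would flag both `dhd_check_hang` lines and report `e=-110` as a timeout under that assumption, which matches the preceding `resumed on timeout` message.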
Some notes:
- The issue happened after the TX2 had been on for roughly 3.5 hours. There were no WiFi issues before that, and we've seen our TX2s run the same software for much longer without issues.
- It was stationary when the issue happened, but had been moving around a few meters prior to that.
- It was connected to a single 5 GHz access point, so roaming should not be possible.
- As can be seen, we were simultaneously getting `serial-tegra 3110000.serial: RxData DMA copy to tty layer failed` errors. I assume these come from one of our applications having entered an error state and no longer servicing the serial port it normally reads from. I'm not sure whether this could affect the WiFi subsystem.
- After the error, I could log in over the serial console and run `ifdown wlan0` followed by `ifup wlan0`, which brought the WiFi back up.
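As a stopgap while the root cause is unknown, the manual `ifdown wlan0` / `ifup wlan0` recovery from the last note could be automated. The following is only a sketch under assumptions: `recover_on_hang`, `default_runner` and the 30-second cooldown are hypothetical, and the script would need root to actually bounce the interface. It hides the symptom and does not fix the underlying hang.

```python
import re
import subprocess
from typing import Callable, Iterable, Optional, Sequence

HANG_MARKER = "dhd_check_hang: Event HANG"
_TS = re.compile(r"\[\s*(\d+\.\d+)\]")  # kernel log timestamp, e.g. "[12271.523257]"


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (ifdown/ifup need root)."""
    return subprocess.run(list(cmd), check=False).returncode


def recover_on_hang(
    lines: Iterable[str],
    iface: str = "wlan0",
    cooldown_s: float = 30.0,  # hypothetical value; the log shows two HANG lines ~20 ms apart
    runner: Callable[[Sequence[str]], int] = default_runner,
) -> int:
    """Bounce `iface` once per burst of HANG lines; return the number of recoveries."""
    recoveries = 0
    last_ts: Optional[float] = None
    for line in lines:
        if HANG_MARKER not in line:
            continue
        m = _TS.search(line)
        ts = float(m.group(1)) if m else None
        # Skip repeated HANG lines that belong to the same incident.
        # Lines without a timestamp cannot be debounced and always trigger.
        if ts is not None and last_ts is not None and ts - last_ts < cooldown_s:
            continue
        runner(["ifdown", iface])
        runner(["ifup", iface])
        recoveries += 1
        if ts is not None:
            last_ts = ts
    return recoveries
```

The `lines` argument can be any iterable of kernel log lines, for example `sys.stdin` when the script is fed from `dmesg --follow` (assuming the installed `dmesg` supports that option). The `runner` parameter exists so the logic can be exercised without touching the real interface.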
Any ideas what the issue could be, or any tips on how to debug further? We don’t yet have a reliable way to reproduce the issue, but we’re working on it.
I can also see that we’re not using the newest firmware, but I haven’t been able to find anywhere to download it, or any changelog.
Any help would be appreciated!
The patches we use that relate to WiFi are the following:
From https://devtalk.nvidia.com/default/topic/1047138/jetson-tx1/wifi-disconnect-problem-on-jetpack-3-3/2
------------------ drivers/net/wireless/bcmdhd/wl_cfg80211.c ------------------
index 9d3568d18421..8f5f11d28968 100644
@@ -9935,6 +9935,7 @@ wl_cfg80211_verify_bss(struct bcm_cfg80211 *cfg, struct net_device *ndev)
do {
bss = CFG80211_GET_BSS(wiphy, NULL, curbssid,
ssid->SSID, ssid->SSID_len);
+ cfg->wdev->ssid_len = ssid->SSID_len;
if (bss || (count > 5)) {
break;
}
From https://devtalk.nvidia.com/default/topic/1047319/jetson-tx2/disable-wifi-powersave
------------------- drivers/net/wireless/bcmdhd/dhd_linux.c -------------------
index a1aa56926ceb..89c3334660e9 100644
@@ -6154,6 +6154,7 @@ dhd_preinit_ioctls(dhd_pub_t *dhd)
#endif
}
+ dhd_slpauto_config(dhd, 0);
DHD_ERROR(("Firmware up: op_mode=0x%04x, MAC="MACDBG"\n",
dhd->op_mode, MAC2STRDBG(dhd->mac.octet)));
/* Set Country code */
---------------------------- net/wireless/nl80211.c ----------------------------
index bf65f31bd55e..868eec3d8da4 100644
@@ -8659,8 +8659,15 @@ static int nl80211_set_power_save(struct sk_buff *skb, struct genl_info *info)
state = (ps_state == NL80211_PS_ENABLED) ? true : false;
+/* This check has been commented out, to ignore the internally saved
+ power management state, and just always send the on or off command.
+ There seems to be something that can turn on power saving without it
+ being reflected in the internal state, so removing this allows to keep
+ periodically sending power_save off commands (using the userspace iw
+ utility), without turning it on inbetween.
if (state == wdev->ps)
- return 0;
+ return 0;*/
err = rdev_set_power_mgmt(rdev, dev, state, wdev->ps_timeout);
if (!err)