TX2 stops answering pings after running for about a day

My TX2 sometimes stops answering pings or ssh connections after it has been running for a day or so.

The machine can still make outgoing use of the network: if I open a terminal window on the machine itself, I can ping other sites and reach them via HTTP, etc. But when I try to ssh to the host, I get a timeout, and a ping returns:

Request timeout for icmp_seq 43
ping: sendto: No route to host
Request timeout for icmp_seq 44
ping: sendto: Host is down

Is there some kind of incoming sleep mode that the wlan0 interface goes into? This seems to happen after the machine has been up for a day or more, but does not always happen.

On both the Jetson and host PC, what do you see from the commands “route” and “ifconfig”? When this occurs, do you get any error from the command “host nvidia.com” (again, once from host PC and once from Jetson)?

Basically a “no route to host” is different than “connection refused”. When this issue is going on, and you run ping, are you using a dotted-decimal IP format, or a named format? E.g., 192.168.55.1 is dotted-decimal format, and “nvidia.com” is a named format. If you ping a dotted-decimal address and this works, but ping of a named address fails, then it implies the host PC itself is failing its DNS lookup. If the dotted-decimal address is failing, then any router between is probably failing. How is networking wired, e.g., actual wired, WiFi, a switch and a router, just a switch, so on?
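The numeric-versus-named comparison described above can be sketched as a small shell helper. This is only an illustration: the function names `diagnose` and `check_host` are made up for this example, and the hostname passed to `check_host` would be whatever name you normally use for the machine.

```shell
#!/bin/sh
# diagnose IP_STATUS NAME_STATUS
# Interpret the exit codes of two pings (0 means the ping succeeded).
diagnose() {
    if [ "$1" -ne 0 ]; then
        echo "numeric ping failed: suspect routing or the link itself"
    elif [ "$2" -ne 0 ]; then
        echo "numeric ping ok, name ping failed: suspect DNS lookup"
    else
        echo "both pings ok"
    fi
}

# check_host IP NAME
# Ping the same machine once by dotted-decimal address and once by name,
# then report which case applies.
check_host() {
    ping -c 1 "$1" >/dev/null 2>&1
    ip_status=$?
    ping -c 1 "$2" >/dev/null 2>&1
    name_status=$?
    diagnose "$ip_status" "$name_status"
}
```

For example, `check_host 192.168.86.98 tx2.local` would test the address from this thread against a hypothetical name `tx2.local`.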

I am pinging from a MacBook on the same WiFi LAN, using a numeric IP address, 192.168.86.98. The TX2 is using its WiFi network interface.

I will try your suggestions to run the ifconfig and route commands next time it happens.

This morning, when I typed “ping google.com” directly on the TX2, it worked, and then I could ssh into the TX2 from my MacBook. So it seems as if that ‘woke’ something in the TX2 networking stack.

That implies to me that maybe there is some ‘sleep’ mode on the Wifi interface that is woken up by outgoing packets or something?
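One way to test that hypothesis might be to keep a trickle of outgoing traffic going from the TX2 and see whether it stays reachable. The sketch below is an assumption-laden experiment, not a fix: `gateway_from_route` and `keepalive` are hypothetical helper names, and it assumes the `ip` tool is available and that the default route goes out over WiFi.

```shell
#!/bin/sh
# Read `ip route show default` output on stdin and print the gateway address.
gateway_from_route() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# keepalive INTERVAL
# Ping the default gateway once every INTERVAL seconds (default 60).
keepalive() {
    interval="${1:-60}"
    while :; do
        gw="$(ip route show default | gateway_from_route)"
        if [ -n "$gw" ]; then
            ping -c 1 "$gw" >/dev/null 2>&1
        fi
        sleep "$interval"
    done
}
```

Running `keepalive 60 &` on the TX2 for a few days, and comparing against a period without it, would give some evidence either way.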

One more thing: I have noticed that when I ssh into the TX2, the interactive response when echoing characters is sluggish. I Googled that, and the suggestion was to run this:

/sbin/iw dev wlan0 set power_save off

And that does indeed make it more responsive. I don’t know if that has anything to do with the issue I am seeing, but it does suggest some kind of power saving mode kicking in after some time period, even though I set power_save off?
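To see whether the setting actually sticks, the current state can be read back with `iw dev wlan0 get power_save`, which prints a line of the form `Power save: on` or `Power save: off`. The sketch below logs that state over time; the helper names are hypothetical, and it assumes the interface is `wlan0` and that the driver reports the state through `iw` at all.

```shell
#!/bin/sh
# Assumption: the WiFi interface is wlan0, as in the command above.
IFACE="${IFACE:-wlan0}"

# Extract "on" or "off" from iw's "Power save: <state>" line on stdin.
parse_power_save() {
    sed -n 's/^Power save: *//p'
}

# Print the current power-save state of the interface.
current_power_save() {
    /sbin/iw dev "$IFACE" get power_save | parse_power_save
}

# log_power_save INTERVAL
# Print a timestamped state every INTERVAL seconds (default 600).
log_power_save() {
    interval="${1:-600}"
    while :; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') power_save=$(current_power_save)"
        sleep "$interval"
    done
}
```

Something like `log_power_save 600 >> /tmp/power_save.log &` would then show whether the state flips back to `on` at some point, for example after a reconnect.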

I tend to avoid WiFi, and am not very good at debugging wireless, but I strongly suspect you are correct about sleep. There have been many posts on these forums related to sleep and WiFi (in combination). Someone else will need to answer the question of power save and WiFi interactions.

I have a patch for disabling the sleep mode. Maybe you could try it.

--- a/drivers/net/wireless/bcmdhd/dhd_linux.c
+++ b/drivers/net/wireless/bcmdhd/dhd_linux.c
@@ -6155,6 +6155,7 @@ dhd_preinit_ioctls(dhd_pub_t *dhd)
 #endif
 }
 
+	dhd_slpauto_config(dhd, 0);
 	DHD_ERROR(("Firmware up: op_mode=0x%04x, MAC="MACDBG"\n",
 		dhd->op_mode, MAC2STRDBG(dhd->mac.octet)));
 	/* Set Country code  */

Also, I think you should first confirm whether this is really a WiFi issue by trying a wired network connection too.

Thanks WayneWWW. I believe it is a problem with Wifi, as plugging into ethernet lets me ssh in.
But this is an intermittent problem, the board has been up for 6 days now without becoming unreachable.

If I were to apply your patch, where would I find instructions for rebuilding/installing that driver? I’ve modified a simple Linux device driver before and rebuilt the kernel, but it has been many years…

The L4T development guide on our download center has instructions for this. You will need to download the kernel source from that website too.

Please be careful: modifying or updating the kernel may cause the system to hang or fail to boot if anything goes wrong (installation/corrupted…).