This may be a trivial question, but how do i configure the watchdog timeout on the AGX Dev Kit ?
I have a remote dev kit running the latest version of jetpack deep in the woods tracking deer, there are times where the machine runs out of memory and becomes unreachable. I usually have to travel all the way into the woods to go push the reset button, which can be annoying.
Current watchdog function default enable and the time is 120s while the system hang it will reset the system, however if OOM that would reset the system.
Are there any configuration options for the watchdog ? The default settings are not resetting my jetson once it becomes unresponsive.
Side Question: Does overcommiting memory affect the watchdog ? I currently have a 16gb swap file, and i overcommit the memory. I’ve noticed the system becomes unresponsive if both the RAM and swap file become full.
Okay, so i can confirm the problem is with either the nvme SSD or the swap file.
I have a second Jetson NX that i just tested. It does not have the SSD, nor a swap file and it was able to reboot without going into emergency mode.
I have the same SSD on my NX and AGX
It is a SP M.2 PCIe Gen 3 SSD. Model: A80 512GB
Update:
I flashed a spare AGX to the newest version of jetpack, added a samsung nvme SSD, then used the command.
echo c>/proc/sysrq-trigger
The result is emergency mode.
Oddly, on all three machines i can reboot from the command line, without issues. So it has something to do with the watchdog restarting the system, while a nvme ssd is mounted.
write an application which ping watchdog every 100 sec
cat /proc/watchdog
Once you do that, now its userspace responsibility to keep reloading watchdog counter every 100 sec.
If userspace is hanged, system will automatically reset.