AGX Xavier reboots when under USB drive load

Due to limited disk space, I keep my Docker data directory on a USB storage device, connected via the provided USB-C adapter cable.

I followed the standard procedure and edited /etc/docker/daemon.json to read:

{
    "data-root": "/usb-drive/hello/docker/containers",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
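
Before blaming the pull itself, it may be worth confirming that Docker's data-root really sits on the USB drive and not on the root filesystem. The following is a minimal Python sketch, not an official Docker or NVIDIA tool. The helper names read_data_root and find_mount_point are hypothetical, and the script assumes the drive is mounted somewhere above the configured path.

```python
import json
import os


def read_data_root(config_text):
    """Parse daemon.json text and return its "data-root" value, or None if unset."""
    return json.loads(config_text).get("data-root")


def find_mount_point(path):
    """Walk up from path until a mount point is reached and return it."""
    path = os.path.abspath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


# Sample text mirroring the relevant part of the config above.
SAMPLE_CONFIG = '{"data-root": "/usb-drive/hello/docker/containers"}'

data_root = read_data_root(SAMPLE_CONFIG)
print("data-root:", data_root)
# Assumption: if the reported mount point is "/", the USB drive is not
# mounted where Docker expects it, and pulls land on the internal storage.
print("mount point:", find_mount_point(data_root))
```

On the device itself, the contents of /etc/docker/daemon.json would be passed to read_data_root in place of SAMPLE_CONFIG.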

When I try to pull a very large container image (sudo docker pull nvcr.io/nvidia/tritonserver:20.12-py3), it runs for around 5 minutes, and then my SSH session dies. When I reconnect, I find the device has restarted.

Looking at /var/log/syslog, you can see the time jump where the restart happens (16:36:52 to 16:40:09), but I can't see any clear error:

Feb  5 16:35:20 xavier-woolfe avahi-daemon[5429]: Joining mDNS multicast group on interface docker0.IPv4 with address 172.17.0.1.
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.6862] device (docker0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Feb  5 16:35:20 xavier-woolfe avahi-daemon[5429]: New relevant interface docker0.IPv4 for mDNS.
Feb  5 16:35:20 xavier-woolfe avahi-daemon[5429]: Registering new address record for 172.17.0.1 on docker0.IPv4.
Feb  5 16:35:20 xavier-woolfe kernel: [  234.662830] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.6962] keyfile: add connection in-memory (d2a0639e-624d-4e4e-9b98-ed4b96c4f75c,"docker0")
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.6984] device (docker0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.7017] device (docker0): Activation: starting connection 'docker0' (d2a0639e-624d-4e4e-9b98-ed4b96c4f75c)
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.7087] device (docker0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.7099] device (docker0): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.7106] device (docker0): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.7109] device (docker0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.7133] device (docker0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.7140] device (docker0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
Feb  5 16:35:20 xavier-woolfe NetworkManager[5390]: <info>  [1612542920.7242] device (docker0): Activation: successful, device activated.
Feb  5 16:35:20 xavier-woolfe dbus-daemon[5267]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.7' (uid=0 pid=5390 comm="/usr/sbin/NetworkManager --no-daemon ")
Feb  5 16:35:20 xavier-woolfe systemd[1]: Starting Network Manager Script Dispatcher Service...
Feb  5 16:35:20 xavier-woolfe dbus-daemon[5267]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Feb  5 16:35:20 xavier-woolfe systemd[1]: Started Network Manager Script Dispatcher Service.
Feb  5 16:35:20 xavier-woolfe nm-dispatcher: req:1 'up' [docker0]: new request (1 scripts)
Feb  5 16:35:20 xavier-woolfe nm-dispatcher: req:1 'up' [docker0]: start running ordered scripts...
Feb  5 16:35:20 xavier-woolfe systemd-resolved[5036]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Feb  5 16:35:20 xavier-woolfe systemd[1]: Reloading OpenBSD Secure Shell server.
Feb  5 16:35:20 xavier-woolfe systemd[1]: Reloaded OpenBSD Secure Shell server.
Feb  5 16:35:24 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:24.578517064Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Feb  5 16:35:24 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:24.731070275Z" level=info msg="Loading containers: done."
Feb  5 16:35:24 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:24.830384340Z" level=info msg="Docker daemon" commit=369ce74a3c graphdriver(s)=overlay2 version=19.03.6
Feb  5 16:35:24 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:24.832249536Z" level=info msg="Daemon has completed initialization"
Feb  5 16:35:26 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:26.867422238Z" level=info msg="API listen on /var/run/docker.sock"
Feb  5 16:35:26 xavier-woolfe systemd[1]: Started Docker Application Container Engine.
Feb  5 16:36:52 xavier-woolfe gnome-shell[7450]: pushModal: invocation of begin_modal failed
Feb  5 16:36:52 xavier-woolfe gnome-shell[7450]: pushModal: invocation of begin_modal failed
Feb  5 16:36:52 xavier-woolfe gnome-shell[7450]: error: Unable to lock: Lock was blocked by an application
Feb  5 16:40:09 xavier-woolfe systemd-modules-load[2456]: Inserted module 'bluedroid_pm'
Feb  5 16:40:09 xavier-woolfe systemd-modules-load[2456]: Module 'nvhost_vi' is builtin
Feb  5 16:40:09 xavier-woolfe systemd-modules-load[2456]: Inserted module 'nvgpu'
Feb  5 16:40:09 xavier-woolfe systemd-modules-load[2456]: Inserted module 'userspace_alert'
Feb  5 16:40:09 xavier-woolfe systemd-udevd[3155]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Feb  5 16:40:09 xavier-woolfe systemd[1]: Starting Flush Journal to Persistent Storage...
Feb  5 16:40:09 xavier-woolfe systemd[1]: Mounted Kernel Configuration File System.
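
To locate a restart like the one above in a longer log, the timestamps can be scanned for unusually long silences. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the 120-second threshold is arbitrary, and the year is passed in explicitly (2021 here, taken from the dockerd timestamps) because syslog lines omit it.

```python
from datetime import datetime, timedelta


def parse_timestamp(line, year=2021):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp; return None if absent."""
    try:
        # The first 15 characters hold the timestamp, e.g. 'Feb  5 16:35:20'.
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(seconds=120), year=2021):
    """Return (previous_time, next_time) pairs whose spacing exceeds threshold."""
    gaps = []
    previous = None
    for line in lines:
        current = parse_timestamp(line, year)
        if current is None:
            continue  # lines without a leading timestamp are skipped
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Three lines taken from the log excerpt above.
sample = [
    "Feb  5 16:35:26 xavier-woolfe systemd[1]: Started Docker Application Container Engine.",
    "Feb  5 16:36:52 xavier-woolfe gnome-shell[7450]: error: Unable to lock: Lock was blocked by an application",
    "Feb  5 16:40:09 xavier-woolfe systemd-modules-load[2456]: Inserted module 'bluedroid_pm'",
]
for before, after in find_gaps(sample):
    print("gap from", before, "to", after, "=", (after - before).total_seconds(), "s")
```

A gap only shows that nothing was logged for a while; it does not explain why the board went down, so the lines just before each gap are the ones worth reading closely.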

Any advice or troubleshooting suggestions? This happens reliably under these circumstances.

Sorry for the late response. Is this still an issue that needs support?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Are you able to dump the log from the serial console when the error happens?

https://elinux.org/Jetson/General_debug