Due to limited HDD space, I keep my Docker data directory on a USB storage device, connected via the provided USB-C adapter cable.
I followed the standard procedure of editing /etc/docker/daemon.json to read:
{
    "data-root": "/usb-drive/hello/docker/containers",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
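A quick sanity check of this kind of config can rule out the simplest failure mode, namely a "data-root" that does not point at a real directory (for example because the USB drive is not mounted). The following is only a minimal sketch: the function name check_daemon_config is hypothetical, and it checks nothing beyond valid JSON and an existing directory.

```python
import json
import os


def check_daemon_config(text):
    """Return a list of problems found with "data-root" in daemon.json text.

    Hypothetical helper for illustration; it only checks that the text is
    a JSON object and that "data-root" names an existing directory.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    data_root = config.get("data-root")
    if data_root is None:
        problems.append('no "data-root" key, so Docker uses its default location')
    elif not os.path.isdir(data_root):
        problems.append(f"data-root {data_root!r} is not an existing directory")
    return problems


if __name__ == "__main__":
    # Path taken from the post; reading it may require root permissions.
    with open("/etc/docker/daemon.json") as fh:
        for message in check_daemon_config(fh.read()) or ["no problems found"]:
            print(message)
```

An empty result only means the config is well-formed and the directory exists; it says nothing about whether the USB device stays stable under sustained writes.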
When I try to pull a very large container image (sudo docker pull nvcr.io/nvidia/tritonserver:20.12-py3), it runs for around 5 minutes, and then my SSH session dies. When I reconnect, I find the device has restarted.
Looking at my /var/log/syslog, you can see the time jump where the restart happens, but there is no clear error that I can see:
Feb 5 16:35:20 xavier-woolfe avahi-daemon[5429]: Joining mDNS multicast group on interface docker0.IPv4 with address 172.17.0.1.
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.6862] device (docker0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Feb 5 16:35:20 xavier-woolfe avahi-daemon[5429]: New relevant interface docker0.IPv4 for mDNS.
Feb 5 16:35:20 xavier-woolfe avahi-daemon[5429]: Registering new address record for 172.17.0.1 on docker0.IPv4.
Feb 5 16:35:20 xavier-woolfe kernel: [ 234.662830] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.6962] keyfile: add connection in-memory (d2a0639e-624d-4e4e-9b98-ed4b96c4f75c,"docker0")
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.6984] device (docker0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.7017] device (docker0): Activation: starting connection 'docker0' (d2a0639e-624d-4e4e-9b98-ed4b96c4f75c)
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.7087] device (docker0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.7099] device (docker0): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.7106] device (docker0): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.7109] device (docker0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.7133] device (docker0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.7140] device (docker0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
Feb 5 16:35:20 xavier-woolfe NetworkManager[5390]: <info> [1612542920.7242] device (docker0): Activation: successful, device activated.
Feb 5 16:35:20 xavier-woolfe dbus-daemon[5267]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.7' (uid=0 pid=5390 comm="/usr/sbin/NetworkManager --no-daemon ")
Feb 5 16:35:20 xavier-woolfe systemd[1]: Starting Network Manager Script Dispatcher Service...
Feb 5 16:35:20 xavier-woolfe dbus-daemon[5267]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Feb 5 16:35:20 xavier-woolfe systemd[1]: Started Network Manager Script Dispatcher Service.
Feb 5 16:35:20 xavier-woolfe nm-dispatcher: req:1 'up' [docker0]: new request (1 scripts)
Feb 5 16:35:20 xavier-woolfe nm-dispatcher: req:1 'up' [docker0]: start running ordered scripts...
Feb 5 16:35:20 xavier-woolfe systemd-resolved[5036]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Feb 5 16:35:20 xavier-woolfe systemd[1]: Reloading OpenBSD Secure Shell server.
Feb 5 16:35:20 xavier-woolfe systemd[1]: Reloaded OpenBSD Secure Shell server.
Feb 5 16:35:24 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:24.578517064Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Feb 5 16:35:24 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:24.731070275Z" level=info msg="Loading containers: done."
Feb 5 16:35:24 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:24.830384340Z" level=info msg="Docker daemon" commit=369ce74a3c graphdriver(s)=overlay2 version=19.03.6
Feb 5 16:35:24 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:24.832249536Z" level=info msg="Daemon has completed initialization"
Feb 5 16:35:26 xavier-woolfe dockerd[8046]: time="2021-02-05T16:35:26.867422238Z" level=info msg="API listen on /var/run/docker.sock"
Feb 5 16:35:26 xavier-woolfe systemd[1]: Started Docker Application Container Engine.
Feb 5 16:36:52 xavier-woolfe gnome-shell[7450]: pushModal: invocation of begin_modal failed
Feb 5 16:36:52 xavier-woolfe gnome-shell[7450]: pushModal: invocation of begin_modal failed
Feb 5 16:36:52 xavier-woolfe gnome-shell[7450]: error: Unable to lock: Lock was blocked by an application
Feb 5 16:40:09 xavier-woolfe systemd-modules-load[2456]: Inserted module 'bluedroid_pm'
Feb 5 16:40:09 xavier-woolfe systemd-modules-load[2456]: Module 'nvhost_vi' is builtin
Feb 5 16:40:09 xavier-woolfe systemd-modules-load[2456]: Inserted module 'nvgpu'
Feb 5 16:40:09 xavier-woolfe systemd-modules-load[2456]: Inserted module 'userspace_alert'
Feb 5 16:40:09 xavier-woolfe systemd-udevd[3155]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Feb 5 16:40:09 xavier-woolfe systemd[1]: Starting Flush Journal to Persistent Storage...
Feb 5 16:40:09 xavier-woolfe systemd[1]: Mounted Kernel Configuration File System.
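The time jump in the excerpt above can also be located mechanically, which helps when the log is much longer. This is a rough sketch that assumes the classic syslog timestamp format shown here ("Feb 5 16:36:52 ..."). The function name find_time_gaps, the 2-minute threshold, and the default year (taken from the dockerd timestamps) are illustrative assumptions.

```python
from datetime import datetime, timedelta


def find_time_gaps(lines, min_gap=timedelta(minutes=2), year=2021):
    """Yield (previous, current) timestamps of consecutive log lines
    that are at least min_gap apart.

    Hypothetical helper; syslog timestamps carry no year, so one is
    supplied as a parameter.
    """
    previous = None
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # too short to start with "Mon D HH:MM:SS"
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # wrapped continuation line or other non-syslog text
        if previous is not None and stamp - previous >= min_gap:
            yield previous, stamp
        previous = stamp


if __name__ == "__main__":
    # Log path is an assumption (the usual location on Ubuntu-based systems).
    with open("/var/log/syslog", errors="replace") as fh:
        for before, after in find_time_gaps(fh):
            print(f"gap of {after - before} between {before} and {after}")
```

Run against the excerpt above, this reports a single gap, between 16:36:52 and 16:40:09. The last lines before such a gap are the ones worth reading closely, although a sudden power loss or hard reset typically leaves nothing in the log at all.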
Any advice or troubleshooting? This reliably happens under these circumstances.