DeepOps issue with rootless Docker in Slurm

Main error:
Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?

Steps to reproduce:

(env) arnoldas@XCITEMAIN:~/deepops$ ansible-playbook -K --limit slurm-cluster playbooks/container/docker-rootless.yml
Installation completes successfully without any errors

Info:
(env) arnoldas@dgx-a100:/sw/software/rootless-docker/bin$ /sw/software/rootless-docker/bin/docker -v
Docker version 20.10.17, build 100c701
(env) arnoldas@dgx-a100:/sw/software/rootless-docker/bin$ docker -v
Docker version 20.10.17, build 100c701

Test with slurm:
(env) arnoldas@XCITEMAIN$ srun --ntasks=1 --gpus-per-task=1 --cpus-per-task=5 --gres-flags=enforce-binding --pty bash
(env) arnoldas@dgx-a100$ module load rootless-docker
(env) arnoldas@dgx-a100$ start_rootless_docker.sh

Error: Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
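
A quick check that might help narrow this down is to confirm, on the compute node, whether the socket named in the error exists at all, along with the containerd socket that dockerd reports dialing in its log. This is only a sketch: `check_sock` is a hypothetical helper, and the fallback runtime directory is assumed from the paths in the error output, not taken from the DeepOps scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepOps): report whether a Unix socket
# exists at the given path. Returns 0 if present, 1 if missing.
check_sock() {
    if [ -S "$1" ]; then
        echo "present: $1"
        return 0
    fi
    echo "missing: $1"
    return 1
}

# Assumption: the runtime directory follows the pattern seen in the error
# message (/var/tmp/xdg_runtime_dir_<uid>) when XDG_RUNTIME_DIR is unset.
runtime_dir="${XDG_RUNTIME_DIR:-/var/tmp/xdg_runtime_dir_$(id -u)}"

# Socket the docker CLI is trying to reach.
check_sock "$runtime_dir/docker.sock" || true
# Socket dockerd dials to reach its containerd child.
check_sock "$runtime_dir/docker/containerd/containerd.sock" || true
```

If the first socket is present but the second is missing, that would point at containerd failing to start inside the job rather than at dockerd itself.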

Logs:

  • [ -w /var/tmp/xdg_runtime_dir_1000 ]
  • [ -d /home/arnoldas ]
  • rootlesskit=
  • command -v docker-rootlesskit
  • command -v rootlesskit
  • rootlesskit=rootlesskit
  • break
  • [ -z rootlesskit ]
  • :
  • :
  • : builtin
  • : auto
  • : auto
  • net=
  • mtu=
  • [ -z ]
  • command -v slirp4netns
  • [ -z ]
  • command -v vpnkit
  • net=vpnkit
  • [ -z ]
  • mtu=1500
  • [ -z ]
  • _DOCKERD_ROOTLESS_CHILD=1
  • export _DOCKERD_ROOTLESS_CHILD
  • id -u
  • [ 1000 = 0 ]
  • command -v selinuxenabled
  • exec rootlesskit --net=vpnkit --mtu=1500 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /sw/software/rootless-docker/bin/dockerd-rootless.sh --experimental --data-root=/var/tmp/docker-container-storage-1000 --storage-driver overlay2
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
  • [ -w /var/tmp/xdg_runtime_dir_1000 ]
  • [ -d /home/arnoldas ]
  • rootlesskit=
  • command -v docker-rootlesskit
  • command -v rootlesskit
  • rootlesskit=rootlesskit
  • break
  • [ -z rootlesskit ]
  • :
  • :
  • : builtin
  • : auto
  • : auto
  • net=
  • mtu=
  • [ -z ]
  • command -v slirp4netns
  • [ -z ]
  • command -v vpnkit
  • net=vpnkit
  • [ -z ]
  • mtu=1500
  • [ -z 1 ]
  • [ 1 = 1 ]
  • rm -f /run/docker /run/containerd /run/xtables.lock
  • [ -n ]
  • stat -c %T -f /etc
  • [ tmpfs = tmpfs ]
  • [ -L /etc/ssl ]
  • realpath /etc/ssl
  • realpath_etc_ssl=/etc/.ro3581486471/ssl
  • rm -f /etc/ssl
  • mkdir /etc/ssl
  • mount --rbind /etc/.ro3581486471/ssl /etc/ssl
  • exec dockerd --experimental --data-root=/var/tmp/docker-container-storage-1000 --storage-driver overlay2
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    INFO[2022-08-10T17:36:51.624672014Z] Starting up
    WARN[2022-08-10T17:36:51.625486659Z] Running experimental build
    WARN[2022-08-10T17:36:51.625623419Z] Running in rootless mode. This mode has feature limitations.
    INFO[2022-08-10T17:36:51.625633307Z] Running with RootlessKit integration
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    WARN[2022-08-10T17:36:51.699017168Z] could not change group /var/tmp/xdg_runtime_dir_1000/docker.sock to docker: group docker not found
    Cannot connect to the Docker daemon at unix:///var/tmp/xdg_runtime_dir_1000/docker.sock. Is the docker daemon running?
    INFO[2022-08-10T17:36:51.797419227Z] libcontainerd: started new containerd process pid=3991494
    INFO[2022-08-10T17:36:51.836517816Z] parsed scheme: "unix" module=grpc
    INFO[2022-08-10T17:36:51.853392273Z] scheme "unix" not registered, fallback to default scheme module=grpc
    INFO[2022-08-10T17:36:51.888458235Z] ccResolverWrapper: sending update to cc: {[{unix:///var/tmp/xdg_runtime_dir_1000/docker/containerd/containerd.sock 0 }] } module=grpc
    INFO[2022-08-10T17:36:51.905319738Z] ClientConn switching balancer to "pick_first" module=grpc
    WARN[2022-08-10T17:36:53.048102580Z] grpc: addrConn.createTransport failed to connect to {unix:///var/tmp/xdg_runtime_dir_1000/docker/containerd/containerd.sock 0 }. Err :connection error: desc = "transport: error while dialing: dial unix:///var/tmp/xdg_runtime_dir_1000/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
    WARN[2022-08-10T17:36:57.018288576Z] grpc: addrConn.createTransport failed to connect to {unix:///var/tmp/xdg_runtime_dir_1000/docker/containerd/containerd.sock 0 }. Err :connection error: desc = "transport: error while dialing: dial unix:///var/tmp/xdg_runtime_dir_1000/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
    /sw/software/rootless-docker/bin/start_rootless_docker.sh: line 44: 3991485 Killed docker ps > /dev/null
    /sw/software/rootless-docker/bin/start_rootless_docker.sh: line 44: 3991628 Killed docker ps > /dev/null
    /sw/software/rootless-docker/bin/start_rootless_docker.sh: line 44: 3991637 Killed docker ps > /dev/null
    /sw/software/rootless-docker/bin/start_rootless_docker.sh: line 44: 3991646 Killed docker ps > /dev/null
    /sw/software/rootless-docker/bin/start_rootless_docker.sh: line 44: 3991657 Killed docker ps > /dev/null
    /sw/software/rootless-docker/bin/start_rootless_docker.sh: line 44: 3991666 Killed docker ps > /dev/null
    WARN[2022-08-10T17:37:03.783396035Z] grpc: addrConn.createTransport failed to connect to {unix:///var/tmp/xdg_runtime_dir_1000/docker/containerd/containerd.sock 0 }. Err :connection error: desc = "transport: error while dialing: dial unix:///var/tmp/xdg_runtime_dir_1000/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
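
The repeated "Cannot connect" and "Killed docker ps" lines suggest the start script polls the daemon with `docker ps` until it answers. I have not read the script itself, so the following is only a rough sketch of a bounded retry loop that could be used to reproduce that wait by hand. `wait_for` is a hypothetical name, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from start_rootless_docker.sh):
# run a command up to <attempts> times, sleeping <delay> seconds after
# each failure. Returns 0 as soon as the command succeeds, 1 otherwise.
wait_for() {
    local attempts="$1" delay="$2"
    shift 2
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example, assuming the rootless docker client is on PATH:
#   wait_for 30 1 docker ps || echo "daemon did not become ready" >&2
```

Running the probe this way, with the dockerd output in a separate terminal, may make it easier to see whether the daemon ever gets past the containerd dial timeout.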

Can you ask this on the DeepOps GitHub?

I did, over a week ago, with no response. I figured I might have better luck here.

Ah, yes. I see it in DeepOps now. Sorry, it’s good you asked here to bring my attention to it. I’ll reply on GitHub.

Thank you!