Installation of TAO and Docker container on Windows machine

I am trying to install TAO on a Windows machine in order to run some benchmarks.
We are a small NSF-funded company that has a way of accelerating transfer learning, and we want to demonstrate it on several network models supported by TAO.
We had some great success with TensorFlow but are struggling with the TAO Toolkit.
It installed fine in Python (e.g., tao info --verbose gives the right output), but we cannot get the Docker container environment to work (tao detectnet_v2 fails with: “Docker CLI hasn’t been logged in to a registry.”).
It seems all of the instructions are for Linux.
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
Being able to install on a variety of machines will help grow the community to include those who may be great with algorithms, but less so with installations.

We are running two Windows 10 machines, each with multicore CPUs and GPUs (Titan X & 1660 Ti).
Thank you!

On a Windows machine, please install WSL, then run TAO inside the WSL environment.

Thank you Morganh.
Does that mean all of the previous Python installations, user environments, and everything else need to be reinstalled? Is this a blank virtual machine, so that transferring existing code, drives, and data becomes an installation issue?

Does that include reinstalling drivers, etc.?

TAO training is expected to run on an Ubuntu system. See the TAO Toolkit Quick Start Guide (TAO Toolkit 3.22.05 documentation).

So on a Windows machine you need to install Linux on Windows. You can install WSL (Windows Subsystem for Linux).

TAO can run on WSL. Below are some relevant topics.

TLT with WSL2 possible? Segmentation Fault Error - #3 by rajiv-singh
TLT 3.0 & WSL2 issues - #9
WSL2 & TAO issues - #23 by joshH

By the way, first please make sure the CUDA applications described in the WSL user guide work well when you follow it (CUDA on WSL :: CUDA Toolkit Documentation).
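
Before involving Docker or TAO, a quick check that the GPU is visible from inside the WSL distribution might look like the sketch below. It assumes the Windows NVIDIA driver makes nvidia-smi available inside WSL 2; check_gpu is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Hypothetical helper: report whether nvidia-smi is reachable and lists a GPU.
check_gpu() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found on PATH"
        return 1
    fi
    # 'nvidia-smi -L' prints one line per detected GPU.
    if nvidia-smi -L; then
        return 0
    fi
    echo "nvidia-smi ran but reported an error"
    return 1
}

check_gpu || echo "GPU is not visible from this shell yet"
```

If this fails, the problem is likely in the driver or WSL setup rather than in TAO itself.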

Thank you for your help. I am not good at installation unless the steps are very specific. From what I understood, the Docker containers were meant to simplify things, but they seem to be doing the opposite.
I have Docker running on Windows, running an Ubuntu container. Then am I also supposed to install Docker inside Ubuntu?

Docker-CE on Ubuntu can be set up using Docker’s official convenience script:

$ curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker
???

I am completely lost in this process. The NVIDIA drivers are installed on the original machine and work fine.

In your current environment, you mentioned that you already start an Ubuntu Docker container and work inside it, right? Can you run “$ docker info” in that container?

On Windows, I still suggest installing WSL first, or an alternative such as a Linux virtual machine.

From outside the container (Windows):

docker info
Client:
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc., v0.9.1)
compose: Docker Compose (Docker Inc., v2.10.2)
extension: Manages Docker extensions (Docker Inc., v0.2.9)
sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
scan: Docker Scan (Docker Inc., v0.19.0)

Server:
Containers: 4
Running: 0
Paused: 0
Stopped: 4
Images: 1
Server Version: 20.10.17
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc version: v1.1.4-0-g5fd4c4d
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 5.10.16.3-microsoft-standard-WSL2
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.46GiB
Name: docker-desktop
ID: 6QR7:JDZU:VOKR:WBNB:F4J4:JCVR:Z266:M2MY:7EYM:QV37:MIDM:RBAQ
Docker Root Dir: /var/lib/docker
Debug Mode: false
HTTP Proxy: http.docker.internal:3128
HTTPS Proxy: http.docker.internal:3128
No Proxy: hubproxy.docker.internal
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
hubproxy.docker.internal:5000
127.0.0.0/8
Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

From inside the container, which is based on WSL:
root@56634ca0b771:/# docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
compose: Docker Compose (Docker Inc., v2.10.2)
scan: Docker Scan (Docker Inc., v0.17.0)

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info

Refer to WSL - Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? - General - Docker Community Forums

Every time you start the WSL distro you have to start the docker service:

sudo service docker start
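
Since the service has to be started in every new WSL session, the step above could be wrapped in a small function, for example in ~/.bashrc. This is only a sketch: ensure_docker_running is a hypothetical name, and it assumes that docker info exits with a non-zero status when no daemon is reachable, as recent Docker versions do.

```shell
# Hypothetical helper: start the Docker daemon only when it is not already up.
ensure_docker_running() {
    # 'docker info' fails when the client cannot reach a daemon.
    if docker info >/dev/null 2>&1; then
        echo "Docker daemon already running"
        return 0
    fi
    echo "Starting Docker daemon"
    sudo service docker start
}
```

Calling ensure_docker_running at the start of a session avoids restarting a daemon that is already up.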

It seems there is a big gap for Windows users, and this is becoming a nightmare installation. I had two weeks to run the examples and demonstrate the benchmarks, and it looks like at least a week will be burned on installation issues.
In the WSL Ubuntu, when I try to run the commands as in the directions, I get the warning:
Please get Docker Desktop from Docker Desktop - Docker
It doesn’t run in WSL, and I get errors:
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.4 (legacy): can’t initialize iptables table `nat’: Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
(exit status 3)
Thanks
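
One possible cause of the iptables “Table does not exist” error above is that the Ubuntu distribution is running under WSL 1, which has no real Linux kernel for the Docker daemon to use. That is an assumption about this setup, not something the thread confirms. A rough check from inside the distribution is to look at the kernel release string, as sketched below. wsl_generation is a hypothetical helper, and the patterns reflect commonly seen release strings (for example 4.4.0-19041-Microsoft under WSL 1 and 5.10.16.3-microsoft-standard-WSL2 under WSL 2); custom kernels may not match.

```shell
# Hypothetical helper: guess the WSL generation from a kernel release string.
wsl_generation() {
    case "$1" in
        *microsoft-standard*|*WSL2*) echo "WSL2" ;;
        *Microsoft*)                 echo "WSL1" ;;
        *)                           echo "not WSL" ;;
    esac
}

# Classify the kernel this shell is running on.
wsl_generation "$(uname -r)"
```

From the Windows side, wsl -l -v lists each distribution with its version, and wsl --set-version <distro> 2 converts one to WSL 2.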

Actually it is not related to TAO.

It seems you are stuck at Windows + WSL + Docker.

I suggest running other Docker images to check whether they show a similar issue.

Thanks for your help. It is TAO and its installation instructions that call for Windows + WSL + Docker, a combination that doesn’t seem compatible.
Are there other configurations that may work? Does NVIDIA have any other Docker containers with TAO or, even better, TAO Docker containers for Windows?
Do I have to use a virtual machine?

I just did a quick check on my Windows + WSL + TAO Docker setup (TAO Toolkit for Computer Vision | NVIDIA NGC).

Docker login works there. Below are two ways to run it.

Method 1: Use tao docker directly.
morganh@NV-GWHK4B3:/mnt/c/Windows/system32$ sudo service docker start
morganh@NV-GWHK4B3:/mnt/c/Windows/system32$ docker login nvcr.io
morganh@NV-GWHK4B3:/mnt/c/Windows/system32$ docker run --runtime=nvidia -it --entrypoint='' --rm nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 /bin/bash

Method 2: Use tao-launcher
morganh@NV-GWHK4B3:/mnt/c/Windows/system32$ sudo service docker start
morganh@NV-GWHK4B3:/mnt/c/Windows/system32$ docker login nvcr.io
morganh@NV-GWHK4B3:/mnt/c/Windows/system32$ pip3 install nvidia-tao
morganh@NV-GWHK4B3:/mnt/c/Windows/system32$ tao
morganh@NV-GWHK4B3:/mnt/c/Windows/system32$ tao ssd
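
If the launcher still reports “Docker CLI hasn’t been logged in to a registry”, it may help to confirm that docker login nvcr.io actually recorded an entry for the user running tao. The sketch below assumes the login is recorded in ~/.docker/config.json, which is Docker’s default location; registry_logged_in is a hypothetical helper, and whether the launcher checks exactly this file is an assumption.

```shell
# Hypothetical helper: check whether a Docker config file mentions a registry.
registry_logged_in() {
    config_file="$1"
    registry="$2"
    # -F treats the registry name as a fixed string, not a regex.
    [ -f "$config_file" ] && grep -qF "\"$registry\"" "$config_file"
}

if registry_logged_in "$HOME/.docker/config.json" "nvcr.io"; then
    echo "nvcr.io entry found"
else
    echo "no nvcr.io entry; run: docker login nvcr.io"
fi
```

For NGC, the documented login uses the literal username $oauthtoken with an NGC API key as the password.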

The instructions for each step became complicated on Windows. After trying your commands, searching, and bumbling around like a fly randomly bumping into things, I finally got it working, sort of.

I seem to be running TLT. It warns that I need to update to TAO and says it updates, but it is still TLT. When I run a notebook (.ipynb) such as bpnet, I get stuck in odd ways.

For example, from the notebook, running
!tao bpnet dataset_convert -m 'test' -o $DATA_DIR/val --generate_masks --dataset_spec $DATA_POSE_SPECS_DIR/coco_spec.json
works and creates val-fold-000-of-001,

but running the very similar
!tao bpnet dataset_convert -m 'train' -o $DATA_DIR/train --generate_masks --dataset_spec $DATA_POSE_SPECS_DIR/coco_spec.json
runs similarly and does not report an error, but does not create the train-fold-000-of-001 file
that the notebook needs.

Thanks,
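
A small check after each dataset_convert call can make a silent failure like this visible right away. The sketch below assumes the converted files land directly in the data directory with names beginning val-fold- and train-fold-, as the file names above suggest. has_output is a hypothetical helper, and DATA_DIR here must point at the directory as seen from the shell running the check, which may differ from the path used inside the container.

```shell
# Hypothetical helper: succeed if any file in directory $1 starts with prefix $2.
has_output() {
    for f in "$1"/"$2"*; do
        # An unmatched glob stays literal, so test that the path really exists.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Fall back to the current directory when DATA_DIR is not set.
DATA_DIR="${DATA_DIR:-.}"
for split in val train; do
    if has_output "$DATA_DIR" "${split}-fold-"; then
        echo "$split: output found"
    else
        echo "$split: output missing"
    fi
done
```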

Please create a new topic for the latest question. Also, please try other notebooks as well.
