Docker - No such container

I'm trying to get TAO Toolkit up and running using this tutorial and am having issues with Docker.

!tao model yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train.txt \
                             -o $DATA_DOWNLOAD_DIR/yolo_v4/tfrecords/train \
                             -r $USER_EXPERIMENT_DIR/
                             #--gpus 1 --debug/

2025-01-28 22:58:19,933 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-01-28 22:58:20,025 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2025-01-28 22:58:20,064 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True

What's next:
    Try Docker Debug for seamless, persistent debugging tools in any container or image → docker debug f228ac51bcb901c9206c4772e25830c541d1bd3329c1f33c18dc8c0e13acbb4d
    Learn more at https://docs.docker.com/go/debug-cli/
Error response from daemon: No such container: f228ac51bcb901c9206c4772e25830c541d1bd3329c1f33c18dc8c0e13acbb4d
2025-01-28 22:58:22,421 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
tao info
Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024


docker login nvcr.io
.......
Login Succeeded

Maybe the following gives a hint about the cause?

docker run --rm --gpus all ubuntu nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Now with sudo:

sudo docker run --rm --gpus all ubuntu nvidia-smi      
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P4              10W /  55W |      8MiB /  8188MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Some system info:

$ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "/usr/bin/nvidia-container-runtime"
        }
    }
}
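
A small sketch for checking a daemon.json like the one above for the NVIDIA runtime entry. The function name `nvidia_runtime_status` and the embedded sample are illustrative only, not part of Docker or TAO. Note the assumption: /etc/docker/daemon.json is read by the system Docker Engine, so if the CLI is pointed at a different daemon through a context, this file may not apply to it.

```python
import json


def nvidia_runtime_status(daemon_config: dict) -> dict:
    """Report whether a daemon.json dict registers the nvidia runtime (hypothetical helper)."""
    runtimes = daemon_config.get("runtimes", {})
    nvidia = runtimes.get("nvidia", {})
    return {
        # True if a runtime named "nvidia" is declared at all.
        "registered": "nvidia" in runtimes,
        # True if containers use it without passing --runtime=nvidia.
        "is_default": daemon_config.get("default-runtime") == "nvidia",
        # Binary the daemon would invoke for this runtime.
        "path": nvidia.get("path"),
    }


# Sample copied from the daemon.json shown in this post.
SAMPLE = json.loads("""
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "/usr/bin/nvidia-container-runtime"
        }
    }
}
""")

print(nvidia_runtime_status(SAMPLE))
```

To check the real file, load it with `json.load(open("/etc/docker/daemon.json"))` and pass the result to the same function.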

$ cat ~/.docker/config.json 
{
	"auths": {
		"nvcr.io": {
			"auth": "************************************************************8"
		}
	},
	"credStore": "desktop",
	"currentContext": "desktop-linux",
	"plugins": {
		"debug": {
			"hooks": "exec"
		},
		"scout": {
			"hooks": "pull,buildx build"
		}
	},
	"features": {
		"hooks": "true"
	}
}

Also, on every restart the credStore key somehow becomes credsStore, which prevents me from logging in to nvcr.io with docker login unless I change it back.
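
For reference, the key the Docker CLI reads for a credential helper is `credsStore`; `credStore` (without the "s") is not a key it knows, so it would most likely be ignored. The sketch below pulls out the config.json fields that decide credential storage and the active context. The helper name `summarize_docker_client_config` is made up for illustration, and treating `credStore` as a typo is an assumption.

```python
import json
from pathlib import Path


def summarize_docker_client_config(config: dict) -> dict:
    """Extract the credential-store and context settings from a Docker client config dict."""
    return {
        # "credsStore" names the credential helper the CLI should use, if any.
        "creds_store": config.get("credsStore"),
        # "credStore" is flagged because the CLI does not define such a key (assumed typo).
        "has_misspelled_cred_store": "credStore" in config,
        # Without "currentContext" the CLI uses the context named "default".
        "current_context": config.get("currentContext", "default"),
    }


if __name__ == "__main__":
    path = Path.home() / ".docker" / "config.json"
    if path.exists():
        print(summarize_docker_client_config(json.loads(path.read_text())))
    else:
        print(f"{path} not found")
```

A `current_context` other than "default" (here `desktop-linux`) means plain `docker` commands may talk to a different daemon than `sudo docker`, since root has its own client config. That could explain why the GPU test only works with sudo, but it is a guess from the output shown.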

Please install nvidia-docker2.
You can also search for the error in the TAO forum for more hints.

As of Docker release 19.03, NVIDIA GPUs are natively supported as devices in the Docker runtime. This means that the special runtime provided by nvidia-docker2 is no longer necessary.
https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html

In the container-toolkit documentation I also don't see anything about nvidia-docker2.

$ docker --version
Docker version 27.5.1, build 9f9e405

Anyway, I have it installed, and nothing changed.

Then I installed Docker in rootless mode.
docker run --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
de44b265507a: Pull complete
Digest: sha256:80dd3c3b9c6cecb9f1667e9290b3bc61b78c2678c02cbdae5f0fea92cc6734ab
Status: Downloaded newer image for ubuntu:latest
Wed Jan 29 17:21:23 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   39C    P8               2W /  55W |     49MiB /  8188MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

I'm getting a different error now:

$ tao model detectnet_v2
2025-01-29 19:22:37,074 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
Traceback (most recent call last):
  File "/home/alon/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
  File "/home/alon/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/home/alon/.local/lib/python3.10/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alon/.local/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/alon/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 802, in urlopen
    retries = retries.increment(
  File "/home/alon/.local/lib/python3.10/site-packages/urllib3/util/retry.py", line 552, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/alon/.local/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/home/alon/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
  File "/home/alon/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/home/alon/.local/lib/python3.10/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alon/.local/lib/python3.10/site-packages/docker/api/client.py", line 205, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/home/alon/.local/lib/python3.10/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/home/alon/.local/lib/python3.10/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/home/alon/.local/lib/python3.10/site-packages/docker/api/client.py", line 228, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/home/alon/.local/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/home/alon/.local/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/alon/.local/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/alon/.local/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alon/.local/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/home/alon/.local/lib/python3.10/site-packages/nvidia_tao_cli/entrypoint/tao_launcher.py", line 134, in main
    instance.launch_command(
  File "/home/alon/.local/lib/python3.10/site-packages/nvidia_tao_cli/components/instance_handler/local_instance.py", line 357, in launch_command
    docker_handler = self.handler_map[
  File "/home/alon/.local/lib/python3.10/site-packages/nvidia_tao_cli/components/instance_handler/local_instance.py", line 203, in handler_map
    handler_map[handler_key] = DockerHandler(
  File "/home/alon/.local/lib/python3.10/site-packages/nvidia_tao_cli/components/docker_handler/docker_handler.py", line 92, in __init__
    self._docker_client = docker.from_env()
  File "/home/alon/.local/lib/python3.10/site-packages/docker/client.py", line 84, in from_env
    return cls(
  File "/home/alon/.local/lib/python3.10/site-packages/docker/client.py", line 40, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/home/alon/.local/lib/python3.10/site-packages/docker/api/client.py", line 188, in __init__
    self._version = self._retrieve_server_version()
  File "/home/alon/.local/lib/python3.10/site-packages/docker/api/client.py", line 212, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
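
The traceback ends in `docker.from_env()` failing to connect to a Unix socket that does not exist. The Docker SDK for Python takes the daemon address from the `DOCKER_HOST` environment variable and otherwise falls back to /var/run/docker.sock, while a rootless daemon normally listens on `$XDG_RUNTIME_DIR/docker.sock`. The sketch below only works out which paths are in play; the function names are hypothetical, and the idea that the launcher is looking at the wrong socket is an assumption drawn from this traceback.

```python
import os
import stat
from typing import Optional

DEFAULT_SOCKET = "/var/run/docker.sock"


def socket_path_from_env(env: dict) -> Optional[str]:
    """Return the Unix socket path implied by DOCKER_HOST, or None for non-Unix hosts."""
    host = env.get("DOCKER_HOST", "")
    if not host:
        return DEFAULT_SOCKET  # fallback when DOCKER_HOST is unset
    if host.startswith("unix://"):
        return host[len("unix://"):]
    return None  # tcp://, ssh:// and similar are not socket files


def rootless_socket_candidate(env: dict, uid: int) -> str:
    """Return where a rootless daemon would usually place its socket."""
    runtime_dir = env.get("XDG_RUNTIME_DIR") or f"/run/user/{uid}"
    return os.path.join(runtime_dir, "docker.sock")


def is_socket(path: str) -> bool:
    """True only if the path exists and is a Unix socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    env = dict(os.environ)
    sdk_path = socket_path_from_env(env)
    rootless_path = rootless_socket_candidate(env, os.getuid())
    print("SDK would use:", sdk_path, "exists:", bool(sdk_path) and is_socket(sdk_path))
    print("rootless candidate:", rootless_path, "exists:", is_socket(rootless_path))
```

If only the rootless candidate exists, exporting `DOCKER_HOST=unix://` followed by that path before running `tao` may be worth trying.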

Are you launching the TAO (TLT) container from inside another Docker container?
If so, please add -v /var/run/docker.sock:/var/run/docker.sock when starting the outer container.

See the thread "Tlt augment not working".

I'm running it from the host with the tao launcher.

$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Mon Feb 3 10:41:33 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   38C    P8               1W /  55W |     49MiB /  8188MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Please try running directly on the host instead of inside a Docker container.