New computer install GPU Docker error

Please provide the following information when requesting support.

• Hardware: RTX 4090 (laptop version)
• Network Type: unet
• TLT Version: tlt: command not found

I followed this guide to install all of the TAO requirements.

I downloaded TAO 5 and installed all of its requirements.

I selected unet as a first test.

In the unet notebook, every cell runs fine until the train cell:

!tao model unet train --gpus $NUM_GPUS \
                      --gpu_index $GPU_INDEX \
                      -e $SPECS_DIR/unet_train_resnet_unet_isbi.txt \
                      -r $USER_EXPERIMENT_DIR/isbi_experiment_unpruned \
                      -m $USER_EXPERIMENT_DIR/pretrained_resnet18/pretrained_semantic_segmentation_vresnet18/resnet_18.hdf5 \
                      -n model_isbi

Results in error:

2023-09-11 21:05:23,052 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-09-11 21:05:23,102 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2023-09-11 21:05:23,124 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 267:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/david/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-09-11 21:05:23,125 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
Docker instantiation failed with error: 500 Server Error: Internal Server Error ("could not select device driver "" with capabilities: [[gpu]]")
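
For readers hitting the same message: "could not select device driver" with capabilities [[gpu]] usually indicates that Docker has no NVIDIA container runtime available, even though the host driver itself works. The sketch below is only a heuristic check, assuming that an installed nvidia-docker2 package shows up as an "nvidia" entry on the Runtimes line of docker info. The function name has_nvidia_runtime is made up for this example.

```shell
#!/usr/bin/env bash
# Reads `docker info` text on stdin and succeeds if the "Runtimes:" line
# mentions an nvidia runtime. Heuristic only: it does not prove that GPU
# containers will actually start.
has_nvidia_runtime() {
    grep -i 'Runtimes:' | grep -qi 'nvidia'
}

if command -v docker >/dev/null 2>&1; then
    if docker info 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered with Docker"
    else
        echo "no nvidia runtime found; GPU containers will likely fail"
    fi
else
    echo "docker not found on PATH"
fi
```

If the second message is printed, the NVIDIA container packages are probably missing or Docker has not been restarted since they were installed.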

I installed Docker following the instructions here, and sudo docker run hello-world runs fine.

The output of nvidia-smi is:

nvidia-smi
Mon Sep 11 22:26:02 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   43C    P8               5W / 115W |     62MiB / 16376MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1846      G   /usr/lib/xorg/Xorg                           55MiB |
+---------------------------------------------------------------------------------------+

Thanks for the help

DBG

Please try the command below.
$ sudo apt-get install nvidia-docker2

E: Unable to locate package nvidia-docker2

So I used the procedure from "Installing Docker® and nvidia-docker2":

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-docker2
sudo systemctl restart docker.service
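
A note on the distribution line in these commands: it sources /etc/os-release in a subshell and joins the ID and VERSION_ID fields, and the result selects which repository list file is downloaded. The sketch below reproduces that step against a sample file so the value can be inspected before anything is written to the apt sources. The sample values (ubuntu, 22.04) and the function name distribution_from are illustrative assumptions, not taken from the poster's machine.

```shell
#!/usr/bin/env bash
# Source an os-release style file in a subshell and print ID + VERSION_ID,
# which is what the "distribution=" line computes.
distribution_from() {
    ( . "$1"; echo "$ID$VERSION_ID" )
}

# Sample file with illustrative values.
sample=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="22.04"\n' > "$sample"
distribution_from "$sample"    # prints: ubuntu22.04
rm -f "$sample"

# On a real system, check the actual value the same way.
if [ -r /etc/os-release ]; then
    distribution_from /etc/os-release
fi
```

If the printed value does not correspond to a list file published in the repository, the curl step will fetch an error page instead of a package list, and apt-get update will complain.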

And now I get:

2023-09-12 13:47:45,527 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-09-12 13:47:45,587 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2023-09-12 13:47:45,620 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 267:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/david/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-09-12 13:47:45,620 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
Error response from daemon: No such container: e610296cd235db576e6845195cd4f56aa5295a803bd590a7f94d980d7a200158
2023-09-12 13:47:46,503 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.
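
As an aside, the warning in both logs is separate from the GPU error: it asks for a "user":"UID:GID" entry in the DockerOptions portion of .tao_mounts.json so that the container does not run as root. A minimal sketch for producing that entry from the id commands the warning mentions is below. The function name docker_user_option is made up for this example, and the entry still has to be placed into the file by hand.

```shell
#!/usr/bin/env bash
# Print the "user" entry named in the TAO warning, filled in with the
# current user's numeric UID and GID.
docker_user_option() {
    printf '"user": "%s:%s"\n' "$(id -u)" "$(id -g)"
}

docker_user_option
```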

Running docker ps returns an empty list

Thanks!

How about
$ docker ps -a

Also empty

I uninstalled everything Docker-related and reinstalled following the IBM guide I referenced above, and the notebook is working now:

sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install docker-ce
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-docker2
sudo systemctl restart docker.service
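
One caveat for anyone copying these commands: the add-apt-repository line hardcodes bionic, which is the Ubuntu 18.04 codename, and the thread does not say which release was actually installed. A hedged sketch for deriving the codename instead is below, assuming /etc/os-release provides a VERSION_CODENAME field (it does on recent Ubuntu releases). The function names and the jammy sample value are illustrative.

```shell
#!/usr/bin/env bash
# Print VERSION_CODENAME from an os-release style file, or "unknown"
# when the field is absent.
codename_from() {
    ( unset VERSION_CODENAME; . "$1"; echo "${VERSION_CODENAME:-unknown}" )
}

# Build the Docker repository line for the codename found in the file.
docker_repo_line() {
    echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(codename_from "$1") stable"
}

# Sample file with an illustrative codename.
sample=$(mktemp)
printf 'ID=ubuntu\nVERSION_CODENAME=jammy\n' > "$sample"
docker_repo_line "$sample"
# prints: deb [arch=amd64] https://download.docker.com/linux/ubuntu jammy stable
rm -f "$sample"
```

The printed line can be compared with the one passed to add-apt-repository before running it.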
