NVIDIA TAO Toolkit on AWS VM

Download and run the test samples

Now that you have created a virtualenv and installed all the dependencies, you are now ready to download and run the TAO samples on the notebook. The instructions below assume that you are running the TAO Computer Vision samples.

  1. Download and unzip the notebooks from NGC using the commands below:
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/5.0.0/zip -O tao-getting-started_5.0.0.zip
unzip -u tao-getting-started_5.0.0.zip -d ./tao-getting-started_5.0.0 && cd ./tao-getting-started_5.0.0
  2. Launch the Jupyter notebook using the command below:
jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root --NotebookApp.token=<notebook_token>

This starts the Jupyter notebook server on the VM. To access it, navigate to http://<dns_name>:8888/ and, when prompted, enter the <notebook_token> used to start the server. The dns_name here is the Public IPv4 DNS of the VM, shown for your instance in the EC2 dashboard.

I am running it on an AWS VM and I want to train the yolo_v4 model.
Where can I find the <notebook_token>?

Please look for the token in the log output when you run jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root .
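If the server was started with --NotebookApp.token=<notebook_token>, the token is simply whatever value you passed there. If no token was set, Jupyter generates one and prints it in the startup log as part of a URL. A minimal sketch of pulling the token out of such a log line (the log line and token value here are made up for illustration):

```shell
# Hypothetical Jupyter startup-log line; the real one appears in the
# terminal where `jupyter notebook` was launched.
line='[I 08:00:00.000 NotebookApp] http://ip-172-31-0-1:8888/?token=abc123def456'

# Strip everything up to and including "token=" to get the token itself.
token="${line##*token=}"
echo "$token"
```

You can also run `jupyter notebook list` on the VM to print the URLs (including tokens) of currently running notebook servers.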

How can I use my own dataset instead of the KITTI dataset?
Which part of the code should I change?

There is no need to modify the code. You just need to prepare the images and labels. The label format is described in Data Annotation Format - NVIDIA Docs as well.
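For reference, a KITTI-format label file is a plain-text file per image, with one line per object and 15 space-separated fields; for detection training only the class name and the four bounding-box coordinates (xmin ymin xmax ymax) matter, and the remaining fields can be zero. A sketch with made-up class names, paths, and coordinates:

```shell
# Create a sample KITTI label file (hypothetical dataset layout and values).
# Fields per line: class truncated occluded alpha xmin ymin xmax ymax h w l x y z ry
# Only the class name and the bbox (fields 5-8) are used for detection training.
mkdir -p my_dataset/training/label
cat > my_dataset/training/label/000001.txt <<'EOF'
car 0.0 0 0.0 100.0 120.0 260.0 240.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pedestrian 0.0 0 0.0 300.0 150.0 340.0 260.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
EOF

# Each label file pairs with an image of the same basename, e.g.
# my_dataset/training/image/000001.png
cat my_dataset/training/label/000001.txt
```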

After running:

If you use your own dataset, you will need to run the code below to generate the best anchor shape

!tao model yolo_v4 kmeans -l $DATA_DOWNLOAD_DIR/kitti_split/training/label \
                          -i $DATA_DOWNLOAD_DIR/kitti_split/training/image \
                          -n 9 \
                          -x 1248 \
                          -y 384

The anchor shapes generated by this script are sorted. Write the first three into small_anchor_shape in the config file, the middle three into mid_anchor_shape, and the last three into big_anchor_shape.
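For illustration, this is roughly what the resulting yolov4_config section of the training spec looks like; the anchor values below are placeholders, not real kmeans output, so substitute the nine shapes your own run prints:

```
yolov4_config {
  small_anchor_shape: "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]"
  mid_anchor_shape: "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]"
  big_anchor_shape: "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]"
  # ... remaining yolov4_config fields unchanged ...
}
```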

I am getting this error:
2024-05-29 08:03:11,990 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
Traceback (most recent call last):
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/requests/adapters.py", line 555, in send
    conn = self.get_connection_with_tls_context(
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/requests/adapters.py", line 411, in get_connection_with_tls_context
    conn = self.poolmanager.connection_from_host(
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/urllib3/poolmanager.py", line 246, in connection_from_host
    return self.connection_from_context(request_context)
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/urllib3/poolmanager.py", line 258, in connection_from_context
    raise URLSchemeUnknown(scheme)
urllib3.exceptions.URLSchemeUnknown: Not supported URL scheme http+docker

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/docker/api/client.py", line 205, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/docker/api/client.py", line 228, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/requests/adapters.py", line 559, in send
    raise InvalidURL(e, request=request)
requests.exceptions.InvalidURL: Not supported URL scheme http+docker

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.virtualenvs/launcher/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/nvidia_tao_cli/entrypoint/tao_launcher.py", line 134, in main
    instance.launch_command(
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/nvidia_tao_cli/components/instance_handler/local_instance.py", line 357, in launch_command
    docker_handler = self.handler_map[
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/nvidia_tao_cli/components/instance_handler/local_instance.py", line 203, in handler_map
    handler_map[handler_key] = DockerHandler(
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/nvidia_tao_cli/components/docker_handler/docker_handler.py", line 92, in __init__
    self._docker_client = docker.from_env()
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/docker/client.py", line 84, in from_env
    return cls(
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/docker/client.py", line 40, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/docker/api/client.py", line 188, in __init__
    self._version = self._retrieve_server_version()
  File "/home/ubuntu/.virtualenvs/launcher/lib/python3.10/site-packages/docker/api/client.py", line 212, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: Not supported URL scheme http+docker

Did you log in to the NGC docker registry (nvcr.io)?
$ docker login nvcr.io

  1. Username: "$oauthtoken"
  2. Password: "YOUR_NGC_API_KEY"

yes

How about
! tao info --help

I am using an AWS VM to access the yolo_v4 notebook.
I have already installed the requirements below in the VM:

Installing the Pre-Requisites for TAO Toolkit in the VM

The NVIDIA Deep Learning AMI by default comes with several dependencies pre-installed to launch

NVIDIA-built Deep Learning Containers. To run TAO Toolkit, you are required to install some simple dependencies.

  1. Install prerequisite apt packages:
sudo apt update
sudo apt install python-pip python3-pip unzip
pip3 install --upgrade pip
  2. Install virtualenvwrapper:
pip3 install virtualenvwrapper
  3. Configure virtualenvwrapper:
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export WORKON_HOME=/home/ubuntu/.virtualenvs
export PATH=/home/ubuntu/.local/bin:$PATH
source /home/ubuntu/.local/bin/virtualenvwrapper.sh

Note

You may also add these commands to the /home/ubuntu/.bashrc file of the VM so that the configuration persists across sessions.

  4. Create a virtualenv for the launcher using the following command:
mkvirtualenv -p /usr/bin/python3 launcher

Note

You only need to create the virtualenv once on the instance. When you restart the instance, simply re-run the commands in step 3 and activate the same virtualenv using the command below:

workon launcher

  5. Install JupyterLab in the virtualenv using the command below:
pip3 install jupyterlab
  6. Log in to the NGC docker registry, nvcr.io:
docker login nvcr.io

The username here is $oauthtoken and the password is your NGC API key. You can generate this API key from the NGC website.

Do I need to install these again inside the Jupyter notebook?

How about
! tao info --help

This is of no help:


Sorry, could you run
! tao info --verbose

Here is the output after running
! tao info --verbose

Configuration of the TAO Toolkit Instance

task_group:
    model:
        dockers:
            nvidia/tao/tao-toolkit:
                5.0.0-tf2.11.0:
                    docker_registry: nvcr.io
                    tasks:
                        1. classification_tf2
                        2. efficientdet_tf2
                5.0.0-tf1.15.5:
                    docker_registry: nvcr.io
                    tasks:
                        1. bpnet
                        2. classification_tf1
                        3. converter
                        4. detectnet_v2
                        5. dssd
                        6. efficientdet_tf1
                        7. faster_rcnn
                        8. fpenet
                        9. lprnet
                        10. mask_rcnn
                        11. multitask_classification
                        12. retinanet
                        13. ssd
                        14. unet
                        15. yolo_v3
                        16. yolo_v4
                        17. yolo_v4_tiny
                5.3.0-pyt:
                    docker_registry: nvcr.io
                    tasks:
                        1. action_recognition
                        2. centerpose
                        3. deformable_detr
                        4. dino
                        5. mal
                        6. ml_recog
                        7. ocdnet
                        8. ocrnet
                        9. optical_inspection
                        10. pointpillars
                        11. pose_classification
                        12. re_identification
                        13. visual_changenet
                        14. classification_pyt
                        15. segformer
    dataset:
        dockers:
            nvidia/tao/tao-toolkit:
                5.3.0-data-services:
                    docker_registry: nvcr.io
                    tasks:
                        1. augmentation
                        2. auto_label
                        3. annotations
                        4. analytics
    deploy:
        dockers:
            nvidia/tao/tao-toolkit:
                5.3.0-deploy:
                    docker_registry: nvcr.io
                    tasks:
                        1. visual_changenet
                        2. centerpose
                        3. classification_pyt
                        4. classification_tf1
                        5. classification_tf2
                        6. deformable_detr
                        7. detectnet_v2
                        8. dino
                        9. dssd
                        10. efficientdet_tf1
                        11. efficientdet_tf2
                        12. faster_rcnn
                        13. lprnet
                        14. mask_rcnn
                        15. ml_recog
                        16. multitask_classification
                        17. ocdnet
                        18. ocrnet
                        19. optical_inspection
                        20. retinanet
                        21. segformer
                        22. ssd
                        23. trtexec
                        24. unet
                        25. yolo_v3
                        26. yolo_v4
                        27. yolo_v4_tiny
format_version: 3.0
toolkit_version: 5.3.0
published_date: 03/14/2024

OK, to narrow down, I suggest you open a terminal instead of the notebook, and then run inside the docker:
$ tao model yolo_v4 run /bin/bash

Then run something inside the docker.

When running inside the docker, you do not need to prefix the command with "tao model".
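For example, the same hypothetical training call differs only by the launcher prefix; the spec and result paths below are made up for illustration:

```shell
# Command as run on the host via the TAO launcher (paths are hypothetical):
host_cmd='tao model yolo_v4 train -e /workspace/specs/yolo_v4_train.txt -r /workspace/results'

# Inside the container (entered via `tao model yolo_v4 run /bin/bash`),
# the same task is invoked without the "tao model" prefix:
in_container_cmd="${host_cmd#tao model }"
echo "$in_container_cmd"
```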

But my aim is to train the yolo_v4 model with my own dataset; how can I achieve that?
Do I have to train it in the Jupyter notebook, or can I do it via the terminal?

Both can work.
Inside the docker, just run the same command without "tao model" at the beginning.

And for the original issue, please take a look at the discussion in another topic: Tao toolkit observations - #63 by foreverneilyoung.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.