Can't run the provided TAO toolkit sample code

silentjcr · July 10, 2023, 8:38am

Hi, I found and tried the following sample code on the page: TAO toolkit Getting Started

Yet I wasn’t able to start training as the error message showed that “No module named tensorflow”.

I did check if tensorflow was installed by running !pip list and it was there.
However, after running the following code from part 2.3, tensorflow just “disappeared”: it no longer appeared in the output of !pip list. Some other packages, such as torch, were missing as well.

import os
if os.environ["GOOGLE_COLAB"] == "1":
    os.environ["bash_script"] = "setup_env.sh"
else:
    os.environ["bash_script"] = "setup_env_desktop.sh"

!sed -i "s|PATH_TO_COLAB_NOTEBOOKS|$COLAB_NOTEBOOKS_PATH|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

!sh $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

Do you have any idea? Thanks.
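
One way to narrow this down is to check which interpreter the notebook kernel is running and whether that interpreter can locate the module, since !pip list may be reporting on a different Python installation than the one the kernel uses. A minimal sketch using only the standard library; module_visible is a hypothetical helper name, not part of the TAO notebooks:

```python
import importlib.util
import sys


def module_visible(name):
    """Return True if the running interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


# Interpreter that the notebook kernel is actually using.
print("kernel interpreter:", sys.executable)
print("python version:", sys.version.split()[0])

# Report whether each module can be found, without importing it.
for name in ("tensorflow", "torch"):
    status = "found" if module_visible(name) else "NOT found"
    print(name, status)
```

If a module is listed by !pip list but reported as not found here, pip and the kernel are probably pointing at different Python installations.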

Morganh · July 11, 2023, 1:54am

This has been a known issue since last week. Please refer to the workaround mentioned in Running tao toolkit in google colab - #17 by Morganh. Thanks.

silentjcr · July 21, 2023, 5:25am

TAO toolkit 5.0 was released yesterday.
Today I tried the sample code NVIDIA provided again, and many of the installation steps failed or were skipped.

sudo: unable to execute /usr/bin/add-apt-repository: No such file or directory
Hit:1 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:7 https://ppa.launchpadcontent.net/c2d4u.team/c2d4u4.0+/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Reading package lists... Done
W: https://cloud.r-project.org/bin/linux/ubuntu/jammy-cran40/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'libcasa-python3-6' for regex 'python3.6'
Note, selecting 'libpython3.6-stdlib' for regex 'python3.6'
Note, selecting 'python3.6-2to3' for regex 'python3.6'
libcasa-python3-6 is already the newest version (3.4.0-2build1).
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
python3-pip is already the newest version (22.0.2+dfsg-1ubuntu0.3).
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package python3.6-distutils
E: Couldn't find any package by glob 'python3.6-distutils'
E: Couldn't find any package by regex 'python3.6-distutils'
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package python3.6-dev
E: Couldn't find any package by glob 'python3.6-dev'
E: Couldn't find any package by regex 'python3.6-dev'
rm: cannot remove '/usr/bin/python': No such file or directory
/content/drive/MyDrive/nvidia-tao/tensorflow/setup_env.sh: 16: python3.6: not found
/content/drive/MyDrive/nvidia-tao/tensorflow/setup_env.sh: 17: python3.6: not found
/content/drive/MyDrive/nvidia-tao/tensorflow/setup_env.sh: 18: python3.6: not found
/content/drive/MyDrive/nvidia-tao/tensorflow/setup_env.sh: 21: python3.6: not found
/content/drive/MyDrive/nvidia-tao/tensorflow/setup_env.sh: 22: python3.6: not found
--2023-07-21 05:20:49--  https://github.com/Kitware/CMake/releases/download/v3.14.4/cmake-3.14.4-Linux-x86_64.sh
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/537699/fc11d880-7650-11e9-969f-7442127f007a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230721%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230721T052050Z&X-Amz-Expires=300&X-Amz-Signature=3a9860d6ded5f4e74aea8ed0a5081542a1fa9d6322064572ec50749fc8bd6dce&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=537699&response-content-disposition=attachment%3B%20filename%3Dcmake-3.14.4-Linux-x86_64.sh&response-content-type=application%2Foctet-stream [following]
--2023-07-21 05:20:50--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/537699/fc11d880-7650-11e9-969f-7442127f007a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230721%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230721T052050Z&X-Amz-Expires=300&X-Amz-Signature=3a9860d6ded5f4e74aea8ed0a5081542a1fa9d6322064572ec50749fc8bd6dce&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=537699&response-content-disposition=attachment%3B%20filename%3Dcmake-3.14.4-Linux-x86_64.sh&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37196929 (35M) [application/octet-stream]
Saving to: ‘cmake-3.14.4-Linux-x86_64.sh’

cmake-3.14.4-Linux- 100%[===================>]  35.47M   196MB/s    in 0.2s    

2023-07-21 05:20:50 (196 MB/s) - ‘cmake-3.14.4-Linux-x86_64.sh’ saved [37196929/37196929]

CMake Installer Version: 3.14.4, Copyright (c) Kitware
This is a self-extracting archive.
The archive will be extracted to: /usr/local

Using target directory: /usr/local
Extracting, please wait...

Unpacking finished successfully
/content/drive/MyDrive/nvidia-tao/tensorflow/setup_env.sh: 32: python3.6: not found
/content/drive/MyDrive/nvidia-tao/tensorflow/setup_env.sh: 33: python3.6: not found
/content/drive/MyDrive/nvidia-tao/tensorflow/setup_env.sh: 34: python3.6: not found
/content/drive/MyDrive/nvidia-tao/tensorflow/setup_env.sh: 37: python3.6: not found

Not sure if it has something to do with the latest update.

Morganh · July 21, 2023, 5:54am

Please wait for the official announcement. The colab at https://github.com/NVIDIA-AI-IOT/nvidia-tao/tree/main/tensorflow has not been updated to the new version yet.

gowtham2 · July 21, 2023, 11:21am

Traceback (most recent call last):
  File "/usr/bin/add-apt-repository", line 363, in <module>
    addaptrepo = AddAptRepository()
  File "/usr/bin/add-apt-repository", line 41, in __init__
    self.distro.get_sources(self.sourceslist)
  File "/usr/local/lib/python3.10/dist-packages/aptsources/distro.py", line 91, in get_sources
    raise NoDistroTemplateException(
aptsources.distro.NoDistroTemplateException: Error: could not find a distribution template for Ubuntu/jammy

I started getting this error today. Yesterday it ran fine, and I made no changes.

Morganh · July 21, 2023, 3:44pm

Please double-check, or restart the colab notebook.
There has been no change in the colab at https://github.com/NVIDIA-AI-IOT/nvidia-tao/tree/main/tensorflow

Morganh · July 22, 2023, 12:07pm

Can you add a cell with the following command to check whether the error occurs?
! sudo add-apt-repository ppa:deadsnakes/ppa

silentjcr · July 22, 2023, 12:46pm

Morganh · July 22, 2023, 2:19pm

I can reproduce it as well. It needs further investigation. It is not related to TAO.

silentjcr · July 25, 2023, 3:41am

Well, I’m not sure whether this issue only affects the samples that use tensorflow.
I couldn’t run either the Multi-class Image Classification or the Multi-task Image Classification sample, both of which use tensorflow, while I could run ActionRecognitionNet, which uses PyTorch.

Morganh · July 25, 2023, 4:43am

The TAO team is working on this colab issue. I will update you when there is any news. Thanks.

Morganh · July 26, 2023, 2:15am

The issue is caused by an update of the Colab OS version to Ubuntu 22.04. See: add-apt-repository fails in Ubuntu 22.04 but works fine with 'fallback runtime version' · Issue #3867 · googlecolab/colabtools · GitHub
Please use the workaround below.
In Colab, click Tools -> Command Palette -> Use fallback runtime version

After that, the following command works:
!sudo add-apt-repository ppa:deadsnakes/ppa -y
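
To confirm which Ubuntu release a runtime is on before and after switching to the fallback runtime, the standard /etc/os-release file can be read from a notebook cell. A minimal sketch, assuming that file exists on the Colab VM; parse_os_release and read_os_release are hypothetical helper names:

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


def read_os_release(path="/etc/os-release"):
    """Read and parse os-release; return an empty dict if the file is absent."""
    try:
        with open(path, encoding="utf-8") as handle:
            return parse_os_release(handle.read())
    except OSError:
        return {}


info = read_os_release()
print("VERSION_ID:", info.get("VERSION_ID", "unknown"))
print("VERSION_CODENAME:", info.get("VERSION_CODENAME", "unknown"))
```

A VERSION_ID of 22.04 (codename jammy) would correspond to the release that the linked issue reports as failing.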

silentjcr · July 26, 2023, 5:17pm

It worked, though it seemed that I had to install a few more modules to make it run correctly.

castej10 · July 28, 2023, 2:50pm

Would you be kind enough to tell me which other requirements needed to be installed? It seems I’m having some real issues setting up the proper environment.

Thank you kindly

Morganh · July 28, 2023, 3:09pm

@silentjcr Could you share the steps for “a few more modules” you mentioned above?
@castej10 Could you save the .ipynb as an .html file and upload here?

castej10 · July 28, 2023, 4:41pm

Hello @Morganh, I realized I wasn’t using the most up-to-date GitHub repository, so I deleted the old one from my Google Drive and re-downloaded it. That fixed many issues, but I still get some errors like

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.21.0, but you have requests 2.27.1 which is incompatible.
google-colab 1.0.0 requires six~=1.12.0, but you have six 1.15.0 which is incompatible.

and

Installing collected packages: zipp, typing-extensions, six, ipython-genutils, decorator, traitlets, setuptools, pyrsistent, importlib-metadata, attrs, wcwidth, tornado, pyzmq, python-dateutil, pyparsing, pycparser, ptyprocess, parso, nest-asyncio, jupyter-core, jsonschema, entrypoints, webencodings, pygments, prompt-toolkit, pickleshare, pexpect, packaging, nbformat, MarkupSafe, jupyter-client, jedi, cffi, backcall, async-generator, testpath, pandocfilters, nbclient, mistune, jupyterlab-pygments, jinja2, ipython, defusedxml, dataclasses, bleach, argon2-cffi-bindings, terminado, Send2Trash, prometheus-client, numpy, nbconvert, ipykernel, argon2-cffi, urllib3, smmap, protobuf, notebook, jmespath, h5py, docutils, widgetsnbextension, termcolor, scipy, qtpy, PyYAML, platformdirs, Pillow, orderedmultidict, onnx, kiwisolver, keras-preprocessing, keras-applications, jupyterlab-widgets, idna, gitdb, cycler, chardet, certifi, botocore, uritemplate, tifffile, threadpoolctl, shortuuid, setproctitle, sentry-sdk, s3transfer, requests, qtconsole, PyWavelets, pytz, pytools, pyjwt, psutil, promise, pathtools, pathlib2, onnxconverter-common, networkx, matplotlib, mako, llvmlite, keras, jupyter-console, joblib, ipywidgets, imageio, GitPython, future, furl, flatbuffers, fire, docker-pycreds, cython, Click, appdirs, xmltodict, wandb, uplink, uff, tqdm, toposort, tf2onnx, tabulate, simplejson, shapely, semver, seaborn, scikit-learn, scikit-image, retrying, requests-toolbelt, recordclass, pycuda, pycocotools-fix, pyarrow, prettytable, posix-ipc, pandas, opencv-python, onnxruntime, onnx-graphsurgeon, nvidia-ml-py, numba, mpi4py, keras2onnx, keras-metrics, jupyter, grpcio, graphsurgeon, DLLogger, cryptography, clearml, boto3, argparse, argcomplete, addict
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nvidia-tensorflow 1.15.4+nv20.10 requires numpy<1.19.0,>=1.16.0, but you have numpy 1.19.4 which is incompatible.
nvidia-tao 4.0.0 requires idna==2.10, but you have idna 2.7 which is incompatible.
nvidia-tao 4.0.0 requires six==1.15.0, but you have six 1.13.0 which is incompatible.
nvidia-tao 4.0.0 requires tabulate==0.8.7, but you have tabulate 0.7.5 which is incompatible.
nvidia-tao 4.0.0 requires urllib3>=1.26.5, but you have urllib3 1.24.3 which is incompatible.
google-colab 1.0.0 requires ipykernel~=4.6.0, but you have ipykernel 5.5.6 which is incompatible.
google-colab 1.0.0 requires ipython~=5.5.0, but you have ipython 7.16.3 which is incompatible.
google-colab 1.0.0 requires notebook~=5.2.0, but you have notebook 6.4.10 which is incompatible.
google-colab 1.0.0 requires pandas~=0.24.0, but you have pandas 0.25.3 which is incompatible.
google-colab 1.0.0 requires requests~=2.21.0, but you have requests 2.20.1 which is incompatible.
google-colab 1.0.0 requires six~=1.12.0, but you have six 1.13.0 which is incompatible.
google-colab 1.0.0 requires tornado~=4.5.0, but you have tornado 6.1 which is incompatible.

Here is the .html file:

ssd.html (838.3 KB)

silentjcr · July 31, 2023, 1:32am

Sorry for the late reply. Wasn’t able to view the content when I was off work.

I actually modified setup_env.sh by removing most of the ‘python3.6’ prefixes in front of the “pip install” commands.

#!/bin/sh

# Install Python 3.6 as the default version
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt-get update
sudo apt-get install python3.6 -y
apt install python3-pip -y
apt-get install python3.6-distutils
apt-get install python3.6-dev

# Set Python 3.6 as the default version
rm /usr/bin/python
ln -sf /usr/bin/python3.6 /usr/bin/python3
ln -sf /usr/bin/python3.6 /usr/local/bin/python

pip install --upgrade pip
pip install google-colab
pip install nvidia-pyindex
pip install cython==0.27.3

# Install Tensorflow
pip install https://developer.download.nvidia.com/compute/redist/nvidia-horovod/nvidia_horovod-0.20.0+nv20.10-cp36-cp36m-linux_x86_64.whl
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-tensorflow==1.15.4+nv20.10

# Install Cmake
cd /tmp
wget https://github.com/Kitware/CMake/releases/download/v3.14.4/cmake-3.14.4-Linux-x86_64.sh
chmod +x cmake-3.14.4-Linux-x86_64.sh
./cmake-3.14.4-Linux-x86_64.sh --prefix=/usr/local --exclude-subdir --skip-license
rm ./cmake-3.14.4-Linux-x86_64.sh

# Install dependencies
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-eff==0.5.3
pip install nvidia-tao==4.0.0
pip install --ignore-installed PyYAML -r /content/drive/MyDrive/nvidia-tao/tensorflow/requirements-pip.txt -f https://download.pytorch.org/whl/torch_stable.html --extra-index-url https://developer.download.nvidia.com/compute/redist

# Install code related wheels
pip install nvidia-tao-tf1==4.0.0.657.dev0

Aside from that, I also added a cell with the following commands, as I still couldn’t run tao correctly without them. I guess I could have added them to the .sh file instead.

!pip install opencv-python==4.4.0.44
!pip install numba
!pip install clearml
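
To check afterwards which versions actually ended up installed, a cell along these lines could be used. This is a minimal sketch, not part of the TAO notebooks: installed_version is a hypothetical helper, and on interpreters older than Python 3.8 it assumes the importlib_metadata backport is available.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: older interpreters have the importlib_metadata backport installed.
    import importlib_metadata as metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages mentioned in this thread; None means the package is not installed.
for package in ("opencv-python", "numba", "clearml", "nvidia-tao"):
    print(package, "->", installed_version(package))
```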

Morganh · July 31, 2023, 9:54am

Thanks @silentjcr !
@castej10 Could you please check whether the approach above works on your side? Thanks.

silentjcr · July 31, 2023, 10:18am

Well, now it seems to be PyTorch’s turn…
I tried to train my model using the ActionRecognitionNet sample code, and tao train didn’t run successfully.

Train RGB only model with PTM
[NeMo W 2023-07-31 10:07:18 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2023-07-31 10:07:23 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2023-07-31 10:07:23 nemo_logging:349] /usr/local/lib/python3.7/dist-packages/nvidia_tao_pytorch/cv/action_recognition/scripts/train.py:81: UserWarning: 
    'train_rgb_3d_finetune.yaml' is validated against ConfigStore schema with the same name.
    This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
    See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
    
Created a temporary directory at /tmp/tmp57si20o_
Writing /tmp/tmp57si20o_/_remote_module_non_scriptable.py
loading trained weights from /content/results/pretrained/actionrecognitionnet_vtrainable_v1.0/resnet18_3d_rgb_hmdb5_32.tlt
Error executing job with overrides: ['output_dir=/content/results/rgb_3d_ptm', 'encryption_key=nvidia_tao', 'model_config.rgb_pretrained_model_path=/content/results/pretrained/actionrecognitionnet_vtrainable_v1.0/resnet18_3d_rgb_hmdb5_32.tlt', 'model_config.rgb_pretrained_num_classes=2']
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 371, in <lambda>
    overrides=args.overrides,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/hydra.py", line 110, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.7/dist-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.7/dist-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "<frozen cv.action_recognition.scripts.train>", line 77, in main
  File "<frozen cv.action_recognition.scripts.train>", line 28, in run_experiment
  File "<frozen cv.action_recognition.model.pl_ar_model>", line 33, in __init__
  File "<frozen cv.action_recognition.model.pl_ar_model>", line 39, in _build_model
  File "<frozen cv.action_recognition.model.build_nn_model>", line 82, in build_ar_model
  File "<frozen cv.action_recognition.model.ar_model>", line 105, in get_basemodel3d
  File "<frozen cv.action_recognition.model.resnet3d>", line 366, in resnet3d
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1672, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNet3d:
	size mismatch for fc_cls.weight: copying a param with shape torch.Size([5, 512]) from checkpoint, the shape in current model is torch.Size([2, 512]).
	size mismatch for fc_cls.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([2]).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "</usr/local/lib/python3.7/dist-packages/nvidia_tao_pytorch/cv/action_recognition/scripts/train.py>", line 3, in <module>
  File "<frozen cv.action_recognition.scripts.train>", line 81, in <module>
  File "<frozen cv.super_resolution.scripts.configs.hydra_runner>", line 103, in wrapper
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 368, in _run_hydra
    lambda: hydra.run(
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 251, in run_and_report
    assert mdl is not None
AssertionError
Telemetry data couldn't be sent, but the command ran successfully.
[Error]: 'str' object has no attribute 'decode'
Execution status: FAIL
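
Reading the traceback, the failure looks like a class-count mismatch rather than an environment problem: the checkpoint's fc_cls layer has 5 outputs, while the model was built with model_config.rgb_pretrained_num_classes=2. The sketch below only illustrates that comparison with plain tuples copied from the error message; find_shape_mismatches is a hypothetical helper and is not part of TAO or PyTorch.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Only compare parameters that exist on both sides.
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the RuntimeError above.
checkpoint_shapes = {"fc_cls.weight": (5, 512), "fc_cls.bias": (5,)}
model_shapes = {"fc_cls.weight": (2, 512), "fc_cls.bias": (2,)}

for name, (ckpt, model) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

If that reading is right, the class count given for the pretrained model would need to match the checkpoint, which appears to have been trained with 5 classes.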

garazraz · July 31, 2023, 11:09am

Is it normal that the colab notebook installs version 4.0.0 of the tao Python package and not 5.0.0? Or is that what is still being worked on? So it’s not yet possible to run tao 5.0.0 on colab?
