Can't see the classification and other folders inside TLT-V3

Hi,

I am using TLT-V3 on GTX 1650.
I pulled the image using the docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 command,

and ran the container using the command below:
sudo docker run --runtime=nvidia -it -v /AgeClassification/tlt-experiments:/workspace/tlt-experiments -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 /bin/bash

and then started Jupyter Notebook using the command below:
jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root

I am able to see the following folders


but cannot see the classification and other folders that we were able to see in TLT-V2.

Can someone please suggest what I am missing?

Thanks.

Moving this to the TAO Toolkit forum (formerly known as TLT) for greater visibility.

TLT 3.0 does not contain samples by default. You should download the Jupyter notebooks as described in the TAO Toolkit Quick Start Guide — TAO Toolkit 3.0 documentation.

Hi @Morganh
Thanks for the reply.

I am getting the following error after running the ! tao multitask_classification train cell.

Docker run command:
sudo docker run --runtime=nvidia -it -v /media/AGE-TRAINING-27-AUG/tlt-experiments:/workspace/tlt-experiments -v /var/run/docker.sock:/var/run/docker.sock -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 /bin/bash

Output of sudo docker login nvcr.io:

Please let me know where I am making a mistake.

Thanks.

Can you run without sudo? Refer to
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html#installing-the-pre-requisites

Thanks @Morganh for the response.

I am getting another error while running the ! tao multitask_classification train cell.

2021-08-28 11:30:30,980 [INFO] root: Registry: ['nvcr.io']
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-08-28 11:30:41,053 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-08-28 11:30:41,054 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py:172: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2021-08-28 11:30:41,183 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py:172: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py:175: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-08-28 11:30:41,184 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py:175: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py", line 311, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 494, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 482, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py", line 307, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py", line 179, in run_experiment
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/utils/spec_loader.py", line 20, in load_experiment_spec
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/example/multitask_classification/specs/mclassification_spec.cfg'
2021-08-28 11:30:43,545 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The !cat $LOCAL_SPECS_DIR/mclassification_spec.cfg cell shows that my file exists, and I manually verified that it is present at that location, but I still get the error when executing the ! tao multitask_classification train cell.

Thanks.

The path should be the path inside the docker container.
You can log in to the docker container and check whether the file is available inside it.
If not, please double-check ~/.tao_mounts.json.
See more in TAO Toolkit Launcher — TAO Toolkit 3.0 documentation
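The error path is resolved inside the container, so a file on the host is only visible if a mount maps it there. A rough sketch of the kind of prefix translation the mounts perform (an illustrative helper, not the launcher's actual code):

```python
def to_container_path(host_path, mounts):
    """Translate a host path to its in-container path using the
    "Mounts" list from ~/.tao_mounts.json (longest source prefix wins)."""
    for m in sorted(mounts, key=lambda m: len(m["source"]), reverse=True):
        src = m["source"].rstrip("/")
        if host_path == src or host_path.startswith(src + "/"):
            return m["destination"] + host_path[len(src):]
    return None  # not covered by any mount -> invisible inside the container


mounts = [{"source": "/home/user/specs", "destination": "/workspace/specs"}]
print(to_container_path("/home/user/specs/mclassification_spec.cfg", mounts))
# -> /workspace/specs/mclassification_spec.cfg
```

If a path returns None here, the trainer inside the container will raise exactly the FileNotFoundError shown above, even though the file exists on the host.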

Hi @Morganh
Yes, the files are available inside the docker container and ~/.tao_mounts.json is also correct, but I still have the same issue.

{
    "Mounts": [
        {
            "source": "/workspace/tlt-experiments",
            "destination": "/workspace/tlt-experiments"
        },
        {
            "source": "/workspace/example/multitask_classification/specs",
            "destination": "/workspace/example/multitask_classification/specs"
        }
    ],
    "DockerOptions": {
        "user": "0:0"
    }
}

Can you please suggest where things are going wrong?

I have also given 777 permissions, but the issue is still the same.
I am able to access the images, train.csv, and val.csv, but unable to access the config file.
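One way to sanity-check the host side is a small script that reads the mounts file and confirms every "source" directory actually exists (a sketch; check_mounts is a hypothetical helper, not part of the TAO CLI):

```python
import json
import os


def check_mounts(config_path):
    """Report whether each host-side "source" directory listed in a
    .tao_mounts.json-style file exists on this machine."""
    with open(config_path) as f:
        cfg = json.load(f)
    return {m["source"]: os.path.isdir(m["source"]) for m in cfg.get("Mounts", [])}


# Example usage:
# print(check_mounts(os.path.expanduser("~/.tao_mounts.json")))
```

Any entry reported False points at a source directory the container can never see populated.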

Can you run the following command and share the result?

tao multitask_classification run ls /workspace/example/multitask_classification/specs

Yes @Morganh, I have run this command.

!tao multitask_classification run ls /workspace/example/multitask_classification/specs

The output of this command is:

2021-08-30 09:00:37,304 [INFO] root: Registry: ['nvcr.io']
2021-08-30 09:00:38,539 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

It seems that the mapping does not take effect.
Can you follow TAO Toolkit Launcher — TAO Toolkit 3.0 documentation and try more options in “DockerOptions”?

Okay, I will try.

But can we do it without mapping, as we did in TLT-V2 and tlt-v3.0-dp-py3?

Sure, just change the docker name as you did in TLT-V2.

Hi @Morganh

I am unable to understand what you mean.

In the older version, I only pulled the image and mapped the directory during the container run:

docker run --net=host --gpus all -it -v /home/ubuntu/Data_Training/TLT-V3/AGE-GROUP-MULTILABEL/tlt-experiments:/workspace/tlt-experiments -v /var/run/docker.sock:/var/run/docker.sock -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3 /bin/bash

and then ran the notebook normally and trained the model.

But with the same process in TAO I am getting the above issue.
The main motive is actually to use TAO to train multilabel classification, which is missing in the previous version.

Can you try below?
$ docker run --runtime=nvidia -it -v yourlocalfolder:dockerfolder nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 /bin/bash

Okay.

I am trying with the following command and will update you:

docker run --runtime=nvidia -it -v /home/ubuntu/Data_Training/TLT-V3/AGE-GROUP-MULTILABEL-SCRATCH:/workspace/tao-experiments -p 8888:8888  nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 /bin/bash

Hi @Morganh

What should I pass in FIXME?

os.environ["LOCAL_PROJECT_DIR"] = FIXME

Only ‘/’, or the complete path again:

'/home/ubuntu/Data_Training/TLT-V3/AGE-GROUP-MULTILABEL-SCRATCH'
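For what it's worth, in the TAO sample notebooks LOCAL_PROJECT_DIR is normally set to the full absolute host path, not just ‘/’; a sketch of the usual pattern (the derived data_dir line is illustrative, not taken from this thread):

```python
import os

# The full absolute host path, not "/"
os.environ["LOCAL_PROJECT_DIR"] = "/home/ubuntu/Data_Training/TLT-V3/AGE-GROUP-MULTILABEL-SCRATCH"

# The notebooks typically derive sub-directories from it, e.g.:
data_dir = os.path.join(os.environ["LOCAL_PROJECT_DIR"], "data")
print(data_dir)
# -> /home/ubuntu/Data_Training/TLT-V3/AGE-GROUP-MULTILABEL-SCRATCH/data
```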

If you do not use mapping and use the command I mentioned above, please run in the terminal instead of the Jupyter notebook.

Hi @Morganh

I tried from the terminal:

tao multitask_classification train -e /workspace/cv_samples_v1.2.0/multitask_classification/specs/mclassification_spec.cfg -r /workspace/tao-experiments/multitask_classification -k nvidia_tlt --gpus 1

but again got the following error:

~/.tao_mounts.json wasn't found. Falling back to obtain mount points and docker configs from ~/.tlt_mounts.json.
Please note that this will be deprecated going forward.
2021-08-30 11:31:41,422 [INFO] root: Registry: ['nvcr.io']
2021-08-30 11:31:41,531 [INFO] root: No mount points were found in the /root/.tlt_mounts.json file.
2021-08-30 11:31:41,531 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tlt_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-08-30 11:31:49,509 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-08-30 11:31:49,509 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py:172: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2021-08-30 11:31:49,609 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py:172: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py:175: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-08-30 11:31:49,609 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py:175: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py", line 311, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 494, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 482, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py", line 307, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/scripts/train.py", line 179, in run_experiment
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/multitask_classification/utils/spec_loader.py", line 20, in load_experiment_spec
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/cv_samples_v1.2.0/multitask_classification/specs/mclassification_spec.cfg'
2021-08-30 11:31:51,403 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Since you are running via tao multitask_classification, make sure ~/.tao_mounts.json is correct.
Moreover, for debugging, you can directly log in to the docker container as below:

$ tao multitask_classification run /bin/bash