OSError: Specfile not found, please help

Please provide the following information when requesting support.

• Hardware: Titan RTX, Ubuntu
• Network Type: UNet
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
• Training spec file (If you have one, please share it here)
Configuration of the TAO Toolkit Instance

dockers:
    nvidia/tao/tao-toolkit-tf:
        docker_registry: nvcr.io
        docker_tag: v3.21.08-py3
        tasks:
            1. augment
            2. bpnet
            3. classification
            4. detectnet_v2
            5. dssd
            6. emotionnet
            7. faster_rcnn
            8. fpenet
            9. gazenet
            10. gesturenet
            11. heartratenet
            12. lprnet
            13. mask_rcnn
            14. multitask_classification
            15. retinanet
            16. ssd
            17. unet
            18. yolo_v3
            19. yolo_v4
            20. converter
    nvidia/tao/tao-toolkit-pyt:
        docker_registry: nvcr.io
        docker_tag: v3.21.08-py3
        tasks:
            1. speech_to_text
            2. speech_to_text_citrinet
            3. text_classification
            4. question_answering
            5. token_classification
            6. intent_slot_classification
            7. punctuation_and_capitalization
    nvidia/tao/tao-toolkit-lm:
        docker_registry: nvcr.io
        docker_tag: v3.21.08-py3
        tasks:
            1. n_gram
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021
• How to reproduce the issue?

While running:

print("For multi-GPU, change --gpus based on your machine.")
!tao unet train --gpus=1 --gpu_index=$GPU_INDEX \
    -e $SPECS_DIR/unet_train_resnet_unet_isbi.txt \
    -r $USER_EXPERIMENT_DIR/isbi_experiment_unpruned \
    -m $USER_EXPERIMENT_DIR/pretrained_resnet18/pretrained_semantic_segmentation_vresnet18/resnet_18.hdf5 \
    -n model_isbi \
    -k $KEY

The log:
For multi-GPU, change --gpus based on your machine.
2021-09-01 05:28:50,313 [INFO] root: Registry: ['nvcr.io']
2021-09-01 05:28:50,461 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/checkpoint_saver_hook.py:21: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/pretrained_restore_hook.py:23: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/pretrained_restore_hook.py:23: The name tf.logging.WARN is deprecated. Please use tf.compat.v1.logging.WARN instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py:405: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

Loading experiment spec at /workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt.
2021-09-01 05:28:57,190 [INFO] __main__: Loading experiment spec at /workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt.
2021-09-01 05:28:57,191 [INFO] iva.unet.spec_handler.spec_loader: Merging specification from /workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 419, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 413, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 283, in run_experiment
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/spec_handler/spec_loader.py", line 68, in load_experiment_spec
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/spec_handler/spec_loader.py", line 48, in load_proto
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/spec_handler/spec_loader.py", line 32, in _load_from_file
OSError: Specfile not found at: /workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt
2021-09-01 05:28:58,476 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Please check your ~/.tao_mounts.json.
It will map your local directory to the docker directory.
See more info in TAO Toolkit Launcher — TAO Toolkit 3.22.05 documentation

import os

drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ],
    "DockerOptions": {
        "user": "{}:{}".format(os.getuid(), os.getgid())
    }
}
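One way to debug "Specfile not found" is to translate the container path from the error back to the host path that should back it. The sketch below uses hypothetical helpers (load_mounts and container_to_host, not part of the TAO launcher) and assumes that, when mounts are nested, the mount with the longest matching destination wins. If the resolved host file does not exist, the container cannot see it either; if no mount covers the path at all, the helper returns None.

```python
import json
import os
import posixpath


def load_mounts(path="~/.tao_mounts.json"):
    """Read the "Mounts" list from a TAO launcher mounts file (hypothetical helper)."""
    with open(os.path.expanduser(path)) as f:
        return json.load(f)["Mounts"]


def container_to_host(container_path, mounts):
    """Return the host path behind a container path, or None if no mount covers it.

    Hypothetical helper: chooses the mount whose destination is the longest
    prefix of container_path, so a nested mount takes precedence over its parent.
    """
    target = posixpath.normpath(container_path)
    best_dest, best_src = None, None
    for mount in mounts:
        dest = posixpath.normpath(mount["destination"])
        # Covered if the path is the destination itself or lies below it.
        covered = target == dest or target.startswith(dest.rstrip("/") + "/")
        if covered and (best_dest is None or len(dest) > len(best_dest)):
            best_dest, best_src = dest, posixpath.normpath(mount["source"])
    if best_dest is None:
        return None  # The container cannot see this path at all.
    rel = posixpath.relpath(target, best_dest)
    return best_src if rel == "." else posixpath.join(best_src, rel)


# Mounts copied from the ~/.tao_mounts.json shared later in this thread.
mounts = [
    {"source": "/workspace/tlt-experiments/",
     "destination": "/workspace/tlt-experiments"},
    {"source": "/workspace/tlt-experiments/models/unet/specs",
     "destination": "/workspace/tlt-experiments/models/unet/specs"},
]
spec = "/workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt"
host_path = container_to_host(spec, mounts)
print(host_path, "exists on host:", host_path is not None and os.path.isfile(host_path))
```

On a real machine, pass load_mounts() instead of the hard-coded list and run it on the host, where the "source" paths live.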

# Map the local directories to the TAO docker.
import json
import os

mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives.
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tlt-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}

# Write the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

This is my mount code. I followed the instructions and changed the paths to point to my files, but it doesn't seem to work. What am I missing?

What I have tried:
1. Adding -v /var/run/docker.sock:/var/run/docker.sock when executing docker run.
2. Trying the solution from the forum thread "Run TLT inside docker - #6 by Morganh" (adding --runtime=nvidia), which doesn't work either.
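Because both mount sources in this code come from environment variables, a quick check before writing the file is whether LOCAL_PROJECT_DIR and LOCAL_SPECS_DIR are set and point to directories that actually exist on the host. The check_mount_env helper below is a hypothetical sketch, not part of the TAO notebook; it only inspects host-side sources, since SPECS_DIR is a path inside the container.

```python
import os


def check_mount_env(names=("LOCAL_PROJECT_DIR", "LOCAL_SPECS_DIR"), env=None):
    """Return a list of problems with the env vars used as mount sources.

    Hypothetical helper: an empty list means every variable is set and
    names an existing directory on this (host) machine.
    """
    env = os.environ if env is None else env
    problems = []
    for name in names:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name}={value!r} is not a directory on this host")
    return problems


# Run this before writing ~/.tao_mounts.json.
for problem in check_mount_env():
    print("WARNING:", problem)
```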

Sorry, I cannot understand your mapping.
Can you share your ~/.tao_mounts.json?

$ cat ~/.tao_mounts.json

{
    "Mounts": [
        {
            "source": "/workspace/tlt-experiments/",
            "destination": "/workspace/tlt-experiments"
        },
        {
            "source": "/workspace/tlt-experiments/models/unet/specs",
            "destination": "/workspace/tlt-experiments/models/unet/specs"
        }
    ]
}

Can you run the command below in a terminal?
$ tao unet run ls /workspace/tlt-experiments

Thank you for the quick response. Here's the result:

2021-09-01 07:17:03,947 [INFO] root: Registry: ['nvcr.io']
2021-09-01 07:17:04,103 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
models
2021-09-01 07:17:04,639 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

OK, so the folder "models" is available.
Please use the same method to check whether /workspace/tlt-experiments/models/unet/specs/ is available.

Or you can log in to the docker container directly to check:
$ tao unet run /bin/bash

Is this what you meant by typing this?

Correct.

But it still doesn't seem to have mapped the folders in my ~/.tao_mounts.json to the docker directory as you mentioned before. What should I do?

As long as you have set up ~/.tao_mounts.json, you do not need to do anything else; the mapping is already done.
In your case, your local folder /workspace/tlt-experiments/models is already mapped into the docker container. Inside the container, its path is /workspace/tlt-experiments/models, according to your ~/.tao_mounts.json.

Thank you for the response. I found that the mapping wasn't working because I had misunderstood the documentation and set LOCAL_PROJECT_DIR incorrectly. But now another error has occurred, as shown in the image below:


But when running "docker run", my command is:

docker run -v /home/ubuntu/Desktop/dpstream/tlt-experiments/:/workspace/tlt-experiments -p 8888:8888 -v /var/run/docker.sock:/var/run/docker.sock -it --name tlt_train nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.08-py3 /bin/bash

Can you see what I did wrong?

Why did you run the command above? It is indeed one way of running the TAO docker.

But TAO now provides the TAO launcher, so another way to run TAO is:
$ tao unet …

All the cells in the latest notebook use the second way.

So, in short, you do not need to run "docker run".

Your "/home/ubuntu/Desktop/dpstream/…" path is not set in your ~/.tao_mounts.json, so the docker container cannot find it.
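To carry the project folder from the "docker run -v" command over to the launcher's mounts file, the host side of the -v option becomes "source" and the container side becomes "destination". The volume_to_mount helper below is a hypothetical sketch (not part of TAO); it assumes Linux host paths without drive-letter colons and drops trailing options such as ":ro", since the mounts files in this thread carry only source and destination.

```python
import json


def volume_to_mount(volume):
    """Convert a docker "-v host:container[:options]" string into a Mounts entry.

    Hypothetical helper: any options part (e.g. "ro") is ignored.
    """
    parts = volume.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected host:container[:options], got {volume!r}")
    return {"source": parts[0], "destination": parts[1]}


# The project volume from the "docker run" command posted above.
drive_map = {"Mounts": [volume_to_mount(
    "/home/ubuntu/Desktop/dpstream/tlt-experiments/:/workspace/tlt-experiments")]}
print(json.dumps(drive_map, indent=4))
```

The printed JSON is the shape that ~/.tao_mounts.json expects. The docker.sock volume is left out because it is not a data directory.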

Thank you for your help!!
My problem was solved by changing my variables. lol