OSError: Specfile not found plz help

cgucsie666 · September 1, 2021, 5:37am

Please provide the following information when requesting support.

• Hardware: Titan RTX, Ubuntu
• Network Type: UNet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
Configuration of the TAO Toolkit Instance

dockers:
    nvidia/tao/tao-toolkit-tf:
        docker_registry: nvcr.io
        docker_tag: v3.21.08-py3
        tasks:
            1. augment
            2. bpnet
            3. classification
            4. detectnet_v2
            5. dssd
            6. emotionnet
            7. faster_rcnn
            8. fpenet
            9. gazenet
            10. gesturenet
            11. heartratenet
            12. lprnet
            13. mask_rcnn
            14. multitask_classification
            15. retinanet
            16. ssd
            17. unet
            18. yolo_v3
            19. yolo_v4
            20. converter
    nvidia/tao/tao-toolkit-pyt:
        docker_registry: nvcr.io
        docker_tag: v3.21.08-py3
        tasks:
            1. speech_to_text
            2. speech_to_text_citrinet
            3. text_classification
            4. question_answering
            5. token_classification
            6. intent_slot_classification
            7. punctuation_and_capitalization
    nvidia/tao/tao-toolkit-lm:
        docker_registry: nvcr.io
        docker_tag: v3.21.08-py3
        tasks:
            1. n_gram
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021
• How to reproduce the issue?

While running:

print("For multi-GPU, change --gpus based on your machine.")
!tao unet train --gpus=1 --gpu_index=$GPU_INDEX \
    -e $SPECS_DIR/unet_train_resnet_unet_isbi.txt \
    -r $USER_EXPERIMENT_DIR/isbi_experiment_unpruned \
    -m $USER_EXPERIMENT_DIR/pretrained_resnet18/pretrained_semantic_segmentation_vresnet18/resnet_18.hdf5 \
    -n model_isbi \
    -k $KEY

the log:
For multi-GPU, change --gpus based on your machine.
2021-09-01 05:28:50,313 [INFO] root: Registry: ['nvcr.io']
2021-09-01 05:28:50,461 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/checkpoint_saver_hook.py:21: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/pretrained_restore_hook.py:23: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/pretrained_restore_hook.py:23: The name tf.logging.WARN is deprecated. Please use tf.compat.v1.logging.WARN instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py:405: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

Loading experiment spec at /workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt.
2021-09-01 05:28:57,190 [INFO] __main__: Loading experiment spec at /workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt.
2021-09-01 05:28:57,191 [INFO] iva.unet.spec_handler.spec_loader: Merging specification from /workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 419, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 413, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 283, in run_experiment
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/spec_handler/spec_loader.py", line 68, in load_experiment_spec
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/spec_handler/spec_loader.py", line 48, in load_proto
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/spec_handler/spec_loader.py", line 32, in _load_from_file
OSError: Specfile not found at: /workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt
2021-09-01 05:28:58,476 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Morganh · September 1, 2021, 6:03am

Please check your ~/.tao_mounts.json.
This file maps your local directories to directories inside the docker container.
See the TAO Toolkit Launcher page in the TAO Toolkit 3.22.05 documentation for more information.

cgucsie666 · September 1, 2021, 6:19am

drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ],
    "DockerOptions": {
        "user": "{}:{}".format(os.getuid(), os.getgid())
    }
}

# Mapping up the local directories to the TAO docker.
import json
import os

mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tlt-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

This is my mount code. I followed the instructions and changed the paths to point to my files, but it doesn't seem to be working.
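One way to catch a wrongly set LOCAL_PROJECT_DIR or LOCAL_SPECS_DIR before launching anything is to check that every "source" in the mapping actually exists on the host. This is only a minimal sketch, assuming the same drive_map layout as the notebook cell above; missing_sources is a hypothetical helper name, not part of the TAO launcher, and the example paths are made up for the demonstration.

```python
import os


def missing_sources(drive_map):
    """Return the "source" paths in drive_map that are not existing host directories."""
    return [
        mount["source"]
        for mount in drive_map.get("Mounts", [])
        if not os.path.isdir(os.path.expanduser(mount["source"]))
    ]


# Hypothetical example: the first source exists, the second does not.
example_map = {
    "Mounts": [
        {"source": os.getcwd(), "destination": "/workspace/tlt-experiments"},
        {
            "source": os.path.join(os.getcwd(), "no-such-dir-for-demo"),
            "destination": "/workspace/tlt-experiments/models/unet/specs",
        },
    ]
}

for path in missing_sources(example_map):
    print("Missing on host:", path)
```

If the list comes back non-empty, the environment variables that feed the mapping are probably worth a second look before writing ~/.tao_mounts.json.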

cgucsie666 · September 1, 2021, 6:53am

What I've done so far:
1. Added -v /var/run/docker.sock:/var/run/docker.sock when executing docker run.
2. Tried the solution from the forum thread "Run TLT inside docker" (#6 by Morganh) of adding --runtime=nvidia, but that doesn't work either.

Morganh · September 1, 2021, 7:10am

Sorry, I cannot understand your mapping.
Can you share your ~/.tao_mounts.json?

$ cat ~/.tao_mounts.json

cgucsie666 · September 1, 2021, 7:11am

{
    "Mounts": [
        {
            "source": "/workspace/tlt-experiments/",
            "destination": "/workspace/tlt-experiments"
        },
        {
            "source": "/workspace/tlt-experiments/models/unet/specs",
            "destination": "/workspace/tlt-experiments/models/unet/specs"
        }
    ]
}

Morganh · September 1, 2021, 7:16am

Can you run the command below in a terminal?
$ tao unet run ls /workspace/tlt-experiments

cgucsie666 · September 1, 2021, 7:17am

Thank you for the quick response. Here's the result:

2021-09-01 07:17:03,947 [INFO] root: Registry: ['nvcr.io']
2021-09-01 07:17:04,103 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
models
2021-09-01 07:17:04,639 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Morganh · September 1, 2021, 7:19am

OK, so the folder "models" is available.
Please check in the same way whether /workspace/tlt-experiments/models/unet/specs/ is available.

Or you can log in to the docker container directly to check.
$ tao unet run /bin/bash

cgucsie666 · September 1, 2021, 7:23am

is this what you mean by typing this?

Morganh · September 1, 2021, 7:25am

Correct.

cgucsie666 · September 1, 2021, 7:29am

But it still doesn't seem to have mapped the directories from my .tao_mounts.json file into the docker like you mentioned before. What should I do?

Morganh · September 1, 2021, 7:30am

As long as you have set the ~/.tao_mounts.json, you need to do nothing. The mapping is already done.
For your case, your local folder /workspace/tlt-experiments/models is already mapped into the docker. In docker, its path is /workspace/tlt-experiments/models according to your ~/.tao_mounts.json.
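To make the mapping concrete, here is a rough sketch of how a path inside the container can be traced back to the host path it comes from. It only illustrates the idea of a source/destination mapping and is not how the TAO launcher itself is implemented; container_to_host is a hypothetical helper, /home/user/project is a made-up host directory, and all paths are assumed to be absolute.

```python
import posixpath


def container_to_host(container_path, mounts):
    """Map a container path back to the host path that a mount list implies.

    Returns None when no "destination" covers the path.
    """
    container_path = posixpath.normpath(container_path)
    best_source, best_dest = None, None
    for mount in mounts:
        dest = posixpath.normpath(mount["destination"])
        # commonpath compares whole path components, so /a/b does not match /a/bc.
        if posixpath.commonpath([dest, container_path]) != dest:
            continue
        # Prefer the most specific (longest) destination when mounts are nested.
        if best_dest is None or len(dest) > len(best_dest):
            best_source, best_dest = posixpath.normpath(mount["source"]), dest
    if best_dest is None:
        return None
    relative = posixpath.relpath(container_path, best_dest)
    return best_source if relative == "." else posixpath.join(best_source, relative)


# Hypothetical host directory, used only for illustration.
example_mounts = [
    {"source": "/home/user/project", "destination": "/workspace/tlt-experiments"},
]
spec = "/workspace/tlt-experiments/models/unet/specs/unet_train_resnet_unet_isbi.txt"
print(container_to_host(spec, example_mounts))
```

If the host path this returns does not exist on your machine, the container will not find the file at the corresponding container path either.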

cgucsie666 · September 1, 2021, 8:07am

Thank you for the response. I found that the mapping wasn't working because I had misunderstood the documentation and set LOCAL_PROJECT_DIR incorrectly. Now another error has occurred, like the image below.

But when running "docker run", my command is:

docker run -v /home/ubuntu/Desktop/dpstream/tlt-experiments/:/workspace/tlt-experiments -p 8888:8888 -v /var/run/docker.sock:/var/run/docker.sock -it --name tlt_train nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.08-py3 /bin/bash

Can you see what I did wrong?

Morganh · September 1, 2021, 8:17am

Why did you run the command above? It is indeed one way of running the TAO docker.

But TAO now provides the tao-launcher, so another way to run TAO is as below.
$ tao unet …

Everything in the latest notebook uses the second way.

So, in short, you do not need to run “docker run”.

Your “/home/ubuntu/Desktop/dpstream/…” is not set in your ~/.tao_mounts.json. So, the docker cannot find the path.
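The same check can be done from the host side: given a local path, is it under any "source" listed in the mounts file? Below is a small sketch under the same assumptions as before (absolute POSIX paths). covering_mount is a hypothetical helper, and the mount list is the one posted earlier in this thread.

```python
import posixpath


def covering_mount(host_path, mounts):
    """Return the first mount whose "source" contains host_path, or None."""
    host_path = posixpath.normpath(host_path)
    for mount in mounts:
        source = posixpath.normpath(mount["source"])
        # Component-wise comparison: the source must be a parent of host_path.
        if posixpath.commonpath([source, host_path]) == source:
            return mount
    return None


# Mount list as posted in this thread.
thread_mounts = [
    {
        "source": "/workspace/tlt-experiments/",
        "destination": "/workspace/tlt-experiments",
    },
    {
        "source": "/workspace/tlt-experiments/models/unet/specs",
        "destination": "/workspace/tlt-experiments/models/unet/specs",
    },
]

# Prints None: no mount source covers this host directory.
print(covering_mount("/home/ubuntu/Desktop/dpstream/tlt-experiments", thread_mounts))
```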

cgucsie666 · September 1, 2021, 8:58am

Thank you for your help!!
My problem was solved by changing my variables. lol
