FileNotFoundError: Model not found

jakub.nowakowski01 · July 13, 2024, 1:55pm

Please provide the following information when requesting support.

• Hardware (RTX A4000)
• Network Type (Yolo_v4)
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Hello,
I'm new to TAO and was trying to follow the yolo_v4 sample notebook. Everything seemed to run smoothly until the training step. That's when I got the following output:

To run with multigpu, please change --gpus based on the number of available GPUs in your machine.
2024-07-13 14:58:46,525 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-07-13 14:58:46,587 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-07-13 14:58:46,604 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/jakub/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-07-13 14:58:46,604 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
Using TensorFlow backend.
2024-07-13 12:58:49.679827: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-07-13 12:58:49,713 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-07-13 12:58:50,496 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-13 12:58:50,517 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-13 12:58:50,520 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-13 12:58:51,462 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py:55: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py:55: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py:58: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py:58: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

INFO: Log file already exists at /workspace/tao-experiments/yolo_v4/experiment_dir_unpruned/status.json
INFO: Starting Yolo_V4 Training job
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:195: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:195: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 16, io threads: 32, compute threads: 16, buffered batches: -1
INFO: total dataset size 6733, number of sources: 1, batch size per gpu: 20, steps: 337
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

INFO: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
INFO: shuffle: True - shard 0 of 1
INFO: sampling 1 datasets with weights:
INFO: source: 0 weight: 1.000000
WARNING:tensorflow:The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING: The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING: The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING: The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/dataio/tf_data_pipe.py:131: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/dataio/tf_data_pipe.py:131: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

INFO: Model not found: /home/jakub/tao_eksperymenty/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py", line 165, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 717, in return_func
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 705, in return_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py", line 161, in main
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py", line 143, in main
    run_experiment(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py", line 84, in run_experiment
    model = build_training_pipeline(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/models/utils.py", line 74, in build_training_pipeline
    yolov4.build_training_model(hvd)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/models/yolov4_model.py", line 480, in build_training_model
    self.load_pretrained_model(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/models/yolov4_model.py", line 308, in load_pretrained_model
    pretrained_model = model_io.load_model(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/utils/model_io.py", line 66, in load_model
    model = load_keras_model(model_path,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 576, in load_keras_model
    raise FileNotFoundError(f"Model not found: {filepath}")
FileNotFoundError: Model not found: /home/jakub/tao_eksperymenty/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5
Execution status: FAIL
2024-07-13 14:59:07,362 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

I can see the model file locally in that directory. This is the content of my ~/.tao_mounts.json file:

{
    "Mounts": [
        {
            "source": "/home/jakub/tao_eksperymenty",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/jakub/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/yolo_v4/specs",
            "destination": "/workspace/tao-experiments/yolo_v4/specs"
        }
    ]
}

I suspect that this is an issue with the Docker mapping, but as a beginner I can't seem to find a solution.
Any help would be greatly appreciated.
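
Separately from the missing model, the Docker warning in the log above suggests adding a "user":"UID:GID" entry to the DockerOptions portion of ~/.tao_mounts.json. A minimal sketch of how that entry could be added from Python, assuming a Unix host where os.getuid() and os.getgid() are available; with_user_option is a hypothetical helper name, not part of TAO:

```python
import json
import os


def with_user_option(mounts_config, uid, gid):
    """Return a copy of the mounts config with DockerOptions["user"] set.

    with_user_option is a hypothetical helper; the "user": "UID:GID" key
    under "DockerOptions" is taken from the warning printed by the launcher.
    """
    updated = dict(mounts_config)
    options = dict(updated.get("DockerOptions", {}))
    options["user"] = f"{uid}:{gid}"
    updated["DockerOptions"] = options
    return updated


config = {
    "Mounts": [
        {
            "source": "/home/jakub/tao_eksperymenty",
            "destination": "/workspace/tao-experiments",
        }
    ]
}

# os.getuid() / os.getgid() correspond to "id -u" and "id -g" on the terminal.
print(json.dumps(with_user_option(config, os.getuid(), os.getgid()), indent=4))
```

This only changes which user the container runs as; it does not affect how paths are mapped.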

Morganh · July 13, 2024, 2:30pm

According to your ~/.tao_mounts.json, the file path inside the docker will be under /workspace/tao-experiments.
So, the hdf5 file is located at /workspace/tao-experiments/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5
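
The mapping described here can be sketched in Python as follows. This is only an illustration of how a "source"/"destination" pair translates a host path into a container path; host_to_container is a hypothetical helper, not something the TAO launcher provides, and the mounts list is copied from the ~/.tao_mounts.json posted above.

```python
from pathlib import PurePosixPath


def host_to_container(host_path, mounts):
    """Translate a host path to the path seen inside the container.

    `mounts` is the "Mounts" list from ~/.tao_mounts.json. Returns None when
    no mount covers the path. If several mounts match, the one with the
    longest source path is used.
    """
    path = PurePosixPath(host_path)
    best = None
    for mount in mounts:
        source = PurePosixPath(mount["source"])
        try:
            relative = path.relative_to(source)
        except ValueError:
            continue  # host_path is not under this mount's source
        if best is None or len(source.parts) > len(best[0].parts):
            best = (source, PurePosixPath(mount["destination"]) / relative)
    return str(best[1]) if best else None


mounts = [
    {
        "source": "/home/jakub/tao_eksperymenty",
        "destination": "/workspace/tao-experiments",
    },
    {
        "source": "/home/jakub/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/yolo_v4/specs",
        "destination": "/workspace/tao-experiments/yolo_v4/specs",
    },
]

host_model = (
    "/home/jakub/tao_eksperymenty/yolo_v4/pretrained_resnet18/"
    "pretrained_object_detection_vresnet18/resnet_18.hdf5"
)
print(host_to_container(host_model, mounts))
```

The printed path should match the /workspace/tao-experiments/... location given above, which is the form the training job needs to see.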

jakub.nowakowski01 · July 13, 2024, 3:27pm

Does that mean that I should change the destination in the first mount? That particular part was done automatically by this part of the sample notebook script:

# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

I guess my question is: why doesn't this notebook run seamlessly when the only thing I changed was the %env LOCAL_PROJECT_DIR=YOUR_LOCAL_PROJECT_DIR_PATH variable, which was supposed to be changed?

Morganh · July 13, 2024, 3:31pm

In the mounts file, "source" is your local path.

"destination" is a path defined for inside the docker.

So, you need to change the pretrained model path in your training spec file.
See tao_tutorials/notebooks/tao_launcher_starter_kit/yolo_v4/specs/yolo_v4_train_resnet18_kitti.txt at main · NVIDIA/tao_tutorials · GitHub. The pretrained model path in that file should be a path inside the docker.
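
As a rough illustration of that fix, the sketch below rewrites quoted host paths in spec text so they point at the container-side mount instead. rewrite_spec_paths is a hypothetical helper, and the field name pretrain_model_path in the sample fragment is an assumption about the spec layout; check your own spec file for the actual key.

```python
import re


def rewrite_spec_paths(spec_text, host_prefix, container_prefix):
    """Replace host_prefix with container_prefix at the start of quoted paths.

    Only whole path components are matched, so a sibling directory such as
    "<host_prefix>_backup" is left untouched.
    """
    host_prefix = host_prefix.rstrip("/")
    container_prefix = container_prefix.rstrip("/")
    pattern = re.compile('"' + re.escape(host_prefix) + r'(?=[/"])')
    return pattern.sub(lambda match: '"' + container_prefix, spec_text)


# Sample fragment; "pretrain_model_path" is an assumed field name.
spec = """training_config {
  pretrain_model_path: "/home/jakub/tao_eksperymenty/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5"
}
"""

fixed = rewrite_spec_paths(
    spec,
    host_prefix="/home/jakub/tao_eksperymenty",
    container_prefix="/workspace/tao-experiments",
)
print(fixed)
```

Editing the spec by hand works just as well; the point is that every path in the spec must be valid inside the container, not on the host.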

jakub.nowakowski01 · July 13, 2024, 5:08pm

Thanks, the training process has finally started

system · July 27, 2024, 5:09pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
