FileNotFoundError: Model not found

Please provide the following information when requesting support.

• Hardware (RTX A4000)
• Network Type (Yolo_v4)
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
• Training spec file (If you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

Hello,
I'm new to TAO and was trying to follow the yolo_v4 sample notebook. Everything seemed to be running smoothly until the training step. That's when I got the following output:

To run with multigpu, please change --gpus based on the number of available GPUs in your machine.
2024-07-13 14:58:46,525 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-07-13 14:58:46,587 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-07-13 14:58:46,604 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/jakub/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-07-13 14:58:46,604 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
Using TensorFlow backend.
2024-07-13 12:58:49.679827: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-07-13 12:58:49,713 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-07-13 12:58:50,496 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-13 12:58:50,517 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-13 12:58:50,520 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-13 12:58:51,462 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py:55: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py:55: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py:58: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py:58: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

INFO: Log file already exists at /workspace/tao-experiments/yolo_v4/experiment_dir_unpruned/status.json
INFO: Starting Yolo_V4 Training job
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:195: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:195: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 16, io threads: 32, compute threads: 16, buffered batches: -1
INFO: total dataset size 6733, number of sources: 1, batch size per gpu: 20, steps: 337
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

INFO: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
INFO: shuffle: True - shard 0 of 1
INFO: sampling 1 datasets with weights:
INFO: source: 0 weight: 1.000000
WARNING:tensorflow:The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING: The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING: The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING: The operation tf.image.convert_image_dtype will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/dataio/tf_data_pipe.py:131: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/dataio/tf_data_pipe.py:131: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

INFO: Model not found: /home/jakub/tao_eksperymenty/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py", line 165, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 717, in return_func
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 705, in return_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py", line 161, in main
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py", line 143, in main
    run_experiment(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/train.py", line 84, in run_experiment
    model = build_training_pipeline(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/models/utils.py", line 74, in build_training_pipeline
    yolov4.build_training_model(hvd)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/models/yolov4_model.py", line 480, in build_training_model
    self.load_pretrained_model(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/models/yolov4_model.py", line 308, in load_pretrained_model
    pretrained_model = model_io.load_model(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/utils/model_io.py", line 66, in load_model
    model = load_keras_model(model_path,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 576, in load_keras_model
    raise FileNotFoundError(f"Model not found: {filepath}")
FileNotFoundError: Model not found: /home/jakub/tao_eksperymenty/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5
Execution status: FAIL
2024-07-13 14:59:07,362 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

I can see the model file locally in that directory. This is the content of my ~/.tao_mounts.json file:

{
    "Mounts": [
        {
            "source": "/home/jakub/tao_eksperymenty",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/jakub/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/yolo_v4/specs",
            "destination": "/workspace/tao-experiments/yolo_v4/specs"
        }
    ]
}

I suspect that this is an issue with the docker path mapping, but as a beginner I can't seem to find a solution.
Any help would be greatly appreciated.

According to your ~/.tao_mounts.json, /home/jakub/tao_eksperymenty on the host is mounted as /workspace/tao-experiments inside the docker.
So the hdf5 file is located at /workspace/tao-experiments/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5 inside the container.
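
To double-check the mapping yourself, here is a minimal sketch; the to_container_path helper is just for illustration (it is not part of TAO), it simply applies the mounts file to a host path:

import json
import os

# Hypothetical helper: shows how the launcher's mounts map host paths into the container.
def to_container_path(host_path, mounts_file="~/.tao_mounts.json"):
    with open(os.path.expanduser(mounts_file)) as f:
        mounts = json.load(f)["Mounts"]
    for mount in mounts:
        src = mount["source"].rstrip("/")
        if host_path == src or host_path.startswith(src + "/"):
            return mount["destination"].rstrip("/") + host_path[len(src):]
    return None  # the path is not visible inside the container at all

print(to_container_path(
    "/home/jakub/tao_eksperymenty/yolo_v4/pretrained_resnet18/"
    "pretrained_object_detection_vresnet18/resnet_18.hdf5"))
# Expected: /workspace/tao-experiments/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5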

Does that mean I should change the destination in the first mount? That particular part was generated automatically by this part of the sample notebook:

# Mapping up the local directories to the TAO docker.
import json
import os

mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)
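
For reference, the environment variables feeding that cell are set earlier in the notebook with %env cells; reconstructed from my mounts file, they are effectively:

import os

# Values reconstructed from my ~/.tao_mounts.json; equivalent to the notebook's %env cells.
os.environ["LOCAL_PROJECT_DIR"] = "/home/jakub/tao_eksperymenty"
os.environ["LOCAL_SPECS_DIR"] = (
    "/home/jakub/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/yolo_v4/specs"
)
os.environ["SPECS_DIR"] = "/workspace/tao-experiments/yolo_v4/specs"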

I guess my question is: why doesn't this notebook run seamlessly when the only thing I changed was the %env LOCAL_PROJECT_DIR=YOUR_LOCAL_PROJECT_DIR_PATH variable, which is the one that is supposed to be changed?

/home/jakub/tao_eksperymenty/... is your local path on the host.

/workspace/tao-experiments/... is the path defined inside the docker.

So you need to change the pretrained model path in your training spec file.
See tao_tutorials/notebooks/tao_launcher_starter_kit/yolo_v4/specs/yolo_v4_train_resnet18_kitti.txt at main · NVIDIA/tao_tutorials · GitHub. This path should be a path inside the docker.
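
For illustration, assuming the spec uses the pretrain_model_path field under training_config as in the sample spec linked above, the line would point at the mounted in-docker location:

training_config {
  # ... other training settings unchanged ...
  pretrain_model_path: "/workspace/tao-experiments/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5"
}

The same rule applies to the dataset and output paths in the spec: they all need to be written as the container sees them, not as they appear on the host.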

Thanks, the training process has finally started.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.