TAO yolo_v3 google colab training failure

Swapnadip_Moni · May 14, 2024, 4:57am

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
Tesla T4 in Google Colab
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
Yolo V3 ResNet18
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
yolo_v3_train_resnet18_tfrecord.txt (1.9 KB)

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

When I run the TAO train command to train the TAO yolo_v3 ResNet18 model:

print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tao model yolo_v3 train -e $SPECS_DIR/yolo_v3_train_resnet18_tfrecord.txt \
                         -r $EXPERIMENT_DIR/experiment_dir_unpruned \
                         -k $KEY \
                         --gpus 1

I get the error below:

Epoch 1/10
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v3/scripts/train.py", line 164, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 717, in return_func
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 705, in return_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v3/scripts/train.py", line 160, in main
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v3/scripts/train.py", line 142, in main
    run_experiment(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v3/scripts/train.py", line 94, in run_experiment
    model.train(verbose)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v3/models/yolov3_model.py", line 646, in train
    self.keras_model.fit(
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1027, in fit
    return training_arrays.fit_loop(self, f, ins,
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/training_arrays.py", line 154, in fit_loop
    outs = f(ins)
  File "/usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1470, in __call__
    ret = tf_session.TF_SessionRunCallable(self._session._session,
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_4391}} /content/drive/MyDrive/cable_damage_yolov8_dataset/train/rename_and_save/images//content/drive/MyDrive/cable_damage_yolov8_dataset/train/rename_and_save/images/img_774.jpg; No such file or directory
[[{{node AssetLoader/ReadFile}}]]
[[data_loader_out]]
[[SparseSplit/_7625]]
(1) Not found: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_4391}} /content/drive/MyDrive/cable_damage_yolov8_dataset/train/rename_and_save/images//content/drive/MyDrive/cable_damage_yolov8_dataset/train/rename_and_save/images/img_774.jpg; No such file or directory
[[{{node AssetLoader/ReadFile}}]]
[[data_loader_out]]
0 successful operations.
0 derived errors ignored.
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: 'str' object has no attribute 'decode'
Execution status: FAIL

Morganh · May 14, 2024, 5:04am

The path is not correct. Please double-check it, especially in the tao_mounts.json file.

Swapnadip_Moni · May 14, 2024, 5:19am

In the Google Colab notebook provided by NVIDIA, TAO yolo_v3 uses the KITTI data by default.
I have mounted my own dataset for model training, but somehow it is not being picked up properly.

Swapnadip_Moni · May 14, 2024, 9:59am

After running the training command

print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tao model yolo_v3 train -e $SPECS_DIR/yolo_v3_train_resnet18_tfrecord.txt \
                         -r $EXPERIMENT_DIR/experiment_dir_unpruned \
                         -k $KEY \
                         --gpus 1

I get this error:

/usr/local/lib/python3.8/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was not compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '

Epoch 1/20
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: 'str' object has no attribute 'decode'
Execution status: FAIL

Morganh · May 14, 2024, 2:35pm

Did you fix the above error? It seems that "/content/drive/MyDrive/cable_damage_yolov8_dataset/train/rename_and_save/images/" appears twice in the path.
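
The doubled prefix in the log is what you would expect if an image directory was prepended to an entry that already held the full path. Below is a minimal Python sketch for spotting and collapsing that pattern in a list of path strings. The helper names `has_doubled_prefix` and `strip_doubled_prefix` are hypothetical and not part of TAO; the sketch only inspects strings and does not say where in the spec file or dataset conversion the duplication was introduced.

```python
import posixpath


def has_doubled_prefix(path: str, image_dir: str) -> bool:
    """Return True if the image directory occurs more than once in the path."""
    prefix = image_dir.rstrip("/")
    return path.count(prefix) > 1


def strip_doubled_prefix(path: str, image_dir: str) -> str:
    """Collapse a repeated image directory prefix into a single normalized path."""
    prefix = image_dir.rstrip("/")
    if path.count(prefix) <= 1:
        return posixpath.normpath(path)
    # Keep everything from the last occurrence of the prefix onward.
    return posixpath.normpath(path[path.rfind(prefix):])


if __name__ == "__main__":
    # Directory and file name taken from the error message in this thread.
    image_dir = "/content/drive/MyDrive/cable_damage_yolov8_dataset/train/rename_and_save/images/"
    bad_path = image_dir + image_dir + "img_774.jpg"
    print(has_doubled_prefix(bad_path, image_dir))
    print(strip_doubled_prefix(bad_path, image_dir))
```

If the check returns True for entries in your dataset listing, the image directory is probably being supplied in two places, for example once in the spec file and once in the stored file names.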

Swapnadip_Moni · May 14, 2024, 4:44pm

This problem is now solved. I created a new folder, stored the data according to the KITTI format, and updated the file paths.
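
For anyone repeating this fix, a quick sanity check before training is to confirm that every label file has an image with the same base name. The following is a rough sketch, assuming a KITTI-style layout with one image directory and one directory of `.txt` label files; the function name `find_labels_without_images` and the extension list are assumptions for illustration, not part of TAO.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def find_labels_without_images(image_dir, label_dir):
    """Return sorted names of label files that have no image with the same stem."""
    image_stems = {
        p.stem
        for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS
    }
    return sorted(
        p.name for p in Path(label_dir).glob("*.txt") if p.stem not in image_stems
    )
```

Call it with the image and label folders of the new dataset directory; an empty list means every label has a matching image.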

system · May 28, 2024, 4:45pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
