Yolo-v4 on colab - ModuleNotFound - No module named 'uff'

Please provide the following information when requesting support.

• Hardware (T4)
• Network Type (resnet18)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
• How to reproduce the issue? (!tao model yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train.txt -o $DATA_DIR/train/tfrecords/train)

Hello,

This is my first time using the TAO Toolkit. My goal is to do transfer learning using YOLOv4, so I opened the link for the YOLOv4 object detection notebook. I was able to run it up to step 2.1 (set up the Python environment). At step #3, when I executed the following code, it resulted in the error ModuleNotFoundError: No module named ‘uff’:
Yolo_V4_Colab_Error…txt (2.7 KB)
!tao model yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train.txt -o $DATA_DIR/train/tfrecords/train

Is there a way to install this module manually?

Could you save the colab ipynb file and upload here?

Copy of yolo_v4.zip (8.1 KB)

Hello,

I uploaded it as a zip file, since I cannot upload the *.ipynb file directly.

I cannot see the logs when you run the cells. Could you please double check?

After the error occurred I needed to close the Colab notebook to conserve my session usage, so the logs are already gone. In any case, running step 2.0 took quite a long time, since TensorRT needs to be installed, plus TensorFlow and its dependencies. I was able to complete the install, but the log was quite long and got clipped in Colab, so I could not see which dependencies/modules were not installed properly. I only got the module error.

I suggest you run the cells again to double check, especially the cell below.

import os

# Pick the environment setup script depending on whether we are running in Colab.
if os.environ["GOOGLE_COLAB"] == "1":
    os.environ["bash_script"] = "setup_env.sh"
else:
    os.environ["bash_script"] = "setup_env_desktop.sh"

os.environ["NV_TAO_TF_TOP"] = "/tmp/tao_tensorflow1_backend/"

# Substitute the TensorRT untar path, TensorRT version, and notebook path
# placeholders into the setup script, then run it.
!sed -i "s|PATH_TO_TRT|$trt_untar_folder_path|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script
!sed -i "s|TRT_VERSION|$trt_version|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script
!sed -i "s|PATH_TO_COLAB_NOTEBOOKS|$COLAB_NOTEBOOKS_PATH|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

!sh $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

OK, I’m re-running the Colab notebook. But I’m a bit confused: why does TensorRT need to be installed? There is a YouTube video by NVIDIA about running the TAO Toolkit on Colab, and I did not see TensorRT being installed there. Also, my purpose is to do transfer learning, not inference; I plan to do the inference on my Jetson Orin NX. Anyway, I’ll send you a copy of the notebook with the captured logs.

For running TAO in Colab, please follow the cells. For the details of the setup script in the TF branch, you can refer to nvidia-tao/tensorflow/setup_env.sh at main · NVIDIA-AI-IOT/nvidia-tao · GitHub.

Copy of yolo_v4 (1).zip (55.2 KB)
I re-ran the Colab notebook and am still encountering the module-not-found error.

I removed all the files/folders generated by step #2 of the notebook and tried a fresh install. The error "No module named ‘uff’" did not appear. But when executing the code for converting the KITTI-formatted annotation files to TFRecords, a new set of warnings and errors appeared:
Using TensorFlow backend.
2024-03-10 19:20:19,389 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-03-10 19:20:20,360 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-10 19:20:20,920 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-10 19:20:22,837 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.trt_utils 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
2024-03-10 19:20:22,838 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.base_exporter 44: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
Using TensorFlow backend.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/dataio/dataset_converter_lib.py:181: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/dataio/dataset_converter_lib.py:181: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

Telemetry data couldn’t be sent, but the command ran successfully.
[WARNING]: ‘str’ object has no attribute ‘decode’
Execution status: PASS

The warning message highlights a missing TensorBoard install; I’m not sure if that contributed to the error. There is another warning about no attribute ‘decode’. I assume this attribute should be on the generated TFRecords.
The notebook details another method of using the Kitti annotation files instead of the TFRecords:
The default YOLOv4 data format requires generation of TFRecords. Currently, the old sequence data format (image folders and label txt folders) is still supported and if you prefer to use the sequence data format, you can skip this section. To use sequence data format, please use spec file yolo_v4_train_resnet18_kitti_seq.txt and yolo_v4_retrain_resnet18_kitti_seq.txt

The only thing is, there is no yolo_v4_train_resnet18_kitti_seq.txt or yolo_v4_retrain_resnet18_kitti_seq.txt in the SPECS_DIR. Another, more concerning warning is: nvidia_tao_tf1.cv.common.export.base_exporter 44: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.

As I plan to do the inference on my Jetson Orin NX, I will need to convert the model’s TLT file to a TensorRT engine file. With this warning, that would defeat my purpose entirely. I need help finding out why it failed to import TensorRT.
Copy of yolo_v4.zip (57.4 KB)
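One way to narrow down why the import fails is to check, from the same Python environment the TAO commands run in, whether the tensorrt package is visible at all and whether the extracted libraries are on the loader path. A minimal diagnostic sketch (the path substring checks are my assumption, based on the /content/trt_untar extraction path used earlier in the notebook):

```python
import importlib.util
import os

def diagnose_tensorrt():
    """Report whether the tensorrt Python package is importable and whether
    the TensorRT libraries appear on LD_LIBRARY_PATH -- a common cause of
    the 'Failed to import TensorRT package' warning."""
    spec = importlib.util.find_spec("tensorrt")
    lib_path = os.environ.get("LD_LIBRARY_PATH", "")
    return {
        "tensorrt_importable": spec is not None,
        # Assumed substrings: adjust to wherever the TensorRT tarball was untarred.
        "trt_libs_on_path": "trt_untar" in lib_path or "TensorRT" in lib_path,
    }

print(diagnose_tensorrt())
```

If tensorrt_importable is False, the wheel was never installed into this interpreter; if it is True but the import still fails at runtime, the shared libraries are likely missing from LD_LIBRARY_PATH.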

Can you find the output tfrecords under $DATA_DIR/train/tfrecords/train?

Hello,

Yes, I can see the output tfrecords in my data folder. But I have some doubts about these files, as some are 0 KB in size and they have no file extension. I’m trying to figure out how to view these files to confirm whether something got corrupted.
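For a quick sanity check without pulling in TensorFlow, the TFRecord container format can be walked directly: each record is an 8-byte little-endian length, a 4-byte length CRC, the payload, and a 4-byte payload CRC. A hedged sketch that only counts records (it does not validate the CRCs or decode the protobuf payload; a 0-record result on a shard would confirm that shard is effectively empty):

```python
import struct

def count_tfrecord_examples(path):
    """Count records in a TFRecord file by walking its framing:
    [8-byte length][4-byte length CRC][payload][4-byte payload CRC]."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:  # end of file (or truncated record)
                break
            (length,) = struct.unpack("<Q", header)
            f.seek(4 + length + 4, 1)  # skip length CRC, payload, payload CRC
            count += 1
    return count
```

Having no file extension is normal for these shards; only a 0-byte or 0-record shard indicates that no examples were written into it.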

I’ll try to re-run the training, but I’m also worried about the TensorRT warning saying it will not be able to convert the TLT to an engine.

How many images did you run dataset_convert on? Can you share the spec file as well?

I have 210 training images and 90 validation images. I think I found why some of the shard files are 0 KB: some of my images had mismatched extensions, like *.jpg and *.JPG. Because the spec file looks for *.jpg, it ignored most of the images with *.JPG. I have fixed this issue and my images now have consistent extensions. I then re-ran the TFRecords conversion; it took some 5 minutes to finish but produced the same warnings as before. I also tried to import tensorrt but got an error: No module named ‘tensorrt’. I’m not sure why it isn’t importing, as I checked that TensorRT is extracted at this path: !tar -xzf $trt_tar_path -C /content/trt_untar.
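The extension mismatch described above can be fixed in bulk rather than by hand. A minimal sketch using only the standard library (function name and the in-place rename are my own; it assumes no file already exists under the lowercased name, since the rename would overwrite it):

```python
from pathlib import Path

def normalize_extensions(image_dir, ext=".jpg"):
    """Rename files whose suffix differs from `ext` only in case
    (e.g. awake_1.JPG -> awake_1.jpg). Returns the new file names."""
    renamed = []
    for p in sorted(Path(image_dir).iterdir()):
        if p.is_file() and p.suffix.lower() == ext and p.suffix != ext:
            target = p.with_suffix(ext)
            p.rename(target)  # caution: overwrites an existing lowercase twin
            renamed.append(target.name)
    return renamed
```

Running this on the images folder before dataset_convert keeps the spec file’s *.jpg pattern matching every image.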

Warnings and errors:

Using TensorFlow backend.
2024-03-10 18:09:09.265904: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-03-10 18:09:09,319 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-03-10 18:09:10,325 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-10 18:09:10,903 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-10 18:09:12,898 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.trt_utils 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
2024-03-10 18:09:12,898 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.base_exporter 44: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
Using TensorFlow backend.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

Telemetry data couldn’t be sent, but the command ran successfully.
[WARNING]: ‘str’ object has no attribute ‘decode’
Execution status: PASS

The dataset_convert has already succeeded.
Also, the tfrecords files look normal. You can ignore the warning log.

I was able to proceed with the training, although I did not expect to see such a high loss, starting from 26,000 and dropping to 482 after 100 epochs. The main issue I’m facing is that only 1 of my 2 classes gets predictions, and I’m not sure why. I followed the recommended dataset formatting/folder layout. My training images are slightly mismatched: ‘awake’ has 103 images, ‘drowsy’ has 107. My validation images are 50 each. I combined all the training images into one folder named images, with the following naming format: awake_N.jpg, drowsy_N.jpg.
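Before digging into the spec file, it can help to verify that both classes actually made it into the label files the converter consumed. A hedged sketch that tallies class names across KITTI-format labels (in KITTI labels the class name is the first whitespace-separated field of each line; the function name is my own):

```python
from collections import Counter
from pathlib import Path

def count_kitti_classes(label_dir):
    """Tally object classes across KITTI label files; the class name is
    the first field on each line of each *.txt label file."""
    counts = Counter()
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:
                counts[fields[0].lower()] += 1
    return counts
```

If one class shows zero (or near-zero) objects here, the problem is in the dataset rather than in the training config.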

I’m checking the settings in yolo_v4_train_resnet18_kitti.txt but could not find anything that would cause this issue. I have uploaded the training config file.

model_output_labels.txt (12 Bytes)

yolo_v4_train_resnet18_kitti.txt (2.0 KB)

Portion of the training log (100 epochs, GPU: V100):


/content/drive/MyDrive/results/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_090.hdf5
Epoch 91/100
23/23 [==============================] - 11s 484ms/step - loss: 517.2122
Epoch 92/100
23/23 [==============================] - 10s 448ms/step - loss: 513.0973
Epoch 93/100
23/23 [==============================] - 10s 441ms/step - loss: 509.4698
Epoch 94/100
23/23 [==============================] - 10s 438ms/step - loss: 505.0564
Epoch 95/100
23/23 [==============================] - 10s 434ms/step - loss: 499.1268
Epoch 96/100
23/23 [==============================] - 10s 439ms/step - loss: 495.6057
Epoch 97/100
23/23 [==============================] - 10s 439ms/step - loss: 492.7333
Epoch 98/100
23/23 [==============================] - 10s 441ms/step - loss: 490.6606
Epoch 99/100
23/23 [==============================] - 10s 437ms/step - loss: 488.0959
Epoch 100/100
23/23 [==============================] - 10s 442ms/step - loss: 482.2681
Producing predictions: 100% 4/4 [00:01<00:00, 2.70it/s]
Start to calculate AP for each class
*******************************
awake AP 0.88322
drowsy AP 0.0
mAP 0.44161
*******************************
Validation loss: 431.5468444824219

This is a new question; could you create a new forum topic, since the original issue is resolved now?

For this latest question, do you have the log from when you generated the tfrecords files /content/drive/MyDrive/kitti_data/DATA_DIR/train/tfrecords/train*?
In the log, we can see how many “awake” objects and how many “drowsy” objects there are.
Since you set validation_fold: 0, the evaluation will use the tfrecords files which have -000-of in the filename.
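An alternative to fold-based evaluation is to point the spec at a dedicated validation set in dataset_config. A sketch of what that change might look like (the field names here are my assumption from the TAO spec format, and the paths are placeholders; please check against the spec files shipped with the notebook):

```
dataset_config {
  # ... existing data_sources and target class mappings ...

  # Instead of validation_fold: 0, point evaluation at a dedicated set:
  validation_data_sources: {
    tfrecords_path: "/content/drive/MyDrive/kitti_data/DATA_DIR/val/tfrecords/val*"
    image_directory_path: "/content/drive/MyDrive/kitti_data/DATA_DIR/val/images"
  }
}
```

With this, evaluation no longer depends on which objects happened to land in fold 000 of the training shards.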

Thanks for your input on validation_fold: 0. I replaced this setting with paths to my validation TFRecords and images instead, and it worked; I’m now getting predictions for both of my classes. The loss still seems a bit high, although based on forum discussions about YOLOv4, high loss values appear to be normal. I’m not sure what the impact on inference will be, but I’ll try the model.

Epoch 00130: saving model to /content/drive/MyDrive/results/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_130.hdf5
Epoch 131/140
27/27 [==============================] - 13s 465ms/step - loss: 224.1768
Epoch 132/140
27/27 [==============================] - 12s 429ms/step - loss: 221.7701
Epoch 133/140
27/27 [==============================] - 12s 431ms/step - loss: 221.1449
Epoch 134/140
27/27 [==============================] - 12s 432ms/step - loss: 219.0267
Epoch 135/140
27/27 [==============================] - 12s 431ms/step - loss: 220.0725
Epoch 136/140
27/27 [==============================] - 12s 436ms/step - loss: 216.0832
Epoch 137/140
27/27 [==============================] - 12s 433ms/step - loss: 214.3811
Epoch 138/140
27/27 [==============================] - 12s 433ms/step - loss: 211.9911
Epoch 139/140
27/27 [==============================] - 12s 431ms/step - loss: 210.4830
Epoch 140/140
27/27 [==============================] - 12s 434ms/step - loss: 210.1095
Producing predictions: 100% 13/13 [00:04<00:00, 2.83it/s]
Start to calculate AP for each class


awake AP 0.97174
drowsy AP 0.87812
mAP 0.92493


Validation loss: 172.96787438025842

I think we can close this issue. Thanks for the input, it helped a lot!


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.