Unable to export hdf5 to etlt after Tao Training on Colab

Please provide the following information when requesting support.

• Hardware (V100)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Hello, I was able to complete training of my custom dataset/model using YoloV4 on Google Colab. The prediction accuracy for my classes is acceptable. However, while using the YoloV4 notebook I noticed a persistent warning: "Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available". Then, when I tried to export the hdf5_epoch_n file, the export produced warnings and errors.
I added the following command to the YoloV4 notebook to export:

!tao model yolo_v4 export -m /content/drive/MyDrive/results/yolo_v4/bkp_experiment_dir_retrain/weights/yolov4_resnet18_epoch_160.hdf5 -o /content/drive/MyDrive/results/yolo_v4/export_model/yolo_v4.etlt \
-e /content/drive/MyDrive/nvidia-tao/tensorflow/yolo_v4/specs/yolo_v4_retrain_resnet18_kitti.txt \
-k $KEY

I spent about 2 days trying to understand the issue and trying to convert hdf5 to onnx using other methods. What I noticed is that Colab has 2 Python versions: 3.10 (default) and 3.8.18. I think most of the Nvidia-related packages are installed under 3.8.18. I suspect this is why tensorrt and tensorboard are not detected even though the packages are installed under Python 3.8.
Is there a workaround to make Python 3.8 the default for the notebook? I tried several suggestions from Stack Overflow, but none of them worked.
Lastly, is there another method of converting hdf5 to onnx or another format such as .pb or .etlt?
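
One way to test the suspicion above is to ask the interpreter that runs the notebook cell whether it can locate the packages at all. The following is a minimal sketch, not part of the TAO notebook; the helper name report_importable is made up for illustration. It only uses the standard library, so it can be pasted into a cell as is.

```python
import importlib.util
import sys


def report_importable(module_names):
    """Map each module name to True/False: can THIS interpreter locate it?"""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# The interpreter executing this cell, which may differ from `python` on PATH.
print("interpreter:", sys.executable)
print("version    :", sys.version.split()[0])

for name, found in report_importable(["tensorrt", "tensorboard"]).items():
    status = "found" if found else "NOT found by this interpreter"
    print(f"{name}: {status}")
```

If both packages are reported as not found here but import fine from a python3.8 prompt, the packages are installed for a different interpreter than the one running the cell.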

Error and Warning Messages:
Using TensorFlow backend.
2024-03-16 19:25:50.821258: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-03-16 19:25:50,876 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-03-16 19:25:51,952 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-16 19:25:52,543 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-16 19:25:54,520 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.trt_utils 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
2024-03-16 19:25:54,520 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.base_exporter 44: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:nvidia_tao_tf1.cv.common.export.trt_utils:Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
WARNING:nvidia_tao_tf1.cv.common.export.base_exporter:Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/export.py", line 42, in <module>
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/export.py", line 26, in <module>
    launch_export(Exporter, None, "onnx")
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/app.py", line 323, in launch_export
    run_export(Exporter, args, backend)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/app.py", line 277, in run_export
    exporter = Exporter(model_path, key,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/export/yolov4_exporter.py", line 81, in __init__
    super(YOLOv4Exporter, self).__init__(model_path=model_path,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/keras_exporter.py", line 100, in __init__
    super(KerasExporter, self).__init__(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/base_exporter.py", line 88, in __init__
    self._trt_version_number = NV_TENSORRT_MAJOR * 1000 + NV_TENSORRT_MINOR * 100 +
NameError: name 'NV_TENSORRT_MAJOR' is not defined
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: 'str' object has no attribute 'decode'
Execution status: FAIL

The setup_env.sh script in nvidia-tao/tensorflow (NVIDIA-AI-IOT/nvidia-tao on GitHub) sets Python 3.8 as the default version.
Could you double-check the Python version?

Do you have a local dGPU machine? If yes, you can docker run the TAO docker and export to an ONNX file.
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 /bin/bash
Then, inside the docker:
# yolo_v4 export xxx

When I check the Python version in a notebook cell using
import sys
print(sys.version)
I get 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0].

Unfortunately, I don't have an Nvidia dGPU; my old gaming desktop has an AMD card. Can I run the TAO docker on Colab?

Could you check
! /usr/local/bin/python --version
! /usr/bin/python3 --version

I tried this after installing only Python 3.8 using the commands from setup_env.sh, and this is the output:
Python 3.8.18
Python 3.8.18

It seems Python 3.8 is being set as the default, but I'm confused about why sys.version still reports 3.10. I tried to reset my session to see whether the kernel would pick up the change, but Colab just hung and crashed. I think this is connected to why tensorrt and tensorboard cannot be imported when run from Colab cells. When I import tensorrt from the python3.8 command line, it imports with the correct version. So tensorrt is installed, but it is not visible to Colab cells when they execute.
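
The two observations above are not contradictory: a line starting with ! runs in a subshell and resolves python through PATH, while sys.version reports the interpreter that was started for the notebook kernel itself. The sketch below prints both side by side. It is only an illustration using the standard library, and the helper name shell_python_version is made up here.

```python
import shutil
import subprocess
import sys


def shell_python_version(command="python"):
    """Return the version reported by `command` as found on PATH, or None."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-c", "import sys; print(sys.version.split()[0])"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


# Interpreter running this cell (the notebook kernel).
print("kernel :", sys.executable, sys.version.split()[0])
# Interpreter a `!python ...` line would pick up.
print("on PATH:", shutil.which("python"), shell_python_version("python"))
```

If the two lines differ, changing the default python on PATH affects ! commands and the tao launcher, but not what import statements inside a cell can see.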

Sharing some images. Based on the screenshots, although Python 3.8 seems to be set as the default interpreter version, the dist-packages recognized by Colab seem to come from Python 3.10. Have other Colab users reported a similar issue? Can you please check whether this can be reproduced on your end using the YoloV4 Colab notebook? I'm struggling with this issue, and I cannot proceed to inference on my Jetson Orin NX because it prevents me from exporting the model to onnx or etlt.

Some errors occurred during the TAO 5.0.0 install, where some dependencies had version mismatches. I included a screenshot of these.

[Screenshots: check python version; tensorrt; tensorflow; python 3.10 tensorflow; errors during package installation]

I will try on my side. The log lines in red font can be ignored.

I cannot run the exact same steps as you, because Colab suddenly told me that it cannot connect to a GPU backend due to usage limits.
But I found a way for you to debug or run something directly inside the docker.
You can run
!tao model yolo_v4 run /bin/bash

Then import tensorrt. As the screenshot shows, TensorRT can be imported.

You can run the export the same way:

/content# yolo_v4 export xxx

BTW, I downloaded TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-12.0.tar.gz from the official download center, which is why the TensorRT version shown is 8.6.1.

I tried your recommendation and below are the logs:

I imported tensorrt from the python prompt, then exited to go back to /content. When I run the tao export, the error still persists.

bash: cannot set terminal process group (2915): Inappropriate ioctl for device
bash: no job control in this shell
/content# ls
cuda-keyring_1.0-1_all.deb drive sample_data trt_untar
/content# yolo_v4 export
Using TensorFlow backend.
2024-03-20 14:02:58.028917: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-03-20 14:02:58,098 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-03-20 14:02:59,292 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-20 14:02:59,980 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-20 14:03:04,234 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.trt_utils 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
2024-03-20 14:03:04,234 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.base_exporter 44: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
usage: yolo_v4 export [-h] [--num_processes NUM_PROCESSES] [--gpus GPUS]
[--gpu_index GPU_INDEX [GPU_INDEX ...]] [--use_amp] [--log_file LOG_FILE] -m
MODEL [-k KEY] [-o OUTPUT_FILE] [--cal_json_file CAL_JSON_FILE]
[--gen_ds_config] [-e EXPERIMENT_SPEC]
[--static_batch_size STATIC_BATCH_SIZE] [--target_opset TARGET_OPSET]
[--results_dir RESULTS_DIR] [-v]
{train,prune,kmeans,inference,export,evaluate,dataset_convert} ...
yolo_v4 export: error: the following arguments are required: -m/--model
/content# python
Python 3.8.18 (default, Aug 25 2023, 13:20:30)
[GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorrt as trt
>>> trt.__version__
'8.6.1'
>>> /content# yolo_v4 export
  File "<stdin>", line 1
    /content# yolo_v4 export
    ^
SyntaxError: invalid syntax
>>> exit()
/content# yolo_v4 export -m /content/drive/MyDrive/re-trained_model_output/yolov4_resnet18_epoch_100.hdf5 -o /content/drive/MyDrive/results/yolo_v4/export_model/yolo_v4.etlt \ -e /content/drive/MyDrive/nvidia-tao/tensorflow/yolo_v4/specs/yolo_v4_retrain_resnet18_kitti.txt \ -k $KEY
Using TensorFlow backend.
2024-03-20 14:11:19.319542: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-03-20 14:11:19,387 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-03-20 14:11:20,562 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-20 14:11:21,193 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-20 14:11:23,414 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.trt_utils 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
2024-03-20 14:11:23,414 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.base_exporter 44: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
usage: yolo_v4 export [-h] [--num_processes NUM_PROCESSES] [--gpus GPUS]
[--gpu_index GPU_INDEX [GPU_INDEX ...]] [--use_amp] [--log_file LOG_FILE] -m
MODEL [-k KEY] [-o OUTPUT_FILE] [--cal_json_file CAL_JSON_FILE]
[--gen_ds_config] [-e EXPERIMENT_SPEC]
[--static_batch_size STATIC_BATCH_SIZE] [--target_opset TARGET_OPSET]
[--results_dir RESULTS_DIR] [-v]
{train,prune,kmeans,inference,export,evaluate,dataset_convert} ...
yolo_v4 export: error: argument /tasks: invalid choice: ' ' (choose from 'train', 'prune', 'kmeans', 'inference', 'export', 'evaluate', 'dataset_convert')
/content# /results/yolo_v4/export_model/yolo_v4.etlt-o /content/drive/MyDrive/results/yolo_v4/export_model/yolo_v4.etltyDrive/re-trained_model_output/yolov4_resnet18_epoch_100.hdf5
bash: /results/yolo_v4/export_model/yolo_v4.etlt-o: No such file or directory
/content# yolo_v4 export -m /content/drive/MyDrive/re-trained_model_output/yolov4_resnet18_epoch_100.hdf5 -o /content/drive/MyDrive/results/yolo_v4/export_model/yolo_v4.etlt -e /content/drive/MyDrive/nvidia-tao/tensorflow/yolo_v4/specs/yolo_v4_retrain_resnet18_kitti.txt -k $KEY
Using TensorFlow backend.
2024-03-20 14:19:32.594181: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-03-20 14:19:32,701 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-03-20 14:19:34,324 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-20 14:19:35,295 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-20 14:19:37,479 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.trt_utils 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
2024-03-20 14:19:37,479 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.base_exporter 44: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:nvidia_tao_tf1.cv.common.export.trt_utils:Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
WARNING:nvidia_tao_tf1.cv.common.export.base_exporter:Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/export.py", line 42, in <module>
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/export.py", line 26, in <module>
    launch_export(Exporter, None, "onnx")
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/app.py", line 323, in launch_export
    run_export(Exporter, args, backend)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/app.py", line 277, in run_export
    exporter = Exporter(model_path, key,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/export/yolov4_exporter.py", line 81, in __init__
    super(YOLOv4Exporter, self).__init__(model_path=model_path,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/keras_exporter.py", line 100, in __init__
    super(KerasExporter, self).__init__(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/base_exporter.py", line 88, in __init__
    self._trt_version_number = NV_TENSORRT_MAJOR * 1000 + NV_TENSORRT_MINOR * 100 +
NameError: name 'NV_TENSORRT_MAJOR' is not defined
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: 'str' object has no attribute 'decode'
Execution status: FAIL
/content#
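
A side note on the "invalid choice: ' '" error in the middle of the log above: it appears to come from pasting a multi-line command onto one line. A backslash followed by a space escapes that space, so the shell hands it to the program as (part of) an argument instead of treating it as a separator. The sketch below reproduces this with Python's shlex module, which follows POSIX shell splitting rules. The file names are placeholders, and the doubled space in the first example is an assumption about what was actually typed, since the pasted log may have collapsed whitespace.

```python
import shlex

# Backslash + space, then another space: the escaped space becomes its own
# argument, which matches the "invalid choice: ' '" message.
two_spaces = shlex.split(r"yolo_v4 export -m model.hdf5 -o out.etlt \  -e spec.txt")
print(two_spaces)

# Backslash + a single space: the escaped space is glued onto the next flag,
# so "-e" is no longer recognised as an option.
one_space = shlex.split(r"yolo_v4 export -m model.hdf5 -o out.etlt \ -e spec.txt")
print(one_space)

# Building the arguments as a list avoids line-continuation characters entirely.
args = ["yolo_v4", "export", "-m", "model.hdf5", "-o", "out.etlt", "-e", "spec.txt"]
one_line_command = shlex.join(args)
print(one_line_command)
```

The last attempt in the log, typed on a single line without backslashes, gets past argument parsing for the same reason.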

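In this setup, trt.__version__ reports only 8.6.1 (three components) instead of 8.6.1.x (four components).
That is the culprit: the TAO code unpacks the version string into four values, so the unpack fails.
See the log below.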

/content# python
Python 3.8.18 (default, Aug 25 2023, 13:20:30) 
[GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorrt as trt
>>> trt.__version__
'8.6.1'
>>> [NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, _] = [         int(item) for item         in trt.__version__.split(".")     ]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 4, got 3)

This does not match what nvidia_tao_tf1/cv/common/export/trt_utils.py in tao_tensorflow1_backend (NVIDIA/tao_tensorflow1_backend on GitHub) expects.
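
To connect this ValueError with the earlier NameError: the warning text suggests the version parsing sits inside a guarded block that logs "Failed to import TensorRT package" on any exception, which would leave the NV_TENSORRT_* names unbound. The sketch below imitates that chain without needing tensorrt installed. The function names strict_unpack, tolerant_unpack and guarded_setup are made up for illustration, and the guarded structure is an assumption inferred from the logs, not a copy of the TAO source.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("trt_sketch")


def strict_unpack(version):
    """Four-way unpack, as typed in the interpreter session above."""
    major, minor, patch, _ = [int(item) for item in version.split(".")]
    return major, minor, patch


def tolerant_unpack(version):
    """Take the first three components and ignore any build suffix."""
    parts = [int(item) for item in version.split(".")]
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 components in {version!r}")
    return tuple(parts[:3])


def guarded_setup(version, unpack):
    """Mimic a module that swallows setup errors and only logs a warning."""
    names = {}
    try:
        major, minor, patch = unpack(version)
        names.update(
            NV_TENSORRT_MAJOR=major,
            NV_TENSORRT_MINOR=minor,
            NV_TENSORRT_PATCH=patch,
        )
    except Exception:
        logger.warning("Failed to import TensorRT package (sketch)")
    return names


# With a three-part version the strict variant defines nothing, so any later
# use of NV_TENSORRT_MAJOR would fail even though tensorrt itself imports.
print(guarded_setup("8.6.1", strict_unpack))
print(guarded_setup("8.6.1", tolerant_unpack))
```

A parser that accepts both three-part and four-part strings, like tolerant_unpack, is one alternative to editing the unpack target.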

Please use the solution below. It works on my side.
First, modify the code:

/content# sed -i "s|_]|]|g" /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/trt_utils.py
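
For readers who want to see what this substitution does before running it, the sketch below applies the same text replacement to a copy of the unpack statement and executes both versions. It uses a plain variable named version in place of trt.__version__ so that it runs without tensorrt; that substitution and the variable names are assumptions made for the example.

```python
# The unpack statement from the interpreter session, with `version` standing
# in for trt.__version__.
original = (
    "[NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, _] = [\n"
    "    int(item) for item in version.split('.')\n"
    "]\n"
)

# Same effect as sed "s|_]|]|g": the throwaway fourth target is removed.
patched = original.replace("_]", "]")
print(patched)

# Original statement with a three-part version: fails as in the log.
try:
    exec(original, {"version": "8.6.1"})
except ValueError as exc:
    print("original fails:", exc)

# Patched statement with a three-part version: the names are now defined.
scope = {"version": "8.6.1"}
exec(patched, scope)
print(scope["NV_TENSORRT_MAJOR"], scope["NV_TENSORRT_MINOR"], scope["NV_TENSORRT_PATCH"])

# Trade-off: the patched statement expects exactly three components.
try:
    exec(patched, {"version": "8.6.1.6"})
except ValueError as exc:
    print("patched fails on a four-part version:", exc)
```

Two things to keep in mind: sed -i edits the installed file in place, so keeping a backup copy is prudent, and the g flag rewrites every occurrence of _] in that file, not only this statement.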

Then run the export.

/content# yolo_v4 export -m /content/drive/MyDrive/results/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_001.hdf5 -o /content/drive/MyDrive/results/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_001.onnx -e $SPECS_DIR/yolo_v4_train_resnet18_kitti_seq.txt
Using TensorFlow backend.
2024-03-21 08:32:46.935079: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-03-21 08:32:46,987 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-03-21 08:32:47,945 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
2024-03-21 08:32:48,460 [TAO Toolkit] [WARNING] root 329: Limited tf.compat.v2.summary API due to missing TensorBoard installation.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.

Loaded model
The ONNX operator number change on the optimization: 585 -> 271
INFO:keras2onnx:The ONNX operator number change on the optimization: 585 -> 271
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: 'str' object has no attribute 'decode'
Execution status: PASS
/content# 
/content# ls -rltsh /content/drive/MyDrive/results/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_001.onnx
134M -rw------- 1 root root 134M Mar 21 08:34 /content/drive/MyDrive/results/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_001.onnx
/content#

Thanks a lot for this workaround. It works, and I got the converted ONNX model.
