Unable to export QAT YOLOv3 in INT8

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
RTX 4090

• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
yolo_v3
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
/home/ilias/anaconda3/envs/launcher/lib/python3.6/site-packages/tlt/__init__.py:20: DeprecationWarning:
The nvidia-tlt package will be deprecated soon. Going forward please migrate to using the nvidia-tao package.

warnings.warn(message, DeprecationWarning)
Configuration of the TAO Toolkit Instance

dockers:
nvidia/tao/tao-toolkit-tf:
v3.21.11-tf1.15.5-py3:
docker_registry: nvcr.io
tasks:
1. augment
2. bpnet
3. classification
4. dssd
5. emotionnet
6. efficientdet
7. fpenet
8. gazenet
9. gesturenet
10. heartratenet
11. lprnet
12. mask_rcnn
13. multitask_classification
14. retinanet
15. ssd
16. unet
17. yolo_v3
18. yolo_v4
19. yolo_v4_tiny
20. converter
v3.21.11-tf1.15.4-py3:
docker_registry: nvcr.io
tasks:
1. detectnet_v2
2. faster_rcnn
nvidia/tao/tao-toolkit-pyt:
v3.21.11-py3:
docker_registry: nvcr.io
tasks:
1. speech_to_text
2. speech_to_text_citrinet
3. text_classification
4. question_answering
5. token_classification
6. intent_slot_classification
7. punctuation_and_capitalization
8. action_recognition
v3.22.02-py3:
docker_registry: nvcr.io
tasks:
1. spectro_gen
2. vocoder
nvidia/tao/tao-toolkit-lm:
v3.21.08-py3:
docker_registry: nvcr.io
tasks:
1. n_gram
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022
• Training spec file (If have, please share here)
experiment spec file
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I run the command :
!tao yolo_v3 export
-e $SPECS_DIR/experiment_spec_exp.json
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat3/weights/yolov3_resnet18_epoch_080.tlt
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat2.etlt
-k $KEY
# --cal_image_dir /workspace/tao-experiments/try-6/train/
# --cal_data_file /$USER_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.tensorfile
--data_type int8
# --batch_size 8
# --max_batch_size 64
--cal_json_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.json
# --verbose

Whether the "#" lines are commented out or not, I get lots of different errors, but in this specific configuration I get:

/home/ilias/anaconda3/envs/launcher/lib/python3.6/site-packages/tlt/__init__.py:20: DeprecationWarning:
The nvidia-tlt package will be deprecated soon. Going forward please migrate to using the nvidia-tao package.

warnings.warn(message, DeprecationWarning)
2023-04-21 14:18:35,036 [INFO] root: Registry: ['nvcr.io']
2023-04-21 14:18:35,071 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-ydv8_bf6 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-04-21 12:18:38,083 [INFO] root: Building exporter object.
2023-04-21 12:18:39,692 [INFO] root: Exporting the model.
2023-04-21 12:18:39,692 [INFO] root: Using input nodes: ['Input']
2023-04-21 12:18:39,692 [INFO] root: Using output nodes: ['BatchedNMS']
2023-04-21 12:18:39,692 [INFO] iva.common.export.keras_exporter: Using input nodes: ['Input']
2023-04-21 12:18:39,692 [INFO] iva.common.export.keras_exporter: Using output nodes: ['BatchedNMS']
The ONNX operator number change on the optimization: 379 → 173
2023-04-21 12:18:54,369 [INFO] keras2onnx: The ONNX operator number change on the optimization: 379 → 173
[TensorRT] ERROR: 1: [caskUtils.cpp::trtSmToCask::114] Error Code 1: Internal Error (Unsupported SM: 0x809)
2023-04-21 12:18:56,130 [ERROR] modulus.export._tensorrt: Failed to create engine
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 869, in __init__
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/export.py", line 12, in
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 265, in launch_export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/keras_exporter.py", line 455, in export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 877, in __init__
AssertionError: Parsing failed on line 869 in statement
2023-04-21 14:18:57,191 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

This generates the .etlt file but not the calibration JSON file needed to make it work with DeepStream. What am I doing wrong? Is TAO compatible with the RTX 4090?

Thank you for your help !
Best regards,
Ilias.

Please update to the latest TAO version. Refer to Migrating from older TLT to TAO Toolkit - NVIDIA Docs

Or use the latest docker directly: TAO Toolkit | NVIDIA NGC. For yolov3, use nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5

Thank you for your answer !
I didn't know I had a version problem, so I switched to running the proper docker container directly.
So I ran:

docker run -it --rm --gpus all -v .:/workspace nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 \
    yolo_v3 export \
    -e specs/experiment_spec_exp.json \
    -m yolo_v3/experiment_dir_retrain_qat3/weights/yolov3_resnet18_epoch_080.tlt \
    -o yolo_v3/experiment_dir_final/resnet18_detector_qat2.etlt \
    --data_type int8 \
    -k tlt_encode \
    --cal_json_file yolo_v3/experiment_dir_final/calibration_qat.json

and got the logs and errors:

=== TAO Toolkit TensorFlow ===

NVIDIA Release 4.0.1-TensorFlow (build )
TAO Toolkit Version 4.0.1

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TAO Toolkit. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 …

Using TensorFlow backend.
2023-04-24 09:23:37.071822: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
2023-04-24 09:23:41,726 [INFO] iva.common.export.keras_exporter: Using input nodes: ['Input']
2023-04-24 09:23:41,726 [INFO] iva.common.export.keras_exporter: Using output nodes: ['BatchedNMS']
The ONNX operator number change on the optimization: 379 → 173
2023-04-24 09:23:58,712 [INFO] keras2onnx: The ONNX operator number change on the optimization: 379 → 173
2023-04-24 09:23:59,026 [INFO] iva.common.export.base_exporter: Generating a tensorfile with random tensor images. This may work well as a profiling tool, however, it may result in inaccurate results at inference. Please generate a tensorfile using the tlt-int8-tensorfile, or provide a custom directory of images for best performance.
Traceback (most recent call last):
File "</usr/local/lib/python3.6/dist-packages/iva/yolo_v3/scripts/export.py>", line 3, in
File "", line 30, in
File "", line 14, in
File "", line 302, in launch_export
File "", line 284, in run_export
File "", line 410, in export
File "", line 198, in get_calibrator
File "", line 309, in generate_tensor_file
File "", line 352, in generate_random_tensorfile
File "", line 54, in __init__
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 312, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 148, in make_fid
fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 98, in h5py.h5f.create
ValueError: Invalid file name (invalid file name)
Telemetry data couldn’t be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL

Is there another problem?

Can you double check the path of each file? The path should be a path inside the docker.

docker run -it --rm --gpus all -v .:/workspace nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash

Then,
ls specs/experiment_spec_exp.json
ls yolo_v3/experiment_dir_retrain_qat3/weights/yolov3_resnet18_epoch_080.tlt


etc

Thank you for your answer! So, all the required files are here:
```
root@7ed655ac235e:/workspace# ls specs/
coco_config.json experiment_spec_Q.json experiment_spec_exp.json
experiment_spec.json experiment_spec_QAT.json
root@7ed655ac235e:/workspace# ls specs/experiment_spec_exp.json
specs/experiment_spec_exp.json
root@7ed655ac235e:/workspace# ls yolo_v3/experiment_dir_retrain_qat3/weights/yolov3_resnet18_epoch_080.tlt
yolo_v3/experiment_dir_retrain_qat3/weights/yolov3_resnet18_epoch_080.tlt
```
What I understood about the other files: yolo_v3/experiment_dir_final/resnet18_detector_qat2.etlt is the .etlt file that this command generates (which it does), and yolo_v3/experiment_dir_final/calibration_qat.json should be the generated calibration weights for INT8 inference; however, the command does not create it.

Did I misunderstand something? Because, for instance, when you run this command with detectnet, it does generate the calibration file.

Can you refer to the command in latest notebook? GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC

!tao yolo_v3 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov3_resnet18_epoch_$EPOCH.tlt \
                    -k $KEY \
                    -o $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
                    -e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
                    --target_opset 12 \
                    --gen_ds_config

Okay, thank you for this resource, it really helped and I understood my problem! I was following an older notebook where the cal.bin file could be generated during export (with detectnet at least). But now, to generate the cal.bin file, you need to use tao-deploy.

So the process is: first train with TAO,
then export using your command:

!tao yolo_v3 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov3_resnet18_epoch_$EPOCH.tlt \
                    -k $KEY \
                    -o $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
                    -e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
                    --target_opset 12 \
                    --gen_ds_config

then deploy to generate the engine file and the cal.bin file:

!tao-deploy yolo_v3 gen_trt_engine -m $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
                                   -k $KEY \
                                   -e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
                                   --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
                                   --data_type int8 \
                                   --batch_size 16 \
                                   --min_batch_size 1 \
                                   --opt_batch_size 8 \
                                   --max_batch_size 16 \
                                   --batches 10 \
                                   --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                                   --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
                                   --engine_file $USER_EXPERIMENT_DIR/export/trt.engine.int8
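
For reference, the resulting cal.bin (or the pre-built engine) can then be wired into a DeepStream nvinfer configuration along these lines. This is only a sketch: the property names come from the standard nvinfer config-file format, and the file paths are hypothetical stand-ins for wherever the exported artifacts end up:

```
[property]
# 1 = INT8 mode; requires the calibration cache produced by tao-deploy above
network-mode=1
int8-calib-file=export/cal.bin
# Either deploy the pre-built engine directly...
model-engine-file=export/trt.engine.int8
# ...or let DeepStream rebuild it from the encoded model and key
tlt-encoded-model=export/yolov3_resnet18_epoch_080.etlt
tlt-model-key=tlt_encode
```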

However, it doesn't work for me if there's enable_qat=true in the spec file (-e), but frankly, now I'm just happy it works, haha.

Thank you very much for your help !

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.