Convert TAO Yolov4 model to DLA engine fails

BarcaBear · January 20, 2022, 2:45pm

• Xavier NX
• YoloV4 Darknet19
• TAO 3.0

I have trained a YOLOv4 model (arch: darknet19) using TAO (https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html#exporting-the-model).

I can use “tao-converter-jp46-trt8.0.1.6” to generate an engine that runs in deepstream-app 6.0. When I try to build an engine with the DLA flag -u set to 0 or 1, the engine build fails.

Error

“Module_id 33 Severity 2 : NVMEDIA_DLA 2493
Module_id 33 Severity 2 : Runtime: loadBare failed. error: 0x000004
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1544, GPU 7169 (MiB)
[ERROR] 1: [nvdlaUtils.cpp::deserialize::164] Error Code 1: DLA (NvMediaDlaLoadLoadable : load loadable failed.)
[ERROR] Unable to create engine
Segmentation fault (core dumped)”

I have tried the suggestion in the thread “Failed to create DLA engine from .etlt model - #11 by Morganh”, and I can build the peoplesegnet_resnet50 engine successfully.

Why is my engine failing?
./tao-converter -o BatchedNMS -d 3,704,1280 -p Input,1x3x704x1280,1x3x704x1280,1x3x704x1280 -u 0 -t fp16 -w 6000000000 -k nnnn12345 yolov4_darknet19.etlt

I’m using JetPack 4.6 (L4T 32.6.1) on Xavier NX.

Morganh · January 21, 2022, 8:05am

To narrow down, can you download an official yolov4 model?

wget https://nvidia.box.com/shared/static/511552h6b1ecw4gd20ptuihoiidz13cs -O models.zip

See deepstream_tao_apps/pgie_yolov4_tao_config.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub. It is a 960x544 model, and tlt-model-key=nvidia_tlt.

BarcaBear · January 27, 2022, 9:51am

Thanks for your fast response.

I managed to download and convert the official yoloV4 model, and it works fine. With that in mind, I noticed that the official yoloV4 model and the peoplesegnet_resnet50 I mentioned above both use int8 with a calibration file. So I exported my custom-trained yoloV4 model as int8 and then managed to create the engine file.

I can run it in deepstream-app, but it doesn’t produce bounding boxes, for either the GPU or the DLA version. Is there a common mistake? I can use int8 models trained with DetectNet with no issues.

Morganh · January 27, 2022, 11:07am

No, there should not be a mistake.

Can you successfully run the sample apps from “GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream” with the video file that ships inside DeepStream?

BarcaBear · February 1, 2022, 5:05pm

I’m currently training a new model and will export it as int8. It’s strange that an fp16 model won’t build with the DLA settings while an int8 engine will.

BarcaBear · February 4, 2022, 2:28pm

I have retrained the model and it works perfectly with fp32. However, I have the same issue with int8: no bboxes are displayed. I see this post had the same issue.

As you suggested, I can run the app successfully with the video file in the sample.

I export as below:

# Uncomment to export in INT8 mode (generate calibration cache file).
!tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_darknet19_epoch_200.tlt \
                    -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt \
                    -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                    -k $KEY \
                    --cal_image_dir $USER_EXPERIMENT_DIR/data/testing/image_2 \
                    --data_type int8 \
                    --batch_size 16 \
                    --batches 100 \
                    --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
                    --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

And receive this message:

“The ONNX operator number change on the optimization: 483 -> 232
2022-02-04 12:29:02,398 [INFO] keras2onnx: The ONNX operator number change on the optimization: 483 -> 232
2022-02-04 12:29:03,855 [INFO] iva.common.export.base_exporter: Generating a tensorfile with random tensor images. This may work well as a profiling tool, however, it may result in inaccurate results at inference. Please generate a tensorfile using the tlt-int8-tensorfile, or provide a custom directory of images for best performance.”

I have images in the directory. I will continue testing; any suggestions would be appreciated. I have around 10k images and batches set to 100. Do you think this could be the issue? I will try with 1000.

BarcaBear · February 4, 2022, 3:10pm

Looks like I had an issue with the mounts configuration below. I have resolved it, and now I don’t get the message about “provide a custom directory of images for best performance.” I will provide feedback on Monday.

{
    "Mounts": [
        {
            "source": "/home/ubuntu/cv_samples_v1.2.0/yolo_v4/LOCAL_PROJECT_DIR",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/ubuntu/cv_samples_v1.2.0/yolo_v4/specs",
            "destination": "/workspace/tao-experiments/yolo_v4/specs"
        }
    ]
}
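
A mounts file like this decides which host folders the container can see, so a calibration image path is only usable if it falls under one of the "destination" entries. As a minimal sketch (plain Python, not part of TAO; the helper name to_host_path and the example container path are assumptions for illustration, and it assumes the deepest matching destination wins), the mapping can be checked before exporting:

```python
from pathlib import PurePosixPath

# Mounts copied from the post above.
MOUNTS = [
    {"source": "/home/ubuntu/cv_samples_v1.2.0/yolo_v4/LOCAL_PROJECT_DIR",
     "destination": "/workspace/tao-experiments"},
    {"source": "/home/ubuntu/cv_samples_v1.2.0/yolo_v4/specs",
     "destination": "/workspace/tao-experiments/yolo_v4/specs"},
]


def to_host_path(container_path, mounts):
    """Return the host path behind a container path, or None if it is not mounted.

    Hypothetical helper: picks the mount with the longest matching destination.
    """
    target = PurePosixPath(container_path)
    best_dest, best_host = None, None
    for mount in mounts:
        dest = PurePosixPath(mount["destination"])
        try:
            relative = target.relative_to(dest)
        except ValueError:
            continue  # container_path is not under this destination
        if best_dest is None or len(dest.parts) > len(best_dest.parts):
            best_dest, best_host = dest, PurePosixPath(mount["source"]) / relative
    return None if best_host is None else str(best_host)


# Assumed container path for the calibration images (illustrative only).
print(to_host_path("/workspace/tao-experiments/yolo_v4/data/testing/image_2", MOUNTS))
```

If the function returns None, or the host folder it returns is empty, the container would not see any images at that path.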

BarcaBear · February 4, 2022, 3:58pm

This thread also has the same symptoms I’m seeing.

Why do fp16/32 engines work fine and int8 does not?

BarcaBear · February 4, 2022, 3:59pm

Can I send you my .etlt file to see if you get the same results?

Morganh · February 4, 2022, 4:01pm

If the log contains “iva.common.export.base_exporter: Generating a tensorfile with random tensor images.”, then something is wrong with the “cal_image_dir”, “batch_size”, or “batches” arguments.

If the accuracy of int8 inference seems to degrade after int8 calibration, it could be because there wasn’t enough data in the calibration tensorfile used to calibrate the model, or because the training data is not entirely representative of your test images, so the calibration may be incorrect. Therefore, you may either regenerate the calibration tensorfile with more batches of the training data and recalibrate the model, or calibrate the model on a few images from the test set.
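
One quick sanity check on those three arguments is whether the calibration folder actually holds batch_size × batches images. The sketch below is plain Python, not a TAO utility; the function names and the list of image extensions are assumptions, and it assumes the exporter wants one distinct image per calibration slot:

```python
import os

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def images_needed(batch_size, batches):
    """Number of images consumed by a calibration run of `batches` batches."""
    return batch_size * batches


def count_images(image_dir):
    """Count image files directly inside image_dir (0 if the folder is missing)."""
    if not os.path.isdir(image_dir):
        return 0
    return sum(
        1
        for entry in os.scandir(image_dir)
        if entry.is_file()
        and os.path.splitext(entry.name)[1].lower() in IMAGE_EXTENSIONS
    )


def has_enough_images(image_dir, batch_size, batches):
    """True when image_dir holds at least batch_size * batches images."""
    return count_images(image_dir) >= images_needed(batch_size, batches)


print(images_needed(16, 100))   # batch_size 16, batches 100
print(images_needed(16, 1000))  # batch_size 16, batches 1000
```

With a batch size of 16, 100 batches need 1,600 images and 1,000 batches need 16,000, which is more than a folder of roughly 10k images holds.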

BarcaBear · February 4, 2022, 8:04pm

Thanks again for your fast response.

So far:

• Nvidia AWS AMI instance
• Nvidia yoloV4 notebook
• Nvidia tao-converter fp16/32 works perfectly with great results
• Nvidia tao-converter int8 completes but complains about wrong input values; changing them to the recommended values generates the engine
• The engine/bin loads in the DeepStream app but does not generate bboxes

Conclusion/thought process:

• Training is fine, as the fp16/32 engines are great.
• tao-converter is happy and produces an int8 engine, but DeepStream does not work with it.
• The same process with DetectNet works perfectly for fp16/32/int8.

From the threads I have found, there is some confusion about YOLO int8, and none of the threads were closed with a solution.

After many experiments, the yoloV4 darknet53 models are by far the most accurate but also slower, hence the need for int8 optimization.

I’m happy to provide my model.

BarcaBear · February 7, 2022, 2:54pm

I’m not making any progress with int8. I am retraining with “enable_qat: true”. I’m hoping this is going to solve my problems.

Morganh · February 7, 2022, 4:41pm

Can you share the latest command and log from when you run “tao yolo_v4 export”?

BarcaBear · February 8, 2022, 9:01am

Please see below:

2022-02-07 10:31:57,210 [INFO] root: Registry: ['nvcr.io']
2022-02-07 10:31:57,286 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-02-07 10:31:57,294 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2022-02-07 10:32:03,884 [INFO] root: Building exporter object.
2022-02-07 10:32:12,825 [INFO] root: Exporting the model.
2022-02-07 10:32:12,825 [INFO] root: Using input nodes: ['Input']
2022-02-07 10:32:12,825 [INFO] root: Using output nodes: ['BatchedNMS']
2022-02-07 10:32:12,825 [INFO] iva.common.export.keras_exporter: Using input nodes: ['Input']
2022-02-07 10:32:12,825 [INFO] iva.common.export.keras_exporter: Using output nodes: ['BatchedNMS']
The ONNX operator number change on the optimization: 483 -> 232
2022-02-07 10:32:57,201 [INFO] keras2onnx: The ONNX operator number change on the optimization: 483 -> 232
2022-02-07 10:32:58,620 [INFO] iva.common.export.base_exporter: Generating a tensorfile with random tensor images. This may work well as a profiling tool, however, it may result in inaccurate results at inference. Please generate a tensorfile using the tlt-int8-tensorfile, or provide a custom directory of images for best performance.
100%|█████████████████████████████████████| 1000/1000 [2:10:30<00:00, 7.83s/it]
2022-02-07 12:43:28,850 [INFO] iva.common.export.keras_exporter: Calibration takes time especially if number of batches is large.
2022-02-07 12:43:28,850 [INFO] root: Calibration takes time especially if number of batches is large.
2022-02-07 13:15:06,252 [INFO] iva.common.export.base_calibrator: Saving calibration cache (size 7084) to /workspace/tao-experiments/yolo_v4/export/cal.bin
2022-02-07 13:17:43,761 [INFO] root: Export complete.
2022-02-07 13:17:43,761 [INFO] root: {
    "param_count": 31.572441,
    "size": 121.07246398925781
}
2022-02-07 13:17:46,306 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Morganh · February 8, 2022, 9:04am

Can you share the full command?

BarcaBear · February 8, 2022, 9:05am

My images are 1280x720 and I have the output set to:

output_width: 960
output_height: 544

Could this be causing issues? I see from the post “Network Image Input Resizing - #6 by Morganh” that the dataloader either pads with zeros or crops to fit the output resolution.
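
The size of the mismatch between the two resolutions can be worked out directly. This is only arithmetic on the numbers quoted above, a sketch with hypothetical function names, and it says nothing about what the TAO dataloader actually does with the difference:

```python
from fractions import Fraction


def aspect_ratio(width, height):
    """Exact width:height ratio as a reduced fraction."""
    return Fraction(width, height)


def ratio_change(src_w, src_h, dst_w, dst_h):
    """Target aspect ratio divided by source aspect ratio (1 means identical)."""
    return aspect_ratio(dst_w, dst_h) / aspect_ratio(src_w, src_h)


source = aspect_ratio(1280, 720)   # training images
target = aspect_ratio(960, 544)    # output_width x output_height
change = ratio_change(1280, 720, 960, 544)

print(source, float(source))
print(target, float(target))
print(change, float(change))
```

The source images are 16:9 and the network input is 30:17, so the two aspect ratios differ by a factor of 135/136, which is under 1%.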

BarcaBear · February 8, 2022, 9:06am

!tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_darknet19_epoch_200.tlt \
                    -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt \
                    -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti_seq.txt \
                    -k $KEY \
                    --cal_image_dir $USER_EXPERIMENT_DIR/data/testing/image_2 \
                    --data_type int8 \
                    --batch_size 16 \
                    --batches 1000 \
                    --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
                    --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

Morganh · February 8, 2022, 9:29am

Can you change cal_image_dir to point to all of your training images?

BarcaBear · February 8, 2022, 9:31am

So instead of the testing folder, I should try the training folder of images? I will give it a try.

Thanks!

Morganh · February 8, 2022, 9:32am

Yes, the cal.bin is generated from the training dataset.

If the accuracy of int8 inference seems to degrade after int8 calibration, it could be because there wasn’t enough data in the calibration tensorfile used to calibrate the model, or because the training data is not entirely representative of your test images, so the calibration may be incorrect. Therefore, you may either regenerate the calibration tensorfile with more batches of the training data and recalibrate the model, or calibrate the model on a few images from the test set.
