Unable to deploy TAO 4.0.1 YOLOv4 model on DeepStream 6.0

• Hardware Platform: GPU
• DeepStream Version: 6.0
• TensorRT Version: 8.0.1-1+cuda11.3
• NVIDIA GPU Driver Version: 470.57.02
• Issue Type: bugs

I have trained a yolo_v4 model on TAO Toolkit 4.0.1 with quantization set to true, exported the model to INT8, and tried deploying it in my Python application, which runs on DeepStream 6.0. I am getting the following error on the console:

parseModel: Failed to parse ONNX model ERROR: tlt/tlt_decode.cpp:389 Failed to build network, error in model parsing.

Following is the command I used to export the model:

tao yolo_v4 export -m /workspace/tao-experiments/output/v4/weights/yolov4_resnet18_epoch_080.tlt -o /workspace/tao-experiments/output/etlt/d17_jul24_yolov4_resnet18_epoch_080_v4.1.etlt -k nvidia_tlt --data_type int8 -e  /workspace/tao-experiments/specs/d17-jul24-yolov4-960-544-config_v4.1.txt --cal_cache_file /workspace/tao-experiments/output/etlt/d17_jul24_yolov4_resnet18_epoch_080_v4.1.bin

I am also attaching the nvinfer config d15_yolov3_resnet18_epoch_070.conf (4.3 KB), the training config d17-jul24-yolov4-960-544-config_v4.1.txt (2.5 KB), and the full console log containing the error, log20230803.txt (4.3 KB).

Could you collect a more detailed log by referring to the link below?
https://forums.developer.nvidia.com/t/deepstream-sdk-faq/80236/33
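For reference, a common way to raise log verbosity for a GStreamer-based DeepStream app is the GST_DEBUG environment variable; the exact steps are in the FAQ post above. A rough example, where deepstream_app.py stands in for your application and the nvinfer category name is an assumption:

GST_DEBUG=3,nvinfer:6 python3 deepstream_app.py 2> debug.log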

The following log is at debug level:
log20230804.txt (121.9 KB)

The infer-dims value I had given in the nvinfer config was wrong; I have updated it. This is the new nvinfer config:
d15_yolov3_resnet18_epoch_070.conf (4.4 KB)

The error still persists. This is the debug log for the updated config:
20230804.log (122.2 KB)

@adithya.ajith
Could you double-check the .etlt model?
You exported it with the name d17_jul24_yolov4_resnet18_epoch_080_v4.1.etlt.
Could you set the same file in the DeepStream config as well?
Currently, it is

tlt-encoded-model=/opt/vast/platform/nvast/ds_vast_pipeline/d17_july24_yolov4_resnet18_epoch_080.etlt

The line above is the file name for the model; in the logs you can see DeepStream loading the model layers. I had exported the model twice, thinking there might have been some issue with the export command not running properly. I shared the first export command, and the model name is different the second time I ran the export. Sorry for the confusion caused.

Please check the input width and height again.
In your training spec, it is:
output_width: 960
output_height: 544

But in the DeepStream config, it does not match:
infer-dims=3;576;1024
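For the 960x544 training resolution, the entry would presumably need to be (infer-dims is channel;height;width):

infer-dims=3;544;960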

Further, to check whether the .etlt model and its key are correct, I suggest you run the experiment below to narrow things down.
After exporting, please check whether tao-deploy yolo_v4 gen_trt_engine and tao-deploy yolo_v4 inference can work. Refer to GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC.

I have fixed the infer-dims and mentioned this in a previous reply. Please check the log shared there.

OK, to narrow down, please run the experiment below to check whether the .etlt model and its key are correct.
After exporting, please check whether tao-deploy yolo_v4 gen_trt_engine and tao-deploy yolo_v4 inference can work. Refer to GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC.

Sure, let me get back to you with that.

When I run the following command, I am getting the following error in the log:
20230804.log (6.8 KB)

tao-deploy yolo_v4 gen_trt_engine -m /workspace/tao-experiments/output/etlt/d17_july24_yolov4_resnet18_epoch_080.etlt -k nvidia_tlt -e /workspace/tao-experiments/specs/d17-jul24-yolov4-960-544-config_v4.1.txt --data_type int8 --cal_cache_file /workspace/tao-experiments/output/etlt/d17_july24_yolov4_resnet18_epoch_080.bin --engine_file /workspace/tao-experiments/output/etlt/d17_july24_yolov4_resnet18_epoch_080.engine.int8

Moving to TAO forum for tracking.

Could you please follow GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC to run?

!tao-deploy yolo_v4 gen_trt_engine -m $USER_EXPERIMENT_DIR/export_qat/yolov4_resnet18_epoch_$EPOCH.etlt \
                                   -k $KEY \
                                   -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti_qat.txt \
                                   --data_type int8 \
                                   --batch_size 8 \
                                   --min_batch_size 1 \
                                   --opt_batch_size 8 \
                                   --max_batch_size 16 \
                                   --cal_json_file $USER_EXPERIMENT_DIR/export_qat/cal.json \
                                   --engine_file $USER_EXPERIMENT_DIR/export_qat/trt.engine.int8

I already have the INT8 calibration file generated; that is why I ran the command shared above. Tell me if that command is wrong.

I will check whether I can reproduce the issue with the notebook from GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC.
If possible, please check whether you can run this notebook successfully as well.

Hi,
The --cal_json_file is needed.
See the end of YOLOv4 - NVIDIA Docs.

QAT Export Mode Required Arguments

  • --cal_json_file: The path to the JSON file containing the tensor scales for QAT models. This argument is required if an engine is being generated for a QAT model.

Note

When exporting a model trained with QAT enabled, the tensor scale factors to calibrate the activations are peeled out of the model and serialized to a JSON file defined by the cal_json_file argument.
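Applied to the paths from your earlier command, a corrected invocation would look roughly like this; the cal.json path is an assumption, so use the file your QAT export actually produced:

tao-deploy yolo_v4 gen_trt_engine -m /workspace/tao-experiments/output/etlt/d17_july24_yolov4_resnet18_epoch_080.etlt \
                                  -k nvidia_tlt \
                                  -e /workspace/tao-experiments/specs/d17-jul24-yolov4-960-544-config_v4.1.txt \
                                  --data_type int8 \
                                  --cal_json_file /workspace/tao-experiments/output/etlt/cal.json \
                                  --engine_file /workspace/tao-experiments/output/etlt/d17_july24_yolov4_resnet18_epoch_080.engine.int8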

Now the command runs without any error and the INT8 TensorRT engine is created. I ran tao yolo_v4 inference with the engine file; it runs, and I am getting inference results on the frames. I guess the issue is not with the .etlt file, and the key is correct.

Thanks for the info. Glad to know it is working now in TAO.
For DeepStream, please configure the .etlt model and key in https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/release/tao4.0_ds6.2ga/configs/yolov4_tao/pgie_yolov4_tao_config.txt#L31-L33 and run with GitHub - NVIDIA-AI-IOT/deepstream_tao_apps at release/tao4.0_ds6.2ga.
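For reference, the relevant lines in the pgie/nvinfer config would look roughly like this; the paths follow the ones used earlier in this thread, and the calibration-file name and location are assumptions:

# hypothetical excerpt of the nvinfer config, using paths from this thread
tlt-encoded-model=/opt/vast/platform/nvast/ds_vast_pipeline/d17_july24_yolov4_resnet18_epoch_080.etlt
tlt-model-key=nvidia_tlt
int8-calib-file=/opt/vast/platform/nvast/ds_vast_pipeline/cal.bin
network-mode=1
infer-dims=3;544;960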

I have already configured the .etlt file and key in the nvinfer config. Please refer to the nvinfer config I have shared.

OK, to narrow down the issue, could you please download the official YOLOv4 model, configure it, and check whether it can run successfully?
Download it from: https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/release/tao4.0_ds6.2ga/download_models.sh#L44
Config file: https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/release/tao4.0_ds6.2ga/configs/yolov4_tao/pgie_yolov4_tao_config.txt

Please run with GitHub - NVIDIA-AI-IOT/deepstream_tao_apps at release/tao4.0_ds6.2ga
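A rough sequence, assuming the repo's standard steps; CUDA_VER matches the CUDA 11.3 toolchain from this setup, and the sample stream path is an assumption:

# clone the matching branch and fetch the reference models
git clone -b release/tao4.0_ds6.2ga https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps.git
cd deepstream_tao_apps
./download_models.sh

# build and run the detection sample with the official YOLOv4 config
# (see the repo README for the exact build and run options)
export CUDA_VER=11.3
make
./apps/tao_detection/ds-tao-detection \
    -c configs/yolov4_tao/pgie_yolov4_tao_config.txt \
    -i file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264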