Error while converting .etlt model to .trt model

I am getting the following error while converting my quantized retrained model using the TAO Toolkit:

 
[07/04/2023-10:22:04] [TRT] [V] *************** Autotuning format combination: Int8(958464,1872,78,1) -> Int8(958464,1872,78,1) ***************
[07/04/2023-10:22:04] [TRT] [V] Deleting timing cache: 2014 entries, served 3234 hits since creation.
[07/04/2023-10:22:04] [TRT] [E] 2: [weightConvertors.cpp::quantizeBiasCommon::337] Error Code 2: Internal Error (Assertion getter(i) != 0 failed. )
Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_deploy/cv/detectnet_v2/scripts/gen_trt_engine.py>", line 3, in <module>
  File "<frozen cv.detectnet_v2.scripts.gen_trt_engine>", line 202, in <module>
  File "<frozen cv.detectnet_v2.scripts.gen_trt_engine>", line 67, in main
  File "<frozen engine.builder>", line 196, in create_engine
AttributeError: __enter__
2023-07-04 10:22:04,990 [INFO] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: Sending telemetry data.
2023-07-04 10:22:05,085 [WARNING] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: Telemetry data couldn't be sent, but the command ran successfully.
2023-07-04 10:22:05,085 [WARNING] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: [Error]: <urlopen error [Errno -2] Name or service not known>
2023-07-04 10:22:05,086 [WARNING] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: Execution status: FAIL

• Hardware = T4 on Azure VM
• Network Type = Detectnet_v2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
I am using the docker with docker_tag = 4.0.0-deploy.
• Training spec file (if you have one, please share here)
My spec file is as follows:
detectnet_v2_retrain_resnet18_coco_qat.txt (6.4 KB)

• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
I used the following commands:

detectnet_v2 export \
                  -m rtvp/experiment_dir_retrain_qat/weights/resnet18_detector_pruned_qat.tlt \
                  -o rtvp/experiment_dir_final/resnet18_detector_qat.etlt \
                  -k tlt_encode  \
                  -e specs/detectnet_v2_retrain_resnet18_coco_qat.txt \
                  --cal_json_file rtvp/experiment_dir_final/calibration_qat.json \
                  --gen_ds_config \
                  --verbose

and then

detectnet_v2 gen_trt_engine \
                  -m rtvp/experiment_dir_final/resnet18_detector_qat.etlt \
                  -k tlt_encode  \
                  -e specs/detectnet_v2_retrain_resnet18_coco_qat.txt \
                  --data_type int8 \
                  --batch_size 32 \
                  --max_batch_size 32 \
                  --engine_file rtvp/experiment_dir_final/resnet18_detector_qat.trt.int8 \
                  --cal_cache_file rtvp/experiment_dir_final/calibration_qat.bin \
                  --cal_json_file rtvp/experiment_dir_final/calibration_qat.json \
                  --verbose

I am even able to evaluate my quantized model before exporting it.

Did you ever run the default notebook against the public KITTI dataset mentioned in it?
If not, please run it to check if there is the same error. Thanks.

I have tried the KITTI dataset and things were working initially. My custom dataset also works: training succeeded, and I was able to generate a TensorRT engine file for the pruned retrained model with FP16, which works as well. But while converting the quantized model it throws the above error.

Does it mean the issue happens in the default notebook?

I am using the default notebook; I only updated my dataset and the specs as needed.

Is there any method or command that can be used for more detailed debugging?
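
(As a side note, one simple way to keep a complete log of a run for later inspection, assuming the command is launched from a bash shell, is to redirect both stdout and stderr into a file, for example:

detectnet_v2 gen_trt_engine ... --verbose 2>&1 | tee gen_trt_engine_debug.log

where "..." stands for the same arguments as in the gen_trt_engine command above, and the log file name is only illustrative.)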

I will run the default notebook with the KITTI dataset to check whether the error is reproduced.

Can you upload the full log as well? Please use the upload button to attach it. Thanks.

@Morganh, as requested, please find the detailed log attached:
debug.logs (4.4 MB)

I cannot reproduce the error. My steps are as below:

  1. Train a model with QAT enabled, running just 5 epochs (see the command sketch below).
  2. Export the model.
  3. Run gen_trt_engine with the tao-deploy docker.
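
For reference, a rough sketch of the step 1 command (the spec file name and results directory are placeholders, and QAT plus the short run are controlled from the training spec rather than the command line, e.g. enable_qat: true and num_epochs: 5 under training_config):

detectnet_v2 train \
                  -e specs/detectnet_v2_train_resnet18_kitti_qat.txt \
                  -r experiment_dir_unpruned_qat \
                  -k tlt_encode

Steps 2 and 3 then use the same export and gen_trt_engine invocations shown earlier in this thread.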

To narrow this down, could you please try to run the notebook with the KITTI dataset?

I will try it and let you know

I have tried with the KITTI dataset, and it is not producing the above error for me either. What should my next step be to evaluate this?

Please try to enable QAT directly when you train an unpruned model. Then run “export” and “gen_trt_engine” again.

I have tried it, and I was able to generate the quantized model as you said.

So is the issue with quantization of the pruned model?

I suggest you double-check. Please try a different pruning ratio as well.
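
For illustration, re-pruning with a different threshold might look like the following (the model paths are placeholders based on the directory layout above, and the -pth value is only an example; a higher threshold prunes more aggressively):

detectnet_v2 prune \
                  -m rtvp/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                  -o rtvp/experiment_dir_pruned/resnet18_detector_pruned.tlt \
                  -eq union \
                  -pth 0.0000052 \
                  -k tlt_encode

After re-pruning, the QAT retrain, export, and gen_trt_engine steps would need to be repeated on the new pruned model.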


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.