Error while converting .etlt model to .trt model

I am getting the following error while converting my quantized retrained model using the TAO Toolkit:

 
[07/04/2023-10:22:04] [TRT] [V] *************** Autotuning format combination: Int8(958464,1872,78,1) -> Int8(958464,1872,78,1) ***************
[07/04/2023-10:22:04] [TRT] [V] Deleting timing cache: 2014 entries, served 3234 hits since creation.
[07/04/2023-10:22:04] [TRT] [E] 2: [weightConvertors.cpp::quantizeBiasCommon::337] Error Code 2: Internal Error (Assertion getter(i) != 0 failed. )
Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_deploy/cv/detectnet_v2/scripts/gen_trt_engine.py>", line 3, in <module>
  File "<frozen cv.detectnet_v2.scripts.gen_trt_engine>", line 202, in <module>
  File "<frozen cv.detectnet_v2.scripts.gen_trt_engine>", line 67, in main
  File "<frozen engine.builder>", line 196, in create_engine
AttributeError: __enter__
2023-07-04 10:22:04,990 [INFO] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: Sending telemetry data.
2023-07-04 10:22:05,085 [WARNING] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: Telemetry data couldn't be sent, but the command ran successfully.
2023-07-04 10:22:05,085 [WARNING] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: [Error]: <urlopen error [Errno -2] Name or service not known>
2023-07-04 10:22:05,086 [WARNING] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: Execution status: FAIL

• Hardware = T4 on Azure VM
• Network Type = Detectnet_v2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
I am using the docker with docker_tag = 4.0.0-deploy.
• Training spec file (if you have one, please share here)
My spec file is as follows:
detectnet_v2_retrain_resnet18_coco_qat.txt (6.4 KB)

• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
I used the following commands:

detectnet_v2 export \
                  -m rtvp/experiment_dir_retrain_qat/weights/resnet18_detector_pruned_qat.tlt \
                  -o rtvp/experiment_dir_final/resnet18_detector_qat.etlt \
                  -k tlt_encode  \
                  -e specs/detectnet_v2_retrain_resnet18_coco_qat.txt \
                  --cal_json_file rtvp/experiment_dir_final/calibration_qat.json \
                  --gen_ds_config \
                  --verbose

and then

detectnet_v2 gen_trt_engine \
                  -m rtvp/experiment_dir_final/resnet18_detector_qat.etlt \
                  -k tlt_encode  \
                  -e specs/detectnet_v2_retrain_resnet18_coco_qat.txt \
                  --data_type int8 \
                  --batch_size 32 \
                  --max_batch_size 32 \
                  --engine_file rtvp/experiment_dir_final/resnet18_detector_qat.trt.int8 \
                  --cal_cache_file rtvp/experiment_dir_final/calibration_qat.bin \
                  --cal_json_file rtvp/experiment_dir_final/calibration_qat.json \
                  --verbose

I am even able to evaluate my quantized model before exporting it.

Did you ever run the default notebook against the public KITTI dataset mentioned in it?
If not, please run it to check if there is the same error. Thanks.

I have tried the KITTI dataset and things were working initially. My custom dataset also works: training succeeded, and I was able to generate a TensorRT engine file for the pruned retrained model with FP16, which works as well. But while converting the quantized model it throws the above error.

Does it mean the issue happens in the default notebook?

I am using the default notebook; I only updated my dataset and the specs as needed.

Is there any method or command that can be used for more detailed debugging?
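
(As a side note, one simple way to keep a complete log of a run for later inspection, assuming the command is launched from a bash shell, is to redirect both stdout and stderr into a file, for example:

detectnet_v2 gen_trt_engine ... --verbose 2>&1 | tee gen_trt_engine_debug.log

where "..." stands for the same arguments as in the gen_trt_engine command above, and the log file name is only illustrative.)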

I will run the default notebook with the KITTI dataset to check whether the error is reproduced.

Can you upload the full log as well? Please use the upload button to attach it. Thanks.

@Morganh, as requested, please find the detailed log attached:
debug.logs (4.4 MB)

I cannot reproduce the error. My steps are as below:

  1. Train a model with QAT enabled, running just 5 epochs (see the command sketch below).
  2. Export the model.
  3. Run gen_trt_engine with the tao-deploy docker.
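
For reference, a rough sketch of the step 1 command (the spec file name and results directory are placeholders, and QAT plus the short run are controlled from the training spec rather than the command line, e.g. enable_qat: true and num_epochs: 5 under training_config):

detectnet_v2 train \
                  -e specs/detectnet_v2_train_resnet18_kitti_qat.txt \
                  -r experiment_dir_unpruned_qat \
                  -k tlt_encode

Steps 2 and 3 then use the same export and gen_trt_engine invocations shown earlier in this thread.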

To narrow this down, could you please try to run the notebook with the KITTI dataset?

I will try it and let you know

I have tried with the KITTI dataset, and it is not producing the above error for me either. What should my next step be to evaluate this?

Please try to enable QAT directly when you train an unpruned model. Then run “export” and “gen_trt_engine” again.

I have tried it, and I was able to generate the quantized model as you said.

So is the issue with quantization of the pruned model?

I suggest you double-check. Please try a different pruning ratio as well.
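
For illustration, re-pruning with a different threshold might look like the following (the model paths are placeholders based on the directory layout above, and the -pth value is only an example; a higher threshold prunes more aggressively):

detectnet_v2 prune \
                  -m rtvp/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                  -o rtvp/experiment_dir_pruned/resnet18_detector_pruned.tlt \
                  -eq union \
                  -pth 0.0000052 \
                  -k tlt_encode

After re-pruning, the QAT retrain, export, and gen_trt_engine steps would need to be repeated on the new pruned model.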


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.