Poor Results After INT8 Optimization (TLT Getting Started Guide)

Well… I had Docker inside of Docker, so I think somewhere along the way, while creating these files, the KEY went wrong. So I am restarting everything. Please stay with me :D I will get back to you tomorrow with the results of the FP16 version.

@Morganh

Hope you had a good day.
It seems like this isn’t a KEY issue. I went through all the steps and made sure everything was entered correctly; however, I am still running into the same issue as above. Meaning, I still get this error:

[ERROR] UffParser: Unsupported number of graph 0
[ERROR] Failed to parse the model, please check the encoding key to make sure it’s correct
[ERROR] Network must have at least one output
[ERROR] Network validation failed.
[ERROR] Unable to create engine
Segmentation fault (core dumped)

I am saying it isn’t a key error because when I run the code from the notebook as-is, it works fine. Please let me know if you have any pointers.

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
-c $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,384,1248 \
-i nchw \
-m 64 \
-t int8 \
-e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
-b 4

Please also make sure your etlt file is generated with the same NGC key, and make sure you are pointing the tlt-converter command at the correct etlt file.
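For example, a quick sanity check (a minimal sketch, assuming the same paths used in the converter command above) is to list the etlt file and confirm its timestamp matches your latest tlt-export run:

!ls -l $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt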

@Morganh

I think I am using the same key that was generated. I ran

!echo $KEY

and it output the correct key that I used throughout. Then I ran this:

!rm -rf $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt
!rm -rf $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin
!tlt-export detectnet_v2 \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
--data_type fp16 \
--verbose

Then I ran this:

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,384,1248 \
-i nchw \
-t fp16 \
-e $USER_EXPERIMENT_DIR/experiment_dir_final/fp16_resnet18_detector.trt \

As you can see, I delete the old .etlt before running tlt-export. Then I use the same .etlt file when I run tlt-converter.

Am I missing any arguments? It works for the int8 version, but it doesn’t work when I run it as shown above…

P.S. It seems like backslashes automatically get hidden on the NVIDIA forum.

What do you mean by “It works for the int8 version”?
BTW, please note that the etlt model is always fp32, no matter which “data_type” you set in the tlt-export command.

In the same session, if I run the following commands, it works:

  1. Create the calibration.tensor file (which I didn’t run when I was creating the FP16 version):

!tlt-int8-tensorfile detectnet_v2 -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
-m 10 \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor

  2. Run tlt-export:

!rm -rf $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt
!rm -rf $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin
!tlt-export detectnet_v2 \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
--cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
--data_type int8 \
--batches 10 \
--batch_size 4 \
--max_batch_size 4 \
--engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
--cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
--verbose

  3. Run tlt-converter:

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
-c $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,384,1248 \
-i nchw \
-m 64 \
-t int8 \
-e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
-b 4

As you can see, I am using the same $KEY environment variable, and when I run this, it works without throwing the key error.
So I feel the error wasn’t related to the key.

Curious what you think!

So, do you mean you can generate the int8 trt engine (resnet18_detector.trt) successfully via the above three steps?

@Morganh

That’s correct. I can successfully create resnet18_detector.trt via the above steps; however, as the original post states, the issue is that inference using this TRT engine isn’t good at all: https://1drv.ms/u/s!AjcYy-uvHk09j8ZNoRnhO9iynUP78g?e=z3uxHs

That is why we decided to create an FP16 model and check it.

It is very curious that you can generate an int8 trt engine but cannot generate an fp16 trt engine.
I suggest you double-check again. You can also try to generate an fp32 trt engine to see if it works.
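For example, an fp32 engine can be built with the same converter command, changing only the -t flag (a sketch based on the commands above; the output file name fp32_resnet18_detector.trt is just an illustrative choice):

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,384,1248 \
-i nchw \
-t fp32 \
-e $USER_EXPERIMENT_DIR/experiment_dir_final/fp32_resnet18_detector.trt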

@Morganh

It was a stupid mistake on my part. When I copied and pasted the code, there was a trailing blank space (after a line-continuation backslash) that caused the issue. When I deleted it and made it a one-line command, it worked.
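For reference, the working one-line version is just the same fp16 command from earlier with the line continuations removed:

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt -k $KEY -o output_cov/Sigmoid,output_bbox/BiasAdd -d 3,384,1248 -i nchw -t fp16 -e $USER_EXPERIMENT_DIR/experiment_dir_final/fp16_resnet18_detector.trt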

Going back to the original problem: even with the FP16 model, the inference accuracy still seems to be terrible (see below).

Do you have any other pointers on what I can do to improve the inference results?

Thanks,
Jae

Can you double-check whether the output images under the folder below were really generated in fp16 mode?
-o $USER_EXPERIMENT_DIR/etlt_infer_testing
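One quick way to check (assuming the notebook environment) is to sort the images in that folder by modification time and confirm they were written during the fp16 run:

!ls -lt $USER_EXPERIMENT_DIR/etlt_infer_testing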


@Morganh
Thanks for that pointer. I realized that I changed the output folder name when I was testing fp32, fp16, and int8, but when I visualized the results, I didn’t use the correct name.

Now it looks good. Thank you so much!