Poor Results After INT8 Optimization (TLT Getting Started Guide)

Well… I had Docker inside of Docker, so I think somewhere along the way, while creating these files, the KEY went wrong. So I am restarting everything. Please stay with me :D I will get back to you tomorrow with the results of the FP16 version.

@Morganh

Hope you had a good day.
It seems like this isn’t a KEY issue. I went through all the steps and made sure everything was entered correctly; however, I am still running into the same issue as above. Meaning, I still get this error:

[ERROR] UffParser: Unsupported number of graph 0
[ERROR] Failed to parse the model, please check the encoding key to make sure it’s correct
[ERROR] Network must have at least one output
[ERROR] Network validation failed.
[ERROR] Unable to create engine
Segmentation fault (core dumped)

I am saying it isn’t a key error because when I run the code from the notebook as-is, it works fine. Please let me know if you have any pointers.

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
-c $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,384,1248 \
-i nchw \
-m 64 \
-t int8 \
-e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
-b 4

Please also make sure your etlt file is generated with the same NGC key, and make sure you are pointing the tlt-converter command at the correct etlt file.
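For example, a quick sanity check (a minimal sketch, assuming the same paths used in the converter command above) is to list the etlt file and confirm its timestamp matches your latest tlt-export run:

!ls -l $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt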

@Morganh

I think I am using the same key that was generated. I ran

!echo $KEY

and it output the correct key that I used throughout. Then I ran this:

!rm -rf $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt
!rm -rf $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin
!tlt-export detectnet_v2 \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
--data_type fp16 \
--verbose

Then I ran this:

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,384,1248 \
-i nchw \
-t fp16 \
-e $USER_EXPERIMENT_DIR/experiment_dir_final/fp16_resnet18_detector.trt \

As you can see, I delete the old .etlt before running tlt-export. Then I use the same .etlt file when I run tlt-converter.

Am I missing any arguments? It works for the int8 version, but it doesn’t work when I run it as shown above…

P.S. It seems like backslashes automatically get hidden on the NVIDIA forum.

What do you mean by “It works for the int8 version”?
BTW, please note that the etlt model is always fp32, no matter which “data_type” you set in the tlt-export command.

In the same session, if I run the following commands, it works:

  1. Create the calibration.tensor file (which I didn’t run when I was creating the FP16 version):

!tlt-int8-tensorfile detectnet_v2 -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
-m 10 \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor

  2. Run tlt-export:

!rm -rf $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt
!rm -rf $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin
!tlt-export detectnet_v2 \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
--cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
--data_type int8 \
--batches 10 \
--batch_size 4 \
--max_batch_size 4 \
--engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
--cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
--verbose

  3. Run tlt-converter:

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
-c $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,384,1248 \
-i nchw \
-m 64 \
-t int8 \
-e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
-b 4

As you can see, I am using the same $KEY environment variable, and when I run this, it works without throwing the key error.
So I feel the error wasn’t related to the key.

Curious what you think!

So, do you mean you can generate the int8 trt engine (resnet18_detector.trt) successfully via the above three steps?

@Morganh

That’s correct. I can successfully create resnet18_detector.trt via the above steps; however, as the original post states, the issue is that inference using this TRT engine isn’t good at all: https://1drv.ms/u/s!AjcYy-uvHk09j8ZNoRnhO9iynUP78g?e=z3uxHs

That is why we decided to create an FP16 model and check it.

It is very curious that you can generate an int8 trt engine but cannot generate an fp16 trt engine.
I suggest you double-check again. You can also try to generate an fp32 trt engine to see if it works.
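For example, an fp32 engine can be built with the same converter command, changing only the -t flag (a sketch based on the commands above; the output file name fp32_resnet18_detector.trt is just an illustrative choice):

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,384,1248 \
-i nchw \
-t fp32 \
-e $USER_EXPERIMENT_DIR/experiment_dir_final/fp32_resnet18_detector.trt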

@Morganh

It was a stupid mistake on my part. When I copied and pasted the code, there was a trailing blank space (after a line-continuation backslash) that caused the issue. When I deleted it and made it a one-line command, it worked.
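For reference, the working one-line version is just the same fp16 command from earlier with the line continuations removed:

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt -k $KEY -o output_cov/Sigmoid,output_bbox/BiasAdd -d 3,384,1248 -i nchw -t fp16 -e $USER_EXPERIMENT_DIR/experiment_dir_final/fp16_resnet18_detector.trt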

Going back to the original problem: even with the FP16 model, the inference accuracy still seems to be terrible (see below).

Do you have any other pointers on what I can do to improve the inference results?

Thanks,
Jae

Can you double-check whether the output images under the folder below were really generated in fp16 mode?
-o $USER_EXPERIMENT_DIR/etlt_infer_testing
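One quick way to check (assuming the notebook environment) is to sort the images in that folder by modification time and confirm they were written during the fp16 run:

!ls -lt $USER_EXPERIMENT_DIR/etlt_infer_testing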


@Morganh
Thanks for that pointer. I realized that I changed the output folder name when I was testing fp32, fp16, and int8, but when I visualized the results, I didn’t use the correct name.

Now it looks good. Thank you so much!