I am running your image_classification example from your Docker image nvcr.io/nvidia/tensorflow:19.11-tf2-py3 as follows:
```
export CUDA_VISIBLE_DEVICES="0"
python image_classification.py \
  --data_dir /mytf/imagenet \
  --input_saved_model_dir /mytf/1 \
  --output_saved_model_dir /mytf/temp \
  --mode validation \
  --num_warmup_iterations 50 \
  --use_trt \
  --optimize_offline \
  --precision INT8 \
  --max_workspace_size $((2**32)) \
  --batch_size 128 \
  --target_duration 10 \
  --calib_data_dir /mytf/imagenet \
  --num_calib_inputs 128
```
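For reference, my understanding is that these flags map roughly onto the TF-TRT Python API as sketched below. This is an assumption on my part, not code taken from the script; the random calibration_input_fn and the 224x224x3 input shape are placeholders for the real loop over --calib_data_dir:

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

def calibration_input_fn():
    # Placeholder calibration feed: the real run would yield preprocessed
    # ImageNet batches from --calib_data_dir (/mytf/imagenet).
    for _ in range(1):
        yield (np.random.random_sample((128, 224, 224, 3)).astype(np.float32),)

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.INT8,      # --precision INT8
    max_workspace_size_bytes=2**32,                # --max_workspace_size
    use_calibration=True)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='/mytf/1',               # --input_saved_model_dir
    conversion_params=params)

converter.convert(calibration_input_fn=calibration_input_fn)
converter.build(input_fn=calibration_input_fn)     # --optimize_offline
converter.save('/mytf/temp')                       # --output_saved_model_dir
```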
The TensorRT conversion completes successfully, but I see no speedup relative to FP32. On closer examination of the generated model, the graph nodes retain FP32 types, so the lack of speedup is not surprising. Given that this is running on a compute capability 6.1 GPU (Quadro P6000), why did the converted model not use INT8 as requested above? How do I demonstrate the INT8 performance on this model that is described in your documentation?
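For completeness, this is roughly how I am inspecting the converted model (a minimal sketch; the serving_default signature key and the precision_mode string attribute on TRTEngineOp nodes are assumptions on my part):

```python
import tensorflow as tf

saved_model = tf.saved_model.load('/mytf/temp')
graph_func = saved_model.signatures['serving_default']
graph_def = graph_func.graph.as_graph_def()

# Collect nodes from the top-level graph and from the function library,
# since the TRT segments usually end up inside nested function defs.
all_nodes = list(graph_def.node)
for func in graph_def.library.function:
    all_nodes.extend(func.node_def)

for node in all_nodes:
    if node.op == 'TRTEngineOp':
        # Assumption: the engine precision is recorded in the
        # precision_mode attribute of each TRTEngineOp.
        print(node.name, node.attr['precision_mode'].s.decode())
```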