TRT calibration seems to be a no-op on a TF2 model

Description

Hi,

TL;DR Here is my observation:
I attached the calibrated graph_def to a SavedModel that was converted by TrtGraphConverterV2. At inference time, this calibration branch (tensorflow/trt_engine_op.cc at master · tensorflow/tensorflow · GitHub) is triggered again.

A bit more detail: at our company we use TensorFlow heavily, and we are migrating our TRT-related tooling to TF2. We customize the TRT tooling quite a lot, mostly because a dependency issue prevents us from using TRTEngineOp in Python environments, so we implement the major TRT steps (conversion, calibration, and serialization) in C++. Here is what I’ve done:

  1. I take a converted model produced by TrtGraphConverterV2 under the precision mode INT8 and load it in C++ via tensorflow::SavedModelBundle.
  2. I use the SavedModelBundle’s session to run this model on calibration data (a minimal sketch of steps 1 and 2 follows this list). I verified that the TRTEngineOp is triggered and that it enters the aforementioned calibration branch.
    2.1) However, this thread (tensorflow/trt_engine_op.cc at master · tensorflow/tensorflow · GitHub) blocks on trt_builder_->buildEngineWithConfig(*network(), *builder_config) here (tensorflow/convert_nodes.cc at master · tensorflow/tensorflow · GitHub), even though the main thread finished without any error. The network name is: TF:2.4.1, TRT:7.2.2-Precision:INT8, Calibration:1, Max-Batch-Size:1000, Max-Workspace-Size:1073741824
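
For reference, here is a minimal sketch of how I load the converted SavedModel and push calibration batches through it. The model path, tensor names, and batch shape are placeholders rather than our actual values, and it assumes TF was built with TensorRT support:

```cpp
#include <vector>

#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/public/session.h"

int main() {
  tensorflow::SavedModelBundle bundle;
  tensorflow::SessionOptions session_options;
  tensorflow::RunOptions run_options;

  // Step 1: load the SavedModel produced by TrtGraphConverterV2 (INT8 mode).
  // "/tmp/trt_int8_model" is a placeholder path.
  TF_CHECK_OK(tensorflow::LoadSavedModel(
      session_options, run_options, "/tmp/trt_int8_model",
      {tensorflow::kSavedModelTagServe}, &bundle));

  // Step 2: run calibration batches through the session. Each Run() should
  // hit the TRTEngineOp and take its calibration branch.
  // "input:0", "output:0", and the shape are placeholders, not our real ones.
  tensorflow::Tensor batch(tensorflow::DT_FLOAT,
                           tensorflow::TensorShape({8, 224, 224, 3}));
  batch.flat<float>().setZero();  // stand-in for real calibration data

  for (int i = 0; i < 10; ++i) {
    std::vector<tensorflow::Tensor> outputs;
    TF_CHECK_OK(bundle.session->Run({{"input:0", batch}}, {"output:0"},
                                    /*target_node_names=*/{}, &outputs));
  }
  return 0;
}
```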

Here is what I observed after the calibration step:

  1. The node whose op is TRTEngineOpTF2 has an attr segment_func whose value is TRTEngineOp_0_0_native_segment. But I failed to use ResourceMgr::Lookup to get a TRTCalibrationResource given TRTEngineOp_0_0_native_segment as the container name (a sketch of the lookup I attempted follows this list). (I am running out of link quota; I will provide the link to Lookup later.)
  2. There is no GetCalibrationDataOp node in the graph.
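
From my reading of trt_engine_op.cc in TF 2.4 (I may be wrong here), the calibration state seems to live inside a TRTEngineCacheResource registered under the container "TF-TRT" and keyed by the engine op’s node name, not by the segment_func name. Below is a hedged sketch of the lookup I would expect to work; the container name "TF-TRT", the key "TRTEngineOp_0_0", and the resource type are my assumptions, not verified facts:

```cpp
#include "tensorflow/compiler/tf2tensorrt/utils/trt_lru_cache.h"
#include "tensorflow/core/common_runtime/device.h"
#include "tensorflow/core/common_runtime/device_mgr.h"
#include "tensorflow/core/framework/resource_mgr.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/public/session.h"

// Walk the session's local devices and try to find the TF-TRT cache resource.
tensorflow::Status LookupTrtCacheResource(tensorflow::Session* session) {
  const tensorflow::DeviceMgr* device_mgr = nullptr;
  TF_RETURN_IF_ERROR(session->LocalDeviceManager(&device_mgr));
  for (tensorflow::Device* device : device_mgr->ListDevices()) {
    tensorflow::ResourceMgr* rm = device->resource_manager();
    tensorflow::tensorrt::TRTEngineCacheResource* cache = nullptr;
    // Key by the engine op's node name, not the "_native_segment" func name.
    // Both strings below are assumptions based on reading trt_engine_op.cc.
    tensorflow::Status s = rm->Lookup("TF-TRT", "TRTEngineOp_0_0", &cache);
    if (s.ok()) {
      LOG(INFO) << "Found TRT engine cache resource on " << device->name();
      cache->Unref();  // Lookup took a reference; release it.
    }
  }
  return tensorflow::Status::OK();
}
```

If the key is not exactly the node name, dumping each device’s ResourceMgr::DebugString() may help reveal the actual container/key pair in use.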

Because there is a lot of internal customization on our side, I haven’t found a good way to share a script that reproduces this issue. A face-to-face meeting to discuss all these details would be appreciated.

Thanks,
Muyang

Environment

TensorRT Version: 7.2.2
GPU Type: GeForce 2070
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 11.0
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): Python 2.7.12 and Python 3.7.10
TensorFlow Version (if applicable): 2.4.1
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): N/A

Steps To Reproduce


Here is the link to ResourceMgr::Lookup: tensorflow/resource_mgr.h at master · tensorflow/tensorflow · GitHub

Hi, please refer to the link below to perform inference in INT8:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8/README.md

Thanks!

Hi,

Thanks for the reply, but the link shows a 404 page. Could you give me an updated link?

Thanks!

Please refer,