TL;DR: Here is my observation:
I attached the calibrated graph_def to a SavedModel that was converted by TrtGraphConverterV2. At inference time, this calibration branch (tensorflow/trt_engine_op.cc at master · tensorflow/tensorflow · GitHub) is triggered again.
A bit more detail: at our company we use TensorRT a lot, and we are migrating our TRT-related tooling to TF2. We customize the TRT tooling quite heavily, mostly because a dependency issue prevents us from using TRTEngineOp in Python environments. So we implement the major TRT steps (conversion, calibration, and serialization) in C++. Here is what I’ve done:
- I take a converted model produced by TrtGraphConverterV2 with precision mode INT8 and load it into C++ using tensorflow::SavedModelBundle.
- I use the SavedModelBundle’s session to run this model with calibration data. I verified that TRTEngineOp is triggered and that it goes into the aforementioned calibration branch.
2.1) However, this thread (tensorflow/trt_engine_op.cc at master · tensorflow/tensorflow · GitHub) is blocked by trt_builder_->buildEngineWithConfig(*network(), *builder_config) here (tensorflow/convert_nodes.cc at master · tensorflow/tensorflow · GitHub), even though the main thread finished without any error. The network name is:
TF:2.4.1, TRT:7.2.2-Precision:INT8, Calibration:1, Max-Batch-Size:1000, Max-Workspace-Size:1073741824
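For readers who want to check what this name encodes, here is a minimal sketch that splits it into fields. The helper `parse_network_name` is hypothetical (it is not part of TensorFlow or TensorRT), and it assumes the separators are exactly as they appear in the string above: ", " between fields and "-" between the TRT version and the precision.

```python
def parse_network_name(name: str) -> dict:
    """Split a TF-TRT network name into a {field: value} dict.

    Hypothetical helper; assumes the layout seen in the string above.
    """
    fields = {}
    # Every field is separated by ", " except Precision, which is joined
    # to the TRT version with "-". Normalize that one separator first.
    normalized = name.replace("-Precision:", ", Precision:")
    for part in normalized.split(", "):
        key, _, value = part.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


name = ("TF:2.4.1, TRT:7.2.2-Precision:INT8, Calibration:1, "
        "Max-Batch-Size:1000, Max-Workspace-Size:1073741824")
info = parse_network_name(name)

print(info["Precision"], info["Calibration"])
print(int(info["Max-Workspace-Size"]) / 2**30, "GiB")
```

Read this way, the name shows an INT8 network with the calibration flag set and a workspace limit of 2^30 bytes, which seems consistent with the build having been started from the calibration branch.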
Here are some details that I observed after the calibration step:
- The node whose op is TRTEngineOpTF2 has an attr named segment_func whose value is TRTEngineOp_0_0_native_segment. However, I failed to use ResourceMgr.LookUp to look up a resource with TRTEngineOp_0_0_native_segment as the container name. (I am running out of link quota; I will provide the link to LookUp later.)
- There is no node of GetCalibrationDataOp in the graph.
Because there is a lot of internal customization on our side, I haven’t found a good way to share a script that reproduces this issue. A face-to-face meeting to discuss these details would be appreciated.
Nvidia Driver Version:
Operating System + Version:
Python Version (if applicable):
Python 2.7.12 and Python 3.7.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):