TensorRT Python interface UFF int8 calibration issue

I just tried to adapt the example from https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/ to a TensorFlow SqueezeDet network, but I ran into several problems:

  • in /usr/local/lib/python2.7/dist-packages/tensorrt/utils/_utils.py:48, assert(parser.parse_from_file(uff_file, network, datatype)) will not work for datatype INT8, because the source network still has to be parsed as float32. The corresponding function for Caffe implements this correctly.
  • in /usr/local/lib/python2.7/dist-packages/tensorrt/lite/engine.py:549ff, self.data_type.input_type() yields 0.0, which isn’t very helpful in the trace. Referencing .dtype instead fixes this.
  • in /usr/local/lib/python2.7/dist-packages/tensorrt/utils/_utils.py:70 there is a typo, builder.set_int8_Mode() instead of builder.set_int8_mode(), which leads to an error when this function is executed.
    After fixing these three issues, the script runs until:
[TensorRT] INFO: Calibrating with batch 17
[TensorRT] INFO: Calibrating with batch 18
[TensorRT] INFO: Calibrating with batch 19
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.08752
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(3)
[TensorRT] INFO: Tactic 0 time 0.837312
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(2)
[TensorRT] INFO: Tactic 1 time 1.8968
[TensorRT] INFO: Tactic 49 time 1.99677
[TensorRT] INFO: Tactic 128 time 1.98461
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(14)
[TensorRT] INFO: Tactic 1363534230700867617 time 0.944896
[TensorRT] INFO: Tactic 1642270411037877776 time 0.933888
[TensorRT] INFO: Tactic 3146172331490511787 time 0.995136
[TensorRT] INFO: Tactic 3528302785056538033 time 0.879968
[TensorRT] INFO: Tactic 5443600094180187792 time 0.81712
[TensorRT] INFO: Tactic 5552354567368947361 time 0.780032
[TensorRT] INFO: Tactic 5824828673459742858 time 0.965824
[TensorRT] INFO: Tactic -6618588952828687390 time 0.857568
[TensorRT] INFO: Tactic -6362554771847758902 time 0.994464
[TensorRT] INFO: Tactic -2701242286872672544 time 0.990848
[TensorRT] INFO: Tactic -2535759802710599445 time 0.961216
[TensorRT] INFO: Tactic -675401754313066228 time 0.981312
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(1)
[TensorRT] INFO: Tactic 0 time 2.36256
[TensorRT] INFO: Tactic 1 time 1.74506
[TensorRT] INFO: Tactic 2 time 2.5201
[TensorRT] INFO: --------------- Chose 14 (5552354567368947361)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(3)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(2)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(14)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(1)
[TensorRT] ERROR: Internal error: could not find any implementation for node conv1/bias_add + conv1/relu, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()

I increased the MaxWorkspaceSize to 11 GB without any change. I’m quite sure that memory limits can’t be the reason, because everything runs fine for float32. Only the INT8 calibration yields this error message, which I can’t debug any further.
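For reference, the datatype substitution described in the first bullet above can be sketched in isolation. The DataType enum below is just a stand-in for tensorrt.infer.DataType (names and values are illustrative); the actual fix replaces the datatype argument handed to parser.parse_from_file in _utils.py, the same way the Caffe path already does:

```python
from enum import Enum

# Stand-in for tensorrt.infer.DataType; the values are illustrative only.
class DataType(Enum):
    FLOAT = 0
    HALF = 1
    INT8 = 2

def parser_datatype(requested):
    """Datatype to hand to parser.parse_from_file().

    An INT8 engine is built from an FP32 source network plus a
    calibration cache, so the UFF parser must never be asked to
    parse INT8 weights directly.
    """
    return DataType.FLOAT if requested is DataType.INT8 else requested

print(parser_datatype(DataType.INT8).name)   # FLOAT
print(parser_datatype(DataType.FLOAT).name)  # FLOAT
print(parser_datatype(DataType.HALF).name)   # HALF
```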

Any hints on how to solve this problem?

Oops, I just read the Release Notes for 3.0.2:

The TensorFlow export path is currently not expected to support the following:

<ul>
<li>Other versions of TensorFlow (0.9, 1.1, etc.)</li>
<li>RNNs</li>
<li>INT8 CNNs</li>
</ul>

Thanks for putting these together, Martin. We initially ran into the same issue when converting a TensorFlow-trained ResNet-152. With your comment on the caffemodel, we at least managed to convert an MNIST FP32 caffemodel to TRT-INT8.

From the release notes, it seems the TensorFlow limitations only apply to source networks trained in INT8, but that FP32 should work. By any chance, have you tried your patches to the Python source on any FP32 TF-trained models?

Hi, I tried only FP32 TF-trained models. I think the Release Notes comment is a little unclear here, but as you can see from the changes I had to make, nobody tried/tested the 8-bit TensorFlow path before releasing the software. FP32 optimization works fine and in my case yields a 50% performance gain over TF.

Hello Martin,
I am also working on INT8 optimization for TensorFlow models and face similar issues. When I create a trt.lite.Engine and pass INT8 as data_type together with an INT8 calibrator, I get the following error:

[TensorRT] ERROR: UFFParser: Parser error: conv1/1/convolution: Invalid weights types when converted

If I modify the source code in _utils.py from TensorRT to enable calibration with the FP32 datatype, batches are passed through the engine, but I get no performance improvement. So my question is: how should I understand the release notes? Is INT8 optimization possible for FP32 TensorFlow models or not? How could you get the 50% performance improvement without INT8 calibration? How did you do the inference on the engine?

Thanks.
Max

Hi,

Because the bugs in the TF code path of the converter prevent any INT8 optimization, I would interpret the Release Notes to mean that it isn’t supported.

The 50% performance gain in my case is for the FP32 conversion, compared to TF inference done in Python on the PX2. To be precise, it isn’t exactly 50%, because I had to omit the last layer in the conversion, since it isn’t supported by TRT.

Best regards,

Martin

Hi Martin,
would you mind giving a more detailed explanation? Did you run the INT8 code with the Python interface successfully? I checked the documentation of TensorRT 3.0.4; currently there is no INT8 calibrator support there. However, the Caffe tutorial https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt seems OK. Have you tried it?

Hi,

AFAIK, INT8 calibration cache creation has been supported since TensorRT 3.0 for Caffe. Unfortunately not for UFF/TensorFlow, even in 3.0.4, and that is what I need.

Best regards,

Martin

Hi Martin,
Thank you for your notice. Yesterday NVIDIA said they’ll release TensorRT 4 with TensorFlow integration, so perhaps we can check the updates.

Hi,
did they also give a release date or a rough estimate for TensorRT 4? We really need the INT8 support for TF models and have to decide now what to do: switch to Caffe or continue with TF.

Thank you,

On the TensorRT site you can find the statement “Members of the NVIDIA Developer Program can download the TensorRT 4 Release Candidate from here soon.”…

Best regards,

Martin

Hi Martin,
May I confirm with you: for the Caffe path of TensorRT, did you run it successfully via the Python interface or the C++ interface? I suppose C++ should be OK, but I never tried it; I gave the Python interface a shot, which was unsuccessful. Did you succeed?

Regards,
Angulia

Hi Maximilian, I checked the download page today; it’s not released yet, but should be soon.

Hi Angulia,

With Caffe I only saw the demo/course at the GTC EU. As Joohoon Lee describes in his blog post https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/, the trick is to create the calibration cache in Python on the host and use the output for INT8 calibration on the target.
If you want to be sure, just try the example in Joohoon’s article.
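To make the host/target split concrete, here is a toy sketch of what such a calibration cache boils down to: one scale per tensor, computed on the host from representative activations and written to a small file that the target reads back when building the INT8 engine. This uses a simple max calibration for illustration; TensorRT’s actual cache format and its entropy (KL-divergence) calibration are more involved, and all names below are made up:

```python
# Toy calibration cache: per-tensor INT8 scales, computed on the "host"
# and consumed on the "target".  Illustrative only -- not TensorRT's
# actual cache format or entropy-calibration algorithm.

def max_calibration_scale(activations):
    """Map the largest observed magnitude onto the INT8 limit 127."""
    return max(abs(a) for a in activations) / 127.0

def write_cache(path, scales):
    # One "tensor-name scale" pair per line.
    with open(path, "w") as f:
        for name, scale in scales.items():
            f.write("%s %.9g\n" % (name, scale))

def read_cache(path):
    scales = {}
    with open(path) as f:
        for line in f:
            name, scale = line.split()
            scales[name] = float(scale)
    return scales

# Host side: run calibration batches and record activation ranges.
scales = {"conv1/relu": max_calibration_scale([-3.1, 0.4, 2.54]),
          "pool1": max_calibration_scale([0.0, 6.2, -1.7])}
write_cache("calibration.cache", scales)

# Target side: rebuild the INT8 engine from the cached scales,
# without rerunning calibration.
restored = read_cache("calibration.cache")
print(sorted(restored))  # ['conv1/relu', 'pool1']
```

The point of the split is exactly what the blog post exploits: the expensive part (running calibration batches) happens once on the host, and the target only needs the tiny cache file.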

Also very interesting: https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/

Best regards,

Martin

Many thanks. I’ve investigated the Python API, which says that uff_to_trt() doesn’t support a calibrator parameter, hence my curiosity.

We created a new “Deep Learning Training and Inference” section in DevTalk to improve the experience for deep learning, accelerated computing, and HPC users:
https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/

We are moving active deep learning threads to the new section.

URLs for topics will not change with the re-categorization, so your bookmarks and links will continue to work as before.

-Siddharth