TensorRT Python interface UFF int8 calibration issue

I just tried to adapt the example from https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/ to a TensorFlow SqueezeDet network, but I ran into several problems:

  • in /usr/local/lib/python2.7/dist-packages/tensorrt/utils/_utils.py:48, assert(parser.parse_from_file(uff_file, network, datatype)) will not work for datatype INT8, because the source network still has to be parsed as float32. The corresponding function for Caffe implements this correctly.
  • in /usr/local/lib/python2.7/dist-packages/tensorrt/lite/engine.py:549ff, self.data_type.input_type() yields 0.0, which isn’t very helpful in the trace. Referencing .dtype instead fixes this.
  • in /usr/local/lib/python2.7/dist-packages/tensorrt/utils/_utils.py:70 there is a typo, builder.set_int8_Mode() instead of builder.set_int8_mode(), which leads to an error when this function is executed.
    After fixing these three issues, the script runs until:
[TensorRT] INFO: Calibrating with batch 17
[TensorRT] INFO: Calibrating with batch 18
[TensorRT] INFO: Calibrating with batch 19
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.08752
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(3)
[TensorRT] INFO: Tactic 0 time 0.837312
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(2)
[TensorRT] INFO: Tactic 1 time 1.8968
[TensorRT] INFO: Tactic 49 time 1.99677
[TensorRT] INFO: Tactic 128 time 1.98461
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(14)
[TensorRT] INFO: Tactic 1363534230700867617 time 0.944896
[TensorRT] INFO: Tactic 1642270411037877776 time 0.933888
[TensorRT] INFO: Tactic 3146172331490511787 time 0.995136
[TensorRT] INFO: Tactic 3528302785056538033 time 0.879968
[TensorRT] INFO: Tactic 5443600094180187792 time 0.81712
[TensorRT] INFO: Tactic 5552354567368947361 time 0.780032
[TensorRT] INFO: Tactic 5824828673459742858 time 0.965824
[TensorRT] INFO: Tactic -6618588952828687390 time 0.857568
[TensorRT] INFO: Tactic -6362554771847758902 time 0.994464
[TensorRT] INFO: Tactic -2701242286872672544 time 0.990848
[TensorRT] INFO: Tactic -2535759802710599445 time 0.961216
[TensorRT] INFO: Tactic -675401754313066228 time 0.981312
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(1)
[TensorRT] INFO: Tactic 0 time 2.36256
[TensorRT] INFO: Tactic 1 time 1.74506
[TensorRT] INFO: Tactic 2 time 2.5201
[TensorRT] INFO: --------------- Chose 14 (5552354567368947361)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(3)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(2)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(14)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing conv1/bias_add + conv1/relu(1)
[TensorRT] ERROR: Internal error: could not find any implementation for node conv1/bias_add + conv1/relu, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()

I increased the MaxWorkspaceSize to 11 GB without any change. I’m quite sure that memory limits can’t be the reason, because everything runs fine for float32. Only the INT8 calibration yields this error message, which I can’t debug any further.
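For reference, the datatype substitution described in the first bullet above can be sketched in isolation. The DataType enum below is just a stand-in for tensorrt.infer.DataType (names and values are illustrative); the actual fix replaces the datatype argument handed to parser.parse_from_file in _utils.py, the same way the Caffe path already does:

```python
from enum import Enum

# Stand-in for tensorrt.infer.DataType; the values are illustrative only.
class DataType(Enum):
    FLOAT = 0
    HALF = 1
    INT8 = 2

def parser_datatype(requested):
    """Datatype to hand to parser.parse_from_file().

    An INT8 engine is built from an FP32 source network plus a
    calibration cache, so the UFF parser must never be asked to
    parse INT8 weights directly.
    """
    return DataType.FLOAT if requested is DataType.INT8 else requested

print(parser_datatype(DataType.INT8).name)   # FLOAT
print(parser_datatype(DataType.FLOAT).name)  # FLOAT
print(parser_datatype(DataType.HALF).name)   # HALF
```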

Any hints on how to solve this problem?

Oops, I just read the Release Notes for 3.0.2:

The TensorFlow export path is currently not expected to support the following:

<ul>
<li>Other versions of TensorFlow (0.9, 1.1, etc.)</li>
<li>RNNs</li>
<li>INT8 CNNs</li>
</ul>

Thanks for putting these together, Martin. We initially ran into the same issue when converting a TensorFlow-trained ResNet-152. With your comment on the caffemodel, we at least managed to convert an MNIST FP32 caffemodel to TRT-INT8.

From the release notes, it seems the TensorFlow limitations only apply to source networks trained in INT8, but that FP32 should work. By any chance, have you tried your patches to the Python source on any FP32 TF-trained models?

Hi, I tried only FP32 TF-trained models. I think the Release Notes comment is a little unclear here, but as you can see from the changes I had to make, nobody tried/tested the 8-bit TensorFlow path before releasing the software. FP32 optimization works fine and in my case yields a 50% performance gain over TF.

Hello Martin,
I am also working on INT8 optimization for TensorFlow models and face similar issues. When I create a trt.lite.Engine and pass INT8 as data_type together with an INT8 calibrator, I get the following error:

[TensorRT] ERROR: UFFParser: Parser error: conv1/1/convolution: Invalid weights types when converted

If I modify the source code in _utils.py from TensorRT to enable calibration with the FP32 datatype, batches are passed through the engine, but I get no performance improvement. So my question is: how should I understand the release notes? Is INT8 optimization possible for FP32 TensorFlow models or not? How could you get the 50% performance improvement without INT8 calibration? How did you do the inference on the engine?

Thanks.
Max

Hi,

Because the bugs in the TF code path of the converter prevent any INT8 optimization, I would interpret the Release Notes to mean that it isn’t supported.

The 50% performance gain in my case is for the FP32 conversion, compared to TF inference done in Python on the PX2. To be precise, it isn’t exactly 50%, because I had to omit the last layer in the conversion, since it isn’t supported by TRT.

Best regards,

Martin

Hi Martin,
would you mind giving a more detailed explanation? Did you run the INT8 code with the Python interface successfully? I checked the documentation of TensorRT 3.0.4; currently there is no INT8 calibrator support there. However, the Caffe tutorial https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt seems OK. Have you tried it?

Hi,

AFAIK, INT8 calibration cache creation has been supported since TensorRT 3.0 for Caffe. Unfortunately not for UFF/TensorFlow, even in 3.0.4, and that is what I need.

Best regards,

Martin

Hi Martin,
Thank you for your notice. Yesterday NVIDIA said they’ll release TensorRT 4 with TensorFlow integration, so perhaps we can check the updates.

Hi,
did they also give a release date or a rough estimate for TensorRT 4? We really need the INT8 support for TF models and have to decide now what to do: switch to Caffe or continue with TF.

Thank you,

On the TensorRT site you can find the statement “Members of the NVIDIA Developer Program can download the TensorRT 4 Release Candidate from here soon.”…

Best regards,

Martin

Hi Martin,
May I confirm with you: for the Caffe path of TensorRT, did you run it successfully via the Python interface or the C++ interface? I suppose C++ should be OK, but I never tried it; I gave the Python interface a shot, which was unsuccessful. Did you succeed?

Regards,
Angulia

Hi Maximilian, I checked the download page today; it’s not released yet, but should be soon.

Hi Angulia,

With Caffe I only saw the demo/course at the GTC EU. As Joohoon Lee describes in his blog post https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/, the trick is to create the calibration cache in Python on the host and use the output for INT8 calibration on the target.
If you want to be sure, just try the example in Joohoon’s article.
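To make the host/target split concrete, here is a toy sketch of what such a calibration cache boils down to: one scale per tensor, computed on the host from representative activations and written to a small file that the target reads back when building the INT8 engine. This uses a simple max calibration for illustration; TensorRT’s actual cache format and its entropy (KL-divergence) calibration are more involved, and all names below are made up:

```python
# Toy calibration cache: per-tensor INT8 scales, computed on the "host"
# and consumed on the "target".  Illustrative only -- not TensorRT's
# actual cache format or entropy-calibration algorithm.

def max_calibration_scale(activations):
    """Map the largest observed magnitude onto the INT8 limit 127."""
    return max(abs(a) for a in activations) / 127.0

def write_cache(path, scales):
    # One "tensor-name scale" pair per line.
    with open(path, "w") as f:
        for name, scale in scales.items():
            f.write("%s %.9g\n" % (name, scale))

def read_cache(path):
    scales = {}
    with open(path) as f:
        for line in f:
            name, scale = line.split()
            scales[name] = float(scale)
    return scales

# Host side: run calibration batches and record activation ranges.
scales = {"conv1/relu": max_calibration_scale([-3.1, 0.4, 2.54]),
          "pool1": max_calibration_scale([0.0, 6.2, -1.7])}
write_cache("calibration.cache", scales)

# Target side: rebuild the INT8 engine from the cached scales,
# without rerunning calibration.
restored = read_cache("calibration.cache")
print(sorted(restored))  # ['conv1/relu', 'pool1']
```

The point of the split is exactly what the blog post exploits: the expensive part (running calibration batches) happens once on the host, and the target only needs the tiny cache file.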

Also very interesting: https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/

Best regards,

Martin

Many thanks. I’ve investigated the Python API, which says that uff_to_trt() doesn’t support a calibrator parameter, hence my curiosity.

We created a new “Deep Learning Training and Inference” section in DevTalk to improve the experience for deep learning, accelerated computing, and HPC users:
https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/

We are moving active deep learning threads to the new section.

URLs for topics will not change with the re-categorization, so your bookmarks and links will continue to work as before.

-Siddharth