TensorRT Sigmoid activation function produces slightly different result from original TensorFlow sigmoid op.

jia_pu · March 8, 2018, 5:27pm

I’m converting a TensorFlow graph to a TensorRT engine. For the same input, the TensorFlow graph and the TensorRT engine produce identical results up to a tf.nn.sigmoid op, but the output of the sigmoid function differs slightly between the two. I wonder if this is to be expected.

For example, given an input:
[8.879764 -8.724520 -10.623482 -11.822342 -12.868923 -11.805139 -13.092369 -11.573037 -11.112819 -11.025951]

tf.nn.sigmoid() produces:
[0.99986076 0.00016252598 0.000024337136 0.0000073386827 0.0000025768695 0.0000074660811 0.000002060895 0.0000094165052 0.000014919674 0.000016273782]

while TensorRT engine produces:
[0.999949 0.000442 0.000066 0.000020 0.000007 0.000020 0.000006 0.000026 0.000041 0.000044]
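
To put a number on "slightly different", the reported inputs can be run through a plain NumPy sigmoid and compared with the TensorRT values above. This is only a minimal sketch: the names `x`, `trt_reported`, and `sigmoid` are illustrative, and NumPy stands in for both frameworks. The last step tests one possible explanation (an offset of about +1 on the sigmoid's input), which is a guess based on these ten numbers and not a diagnosis.

```python
import numpy as np

# Inputs and TensorRT outputs copied from the post above.
x = np.array([8.879764, -8.724520, -10.623482, -11.822342, -12.868923,
              -11.805139, -13.092369, -11.573037, -11.112819, -11.025951],
             dtype=np.float32)
trt_reported = np.array([0.999949, 0.000442, 0.000066, 0.000020, 0.000007,
                         0.000020, 0.000006, 0.000026, 0.000041, 0.000044],
                        dtype=np.float32)


def sigmoid(v):
    """Reference logistic function 1 / (1 + exp(-v)), evaluated in float32."""
    v = np.asarray(v, dtype=np.float32)
    return 1.0 / (1.0 + np.exp(-v))


reference = sigmoid(x)
ratio = trt_reported / reference
print("reference sigmoid:   ", reference)
print("TensorRT / reference:", ratio)

# Hypothesis only: does sigmoid(x + 1) reproduce the TensorRT numbers
# to the six decimals that were printed?
shifted = sigmoid(x + 1.0)
print("max |sigmoid(x + 1) - TensorRT|:", np.max(np.abs(shifted - trt_reported)))
```

For the nine small outputs the ratio comes out well above 2, which is far more than FP32 rounding could explain. If the offset check lands below the print resolution of 1e-6, it may be worth inspecting whatever feeds the sigmoid in the larger network (for example a bias add).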

bhargavK · March 8, 2018, 6:11pm

I think this is normal (unless you have disabled reduced precision), as one of the optimizations TensorRT performs is ‘FP16 and INT8 reduced precision calibration’. That is why you see truncated, less precise results with TensorRT than with TF, which uses full precision (FP32).

You can go through this to learn more about TRT: https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference/
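
As a rough check on the reduced-precision explanation, the sketch below rounds FP32 sigmoid outputs to FP16 with NumPy and measures the relative error that the rounding alone introduces. It assumes that casting the final result to `np.float16` is a fair stand-in for reduced-precision storage; it does not model how TensorRT's FP16 or INT8 kernels actually compute. The input values are the ones from the original post.

```python
import numpy as np

x = np.array([8.879764, -8.724520, -10.623482, -11.822342, -12.868923,
              -11.805139, -13.092369, -11.573037, -11.112819, -11.025951],
             dtype=np.float32)

# Full-precision reference sigmoid.
fp32 = 1.0 / (1.0 + np.exp(-x))

# Assumption: only the output is rounded to half precision.
fp16 = fp32.astype(np.float16)

# Relative error introduced by the rounding step, per element.
rel_err = np.abs(fp16.astype(np.float32) - fp32) / fp32
print("relative error from FP16 rounding:", rel_err)
print("worst case:", rel_err.max())
```

Under this assumption the worst-case relative error stays in the low single-digit percent range, even for the smallest outputs, which fall into FP16's subnormal range. A discrepancy of a factor of two or more would therefore point to something other than output rounding.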

jia_pu · March 8, 2018, 6:51pm

But in this case, I’m not using reduced precision. I’m creating the TensorRT engine using the buildCudaEngine() function in the C++ API. If I’m not mistaken, it creates the engine in 32-bit precision.

bhargavK · March 8, 2018, 7:25pm

I see. In that case, it does look abnormal. Maybe you can query the builder to know for sure whether the engine is created in FP32. Otherwise, I think someone from NVIDIA might have more information for you.

AastaLLL · March 12, 2018, 3:28am

Hi,

We have tested the tf.nn.sigmoid() op with TensorRT but cannot reproduce this issue.
Please remember to convert the input to np.float32 to preserve full precision:

data = data.astype(np.float32)

Here is our testing code for your reference:

from tensorrt.parsers import uffparser
import tensorflow as tf
import tensorrt as trt
import pycuda.driver as cuda
import numpy as np
import uff


MAX_WORKSPACE = 1 << 20
MAX_BATCHSIZE = 1
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)

inputs = tf.placeholder(dtype=tf.float32, shape=[1,10])
output = tf.nn.sigmoid(inputs, name='out')

data = np.expand_dims(np.array([8.879764, -8.724520, -10.623482, -11.822342, -12.868923, -11.805139, -13.092369, -11.573037, -11.112819, -11.025951]), axis=0)
data = data.astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf_result = sess.run(output,feed_dict={inputs:data})

    graphdef = tf.get_default_graph().as_graph_def()
    frozen_graph = tf.graph_util.convert_variables_to_constants(sess, graphdef, ['out'])
    tf_model = tf.graph_util.remove_training_nodes(frozen_graph)

uff_model = uff.from_tensorflow(tf_model, ['out'])

parser = uffparser.create_uff_parser()
parser.register_input("Placeholder", (1,10,1), 0)
parser.register_output("out")

engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, MAX_BATCHSIZE, MAX_WORKSPACE)
parser.destroy()

runtime = trt.infer.create_infer_runtime(G_LOGGER)
context = engine.create_execution_context()

trt_result = cuda.pagelocked_empty(10, dtype=np.float32)

d_input = cuda.mem_alloc(10 * data.dtype.itemsize)
d_output = cuda.mem_alloc(10 * trt_result.dtype.itemsize)

bindings = [int(d_input), int(d_output)]
stream = cuda.Stream()

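cuda.memcpy_htod_async(d_input, data, stream)
context.enqueue(1, bindings, stream.handle, None)
cuda.memcpy_dtoh_async(trt_result, d_output, stream)
# Wait for the asynchronous copies and inference to finish before reading trt_result.
stream.synchronize()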

print('TensorFlow:')
print(tf_result)
print('\nTensorRT:')
print(trt_result)

Thanks.

jia_pu · March 14, 2018, 5:03pm

Thank you for looking into this and providing the sample code. I can confirm that I get identical results in the isolated test case you provided. However, when the sigmoid layer is part of a larger network, I still see the difference between TensorFlow and TensorRT, even though the output of the preceding conv layer is identical between the two. Let me see if I can extract a test case that exhibits the problem.
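
When narrowing down where two frameworks diverge inside a larger network, it can help to compare every intermediate tensor with the same metric. The helper below is a hypothetical sketch: the name `compare_outputs` and the tolerance defaults are illustrative and not part of TensorFlow or TensorRT. It takes two arrays, for example one layer's output fetched from each framework, and reports the largest absolute and relative differences.

```python
import numpy as np


def compare_outputs(name, expected, actual, rtol=1e-5, atol=1e-7):
    """Print the largest differences between two tensors; return True if they match."""
    expected = np.asarray(expected, dtype=np.float64).ravel()
    actual = np.asarray(actual, dtype=np.float64).ravel()
    if expected.shape != actual.shape:
        raise ValueError("%s: element counts differ (%d vs %d)"
                         % (name, expected.size, actual.size))

    abs_diff = np.abs(expected - actual)
    # Guard against division by zero when the expected value is exactly 0.
    rel_diff = abs_diff / np.maximum(np.abs(expected), 1e-12)
    ok = bool(np.allclose(actual, expected, rtol=rtol, atol=atol))

    print("%s: max abs diff %.3e, max rel diff %.3e, match=%s"
          % (name, abs_diff.max(), rel_diff.max(), ok))
    return ok


# Usage with the first two sigmoid outputs reported in this thread.
tf_vals = [0.99986076, 0.00016252598]
trt_vals = [0.999949, 0.000442]
sigmoid_matches = compare_outputs("sigmoid", tf_vals, trt_vals)
```

Running the same helper on the conv output and then on the sigmoid output would show at which op the two results first disagree.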

AastaLLL · March 16, 2018, 7:31am

Hi,

Have you tested our latest TensorRT package?
If not, could you check whether this issue also occurs on TensorRT 3.0.4?

Thanks.

jia_pu · March 19, 2018, 4:05pm

I’m seeing this on 3.0.4.

AastaLLL · March 22, 2018, 7:09am

Hi,

It would really help if you could extract a simple test case that reproduces the issue.
We will wait for your update and discuss it with our internal team for further suggestions.

Thanks.
