Hi Nvidia,
There is a bug in TensorRT: when the output height or width of a conv2d layer somewhere in the model becomes odd (rather than even), the final dense layer causes TensorRT to crash at build time, i.e. when I run trt.utils.uff_to_trt_engine.
Here is the code:
config = {
    'x_width': 256,
    'x_height': 40,
    'num_channels': 1,
    'num_classes': 3,
}

def D(x, is_training=False, reuse=True):
    with tf.variable_scope('Disc', reuse=reuse) as scope:
        # layer1
        conv1 = tf.layers.conv2d(x, 32, [5, 5],
                                 strides=[2, 2],
                                 padding='same',
                                 data_format='channels_first')
        lrelu1 = tf.maximum(0.2 * conv1, conv1)  # <= WON'T CRASH if you use this as output when building TensorRT
        # layer2
        conv2 = tf.layers.conv2d(lrelu1, 64, [3, 3],
                                 strides=[2, 2],
                                 padding='same',
                                 data_format='channels_first')
        batch_norm2 = tf.layers.batch_normalization(conv2, training=is_training, axis=1)
        lrelu2 = tf.maximum(0.2 * batch_norm2, batch_norm2)  # <= WON'T CRASH if you use this as output when building TensorRT
        # layer3
        conv3 = tf.layers.conv2d(lrelu2, 128, [2, 2],
                                 strides=[2, 2],
                                 padding='same',
                                 data_format='channels_first')
        batch_norm3 = tf.layers.batch_normalization(conv3, training=is_training, axis=1)
        lrelu3 = tf.maximum(0.2 * batch_norm3, batch_norm3)  # <= WON'T CRASH if you use this as output when building TensorRT
        # layer4
        conv4 = tf.layers.conv2d(lrelu3, 256, [2, 2],
                                 strides=[2, 2],
                                 padding='same',
                                 data_format='channels_first')
        lrelu4 = tf.maximum(0.2 * conv4, conv4)  # <= WON'T CRASH if you use this as output when building TensorRT
        # layer5
        shape = lrelu4.get_shape().as_list()
        flatten_length = shape[1] * shape[2] * shape[3]
        flatten5 = tf.reshape(lrelu4, (-1, flatten_length))  # <= WON'T CRASH if you use this as output when building TensorRT
        fc5 = tf.layers.dense(flatten5, config['num_classes'])
        output = tf.nn.softmax(fc5)  # <= CRASHES if you use this as output when building TensorRT
        assert output.get_shape()[1:] == [config['num_classes']]
        return output
Please pay attention to the comments in the code.
If you change config['x_height'] to 48, it no longer crashes and the engine builds!
When config['x_height'] = 40, the output of lrelu3 becomes [?, 128, 5, 32] (its height is odd). This doesn't cause a problem if you use lrelu3 as the output when building the TensorRT model, but if you use the output of the last layer, it crashes.
When config['x_height'] = 48, the output heights of the first three conv layers are all even (24, 12, 6), so you can build the TensorRT model without any drama!
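For reference, the per-layer heights follow directly from the 'same'-padding arithmetic for a stride-2 conv, out = ceil(in / stride). A small sketch (the helper names are mine, not part of the model code) reproducing the heights quoted above:

```python
import math

def same_pad_out(size, stride=2):
    # TensorFlow 'same' padding with stride s gives out = ceil(in / s)
    return math.ceil(size / stride)

def layer_heights(h, num_layers=4):
    # Height after each of the four stride-2 convs in D()
    heights = []
    for _ in range(num_layers):
        h = same_pad_out(h)
        heights.append(h)
    return heights

print(layer_heights(40))  # [20, 10, 5, 3] -> odd height (5) already after conv3
print(layer_heights(48))  # [24, 12, 6, 3] -> even heights through conv3
```

With x_height = 40 the height is 5 after conv3, matching the [?, 128, 5, 32] shape of lrelu3; with x_height = 48 the first odd height only appears after conv4.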
I tested this with a few other architectures and got the same result…