TensorRT doesn't correctly execute the TensorFlow concat and/or reshape operations

First, I tried to find out whether my issue had already been discussed, but I couldn't find any topic that answers it.

If such a topic exists, please point me to it.

My platform:
The Tensorflow pb (which was converted to the uff file) was generated under the following platform:
Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type – GTX-1070TI
nvidia driver version – 384.130
CUDA version – 8.0.44
CUDNN version – 6.0.21
Python version – 3.5.2
Tensorflow version – 1.4.1
TensorRT version – Not used

The TensorRT uff was generated under the following platform:
Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type - GeForce GTX 1080
nvidia driver version - 396.26
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.4
Python version – 3.5.2
Tensorflow version – 1.8
TensorRT version –

The TensorRT CUDA engine was executed under the following platform:
Jetson TX2 developer kit board:
Linux distro and version –
Ubuntu 16.04.5 LTS (Xenial Xerus)
L4T - #R28 (release), REVISION 2.1, GCID: 11272647, BOARD: t186ref, EABI: aarch64, DATE: Thu May 17 07:29:06 UTC 2018
GPU type - As part of the Jetson TX2 developer kit board
JetPack – 3.2.1 (But TensorRT and CUDNN were updated according to JetPack 3.3 versions)
nvidia driver version - As part of the JetPack
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.5
Python version – Not used
Tensorflow version – Not used
TensorRT version –

Problem description:
I’m using the TensorRT C++ API in order to run inference on a CNN model (YOLOv3) that was developed and trained using the TensorFlow and TensorRT Python APIs.
The model has three outputs, based on three different tile-size divisions of the input images.
The model was developed twice: first in NHWC format and then in NCHW format.
All model filters (layers) and tensors were updated according to the required format.
The dataset used is COCO.
For both formats, when inference is run using only the TensorFlow C++ API (on Windows; TF was built by me from sources), it works OK and all expected detections are produced.
When the NCHW format is used, in order to be able to use TensorRT on my Jetson, only the first model output is OK; the contents of the other two outputs are wrong.

The only difference between the first model output and the other two is an upsample operation that was added in order to support the required tile size.

This is the upsample implementation:

class upsample(BaseOp):
    def upsample(self, factor):
        # upsampling using concat and reshape
        # TODO: enable custom factor
        with tf.name_scope('upsample'):
            x = self.inp.out

            if self.channelOrder == 'NHWC':
                x = tf.transpose(x, perm=[0, 3, 1, 2])  # NHWC --> NCHW

            size = x.get_shape().as_list()
            c = size[1]
            h = size[2]
            w = size[3]
            x = tf.reshape(x, [-1, c, h, w, 1])
            x = tf.concat([x, x], axis=3)
            x = tf.concat([x, x], axis=4)
            x = tf.reshape(x, (-1, c, h * factor, w * factor))

            if self.channelOrder == 'NHWC':
                x = tf.transpose(x, perm=[0, 2, 3, 1])  # NCHW --> NHWC
        return x
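
For reference, the concat/reshape trick above is equivalent to 2x nearest-neighbor upsampling. A minimal pure-Python check of that equivalence for a single CHW tensor (the function names are mine, for illustration only):

```python
def upsample_concat_reshape(x, factor=2):
    # x: nested list [C][H][W]; mimics the reshape/concat sequence above
    assert factor == 2, "the concat trick above hard-codes factor 2"
    h, w = len(x[0]), len(x[0][0])
    # reshape to [C][H][W][1]
    x5 = [[[[v] for v in row] for row in ch] for ch in x]
    # concat([x, x], axis=3): duplicate along the width axis -> [C][H][2W][1]
    x5 = [[row + row for row in ch] for ch in x5]
    # concat([x, x], axis=4): duplicate the innermost axis -> [C][H][2W][2]
    x5 = [[[cell + cell for cell in row] for row in ch] for ch in x5]
    # reshape to [C][2H][2W]: reinterpret each channel's flat buffer row-major
    out = []
    for ch in x5:
        flat = [v for row in ch for cell in row for v in cell]
        out.append([flat[i * 2 * w:(i + 1) * 2 * w] for i in range(2 * h)])
    return out

def upsample_nearest(x, factor=2):
    # straightforward nearest-neighbor reference implementation
    out = []
    for ch in x:
        h, w = len(ch), len(ch[0])
        out.append([[ch[i // factor][j // factor] for j in range(w * factor)]
                    for i in range(h * factor)])
    return out
```

For a 2x2 channel [[1, 2], [3, 4]], both functions produce [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]], which is why the sequence can stand in for an upsample op.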

The suspect operations are the reshape and/or concat.

Please advise.

ForNvidia.zip (9.93 KB)

Did you find any solution to this problem? I also need to perform the reshape operation during the forward pass and am stuck here.

Unfortunately no.
I’m still struggling to find the source of the problem, but I’m fairly sure it is related to the reshape operation.

I know that TensorRT 3 didn’t support this operation at all, and when I tried to convert the TensorFlow pb file to a TensorRT uff file the parser ignored it (no failure or error was raised during the parsing process).
The following link says so:

Is it still true for TensorRT 4 and TensorRT 5?

I noticed that the following enum doesn’t support reshape layer (TensorRT 5):

enum class LayerType : int
{
    kCONVOLUTION = 0,      //!< Convolution layer.
    kFULLY_CONNECTED = 1,  //!< Fully connected layer.
    kACTIVATION = 2,       //!< Activation layer.
    kPOOLING = 3,          //!< Pooling layer.
    kLRN = 4,              //!< LRN layer.
    kSCALE = 5,            //!< Scale Layer.
    kSOFTMAX = 6,          //!< SoftMax layer.
    kDECONVOLUTION = 7,    //!< Deconvolution layer.
    kCONCATENATION = 8,    //!< Concatenation layer.
    kELEMENTWISE = 9,      //!< Elementwise layer.
    kPLUGIN = 10,          //!< Plugin layer.
    kRNN = 11,             //!< RNN Layer.
    kUNARY = 12,           //!< UnaryOp Operation Layer.
    kPADDING = 13,         //!< Padding Layer.
    kSHUFFLE = 14,         //!< Shuffle Layer.
    kREDUCE = 15,          //!< Reduce layer.
    kTOPK = 16,            //!< TopK Layer.
    kGATHER = 17,          //!< Gather Layer.
    kMATRIX_MULTIPLY = 18, //!< Matrix Multiply Layer.
    kRAGGED_SOFTMAX = 19,  //!< Ragged softmax Layer.
    kCONSTANT = 20,        //!< Constant Layer.
    kRNN_V2 = 21,          //!< RNNv2 layer.
    kIDENTITY = 22,        //!< Identity layer.
    kPLUGIN_V2 = 23        //!< PluginV2 Layer.
};

If yes:
Are there any alternatives I can use?
Is it possible to replace, at runtime (after the network creation and uff parsing), the TensorFlow reshape layer with a TensorRT layer or my own CUDA kernel?



TensorRT does support the reshape layer; it is implemented using the kSHUFFLE layer type. If the pb-to-UFF conversion completes without any warnings/errors, all layers should be supported. For any unsupported layers/operations, plugins can be used.
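
As a side note on why reshape maps onto the Shuffle layer: a row-major reshape never moves data, it only reinterprets the flat buffer, while a transpose does move data; kSHUFFLE covers both. A small pure-Python illustration of the reshape half (my own sketch, not TensorRT code):

```python
def flatten(t):
    # recursively flatten a nested list into row-major order
    if isinstance(t, list):
        return [v for sub in t for v in flatten(sub)]
    return [t]

def reshape(t, shape):
    # row-major reshape, in the spirit of tf.reshape / a Shuffle reshape:
    # the flat element order is preserved, only the nesting changes
    flat = flatten(t)
    for dim in reversed(shape[1:]):
        flat = [flat[i:i + dim] for i in range(0, len(flat), dim)]
    return flat
```

Reshaping [[1, 2, 3], [4, 5, 6]] to shape [3, 2] yields [[1, 2], [3, 4], [5, 6]], and flattening before and after gives the identical buffer, which is the invariant a correct Shuffle/reshape implementation must keep.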

If the final outputs still don’t match, can you provide the exact layer name where they start diverging from the native TF outputs?

Sorry for the delayed response.
I’m trying to create a small model that reproduces the problem, and then I will send you all the relevant material along with the execution results.

We are still working (not continuously) on creating the smallest model that reproduces the problem.
We added some probe (debug) outputs to the model in order to generate all of them both by running a session on the TF pb file and by executing the TRT context built from the uff file (which was generated from the pb file without any warnings/errors).

By comparing their results we will be able to tell you exactly where the source of the problem is.

Then I will send you the exact layer name where the outputs start diverging from the native TF outputs.
For now, we have only established that the divergence starts when we call the upsample function I included in my previous response.

In the meantime, regarding the plugin option:
I couldn’t find a C++-based example of how to replace an existing layer (one that was created through a TF API and was part of the converted uff file) with a new plugin layer.

All the examples I could find show how to build a new model using TRT plugin layers, but what I’m looking for is how to replace a TF layer with a TRT plugin layer.

Did I miss something?

Is there any example that demonstrates this replacement process?
Is there any other way to do it?

I managed to study the samplePlugin example (which is implemented for the NvCaffeParser) and adapt it to work with the NvUffParser, but then I learned that the NvUffParser isPlugin and createPlugin methods receive only the operation name, not the layer name. This is a problem for me because my model contains more than one upsample block (each containing the reshape and concat TensorFlow operations).

Can you please suggest a way to handle this issue?
Is there a correct way to find all upsample layers and replace them with plugin layers?
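
One direction I am considering (a sketch with hypothetical node names, not an official API): since each upsample block is built inside tf.name_scope('upsample'), every instance gets a unique prefix such as upsample, upsample_1, …, which can be recovered from the graph's node names and used to key the per-instance replacement:

```python
def find_upsample_scopes(node_names):
    # group graph nodes by their top-level name scope and keep the
    # scopes that look like upsample blocks (naming is hypothetical)
    scopes = set()
    for name in node_names:
        scope = name.split('/')[0]
        if scope.startswith('upsample'):
            scopes.add(scope)
    return sorted(scopes)
```

Each returned scope would then correspond to one upsample block whose reshape/concat nodes need to be swapped for a plugin instance.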

When will the TensorRT NvUffParser support layer names?


I created a small model which contains only the upsample logic described above.
I added probe (debug) outputs between these lines:

x = tf.reshape(x, [-1, c, h, w, 1])
x = tf.concat([x, x], axis=3)
x = tf.concat([x, x], axis=4)
x = tf.reshape(x, (-1, c, h * factor, w * factor))

The frozen graph model (pb file) was successfully converted to uff file without any errors or warnings.

Inference was run on the model twice:

  • TF C++ path
  • TRT C++ path

The input file was verified to be the same for both inference paths.
Out1 was compared and found to be OK.
Out2 was compared and found to be wrong.

After analyzing the differences between the Out2 files from both paths, I found that the concatenation is performed, but incorrectly.
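
For reference, the comparison was of this kind (a simplified pure-Python sketch with a hypothetical helper name; the actual comparison was done on the raw probe output files):

```python
def first_divergence(ref, test, tol=1e-5):
    # ref, test: flat output buffers of equal length; return the index of
    # the first element where they differ by more than tol, else None
    for i, (a, b) in enumerate(zip(ref, test)):
        if abs(a - b) > tol:
            return i
    return None
```

Running such a check per probe output pinpoints the first node whose TRT output diverges from the TF output.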

Attached is ForNvidia.zip file which contains the following:

  1. upsample.py - Model code
  2. Up.pb - Generated pb file
  3. onlyUp.pbtxt - Generated pb text file
  4. outputNames - Model probes outputs names
  5. Up_TF_Nodes_List.txt - List of all pb nodes identified during the TF C++ inference path
  6. Pb_To_Uff_Conversion_Nodes_List.txt - List of all nodes identified during the pb-to-uff conversion
  7. nvUffParser_Report.txt - List of all layers identified during the TRT C++ path uff parsing
  8. buildCudaEngine_Report.txt - List of all optimization steps performed by TRT while building the CUDA engine
  9. TF Dir - Includes all TF C++ probe output raw data
  10. TRT Dir - Includes all TRT C++ probe output raw data
  11. Up.uff - uff file
  12. Up.bin - CUDA engine serialized file

Please analyze the attached material and help me solve this issue.

Thanks a lot!

ForNvidia.zip (9.93 KB)

The problem was solved.

It turned out that the TensorRT concatenation works properly only when the concatenation axis is the channel axis, i.e. axis 1.

As I described before, the purpose of the following commands:

x = tf.reshape(x, [-1, c, h, w, 1])
x = tf.concat([x, x], axis=3)
x = tf.concat([x, x], axis=4)
x = tf.reshape(x, (-1, c, h * factor, w * factor))

was to replace the TensorFlow upsample operation, which isn’t supported by TensorRT.

By implementing a TensorRT plugin that performs the original TensorFlow upsample logic, and replacing all of the above TensorFlow operations with it (using the graphsurgeon tool), I bypassed this error.
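
For completeness, the per-element index mapping such an upsample plugin has to compute for an NCHW buffer is simply out[c][y][x] = in[c][y // factor][x // factor]. A pure-Python reference of the kernel logic only (the real plugin runs this as a CUDA kernel; function name and layout are illustrative):

```python
def upsample_plugin_reference(inp, c, h, w, factor):
    # inp: flat row-major CHW buffer of length c*h*w; returns the flat
    # upsampled buffer, mirroring what the plugin computes per output element
    oh, ow = h * factor, w * factor
    out = [0.0] * (c * oh * ow)
    for ci in range(c):
        for y in range(oh):
            for x in range(ow):
                src = ci * h * w + (y // factor) * w + (x // factor)
                out[ci * oh * ow + y * ow + x] = inp[src]
    return out
```

Because each output element reads exactly one input element, this mapping parallelizes trivially, which is what makes it a natural fit for a plugin kernel.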