TensorFlow and TensorRT (DriveWorks) results do not match

We have been testing TensorFlow and TensorRT on a DrivePX2 and we are unable to get matching results. We compiled TensorFlow 1.8 on the DPX2 (aarch64), and all our tests run on the device. We have a network trained to produce 20 points (predicted trajectories). Our TensorFlow workflow is as follows.
  1. 1920x1208 Sekonix AR0231 H.264 images are captured, stored, and converted to PNG using ffmpeg.
  2. For training, each image is first resized to 480x302 and then cropped to 336x140.
  3. The network is trained on the resized and cropped 336x140 images in HxWxC format (336x140x3), the layout TensorFlow uses.
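The steps above can be sketched roughly as follows. This is only a sketch: the nearest-neighbor resize stub and the center-crop origin are our assumptions, since the post does not say which interpolation or crop offset is used.

```python
import numpy as np

# Dependency-free stand-in for a real (e.g. bilinear) resize.
def resize_nearest(img, out_h, out_w):
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def preprocess(frame):  # frame: (1208, 1920, 3) uint8, HxWxC
    small = resize_nearest(frame, 302, 480)          # step 2: resize to 480x302
    top, left = (302 - 140) // 2, (480 - 336) // 2   # crop origin assumed (center)
    crop = small[top:top + 140, left:left + 336]     # step 2: crop to 336x140
    return crop.astype(np.float32) / 255.0           # HWC float, values in [0, 1]

frame = np.zeros((1208, 1920, 3), dtype=np.uint8)
out = preprocess(frame)
print(out.shape)  # (140, 336, 3) -- HxWxC as the network is trained
```

The normalization to 0-1 matches what is described later in the thread; the geometry (resize, then crop) matches steps 2-3.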

Our TensorRT workflow on the DPX2 is:

  1. The above model is converted to UFF in CxHxW layout (3x140x336, as required by TensorRT), then converted to bin format using the TensorRT optimizer tool on the DPX2 with no optimization options turned on.
  2. A 1920x1208 AR0231 H.264 image is acquired at run time or loaded from a serialized file.
  3. Using our own CUDA kernel, we resize the image to 480x302.
  4. Using our own CUDA kernel, we crop the image to 336x148.
  5. We initialize the TensorRT data conditioner (dwDataConditioner_initialize) with network dimensions 336x148 and the option ignoreAspectRatio set to true. We have also tried turning this option off; although that produces different results, they still do not match TensorFlow.
  6. We run a single inference using dwDataConditioner_prepareData with an ROI of x=0, y=0, width=network width (336), height=network height (140), and our resized/cropped image as input to the network. Our image is single-plane interleaved RGBA; the documentation states that in the case of an interleaved image the A channel is dropped.
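The layout change in step 1 is a common source of mismatches. A minimal NumPy sketch (not the actual UFF converter) of the HxWxC-to-CxHxW reordering, with a spot check that the same pixel value lands in the same logical position:

```python
import numpy as np

# TensorFlow consumes HxWxC; the UFF/TensorRT engine expects CxHxW.
hwc = np.random.rand(140, 336, 3).astype(np.float32)   # TensorFlow layout
chw = np.transpose(hwc, (2, 0, 1)).copy()              # TensorRT layout

assert chw.shape == (3, 140, 336)
# The same point sampled in both layouts must agree. If this kind of
# check fails on the real buffers, the mismatch is in the layout
# conversion rather than in the network weights.
assert chw[1, 70, 100] == hwc[70, 100, 1]
```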

With this workflow, we get very different results on the DPX2 with TensorRT than with TensorFlow.

NOTE: In our latest test, using a new CUDA kernel, we converted the final image from interleaved single-plane RGBA to a 3-plane RGB image, and the results (the 20 output points) still do not match. Any idea what we may be doing wrong?
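For reference, the RGBA-interleaved to 3-plane RGB conversion mentioned in the note can be sanity-checked on the host. This NumPy sketch mirrors what such a CUDA kernel should produce; the sizes and the assumption that A is simply dropped are ours:

```python
import numpy as np

h, w = 140, 336
# Interleaved single-plane RGBA, as delivered to the data conditioner.
rgba = np.random.randint(0, 256, (h, w, 4), dtype=np.uint8)

# Drop the A channel, then reorder to 3 contiguous planes (CxHxW).
rgb_planar = np.transpose(rgba[..., :3], (2, 0, 1)).copy()

assert rgb_planar.shape == (3, h, w)
assert (rgb_planar[0] == rgba[..., 0]).all()  # R plane matches R channel
```

Comparing the CUDA kernel's output against a host-side reference like this isolates the conversion step from the rest of the pipeline.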


We are not sure whether this unexpected result is caused by the different platform or the different framework.

Is it possible to run your model with TensorRT on an x86-based system first?
If yes, could you help to check it?


Thank you so much for the reply. I have tried this on an x86-based system and the problem persists. I am using the latest PDK, 5050bL_SDK. One thing that I forgot to mention is that in TensorFlow the image is normalized to 0-1 (each channel divided by 255), and I do the same thing for TensorRT with a CUDA kernel.

Is there an "easy" way to display or dump to disk the image that is generated by the data conditioner (a float pointer is returned; is this a YUV image?) so we can see exactly what is being passed to the network? It would be a very useful debugging tool for us if we could dump out the content that the data conditioner generates from the input image.
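One way to do this inspection: copy the float output of dwDataConditioner_prepareData back to the host (e.g. with cudaMemcpy) and dump it from Python. The sketch below is only that; the planar-RGB layout, the 0-1 value range, and the helper name are assumptions on our part, not from the DriveWorks documentation:

```python
import numpy as np

def dump_conditioner_output(buf, c=3, h=140, w=336, path="conditioner.npy"):
    """Save the raw float buffer for exact diffing, and return an 8-bit
    HxWxC array that an ordinary image viewer can display.

    buf: flat float array copied from the device (assumed CxHxW planar).
    """
    planes = np.asarray(buf, dtype=np.float32).reshape(c, h, w)
    np.save(path, planes)                        # exact values, for diffing
    viewable = np.transpose(planes, (1, 2, 0))   # CHW -> HWC for viewers
    return (np.clip(viewable, 0.0, 1.0) * 255).astype(np.uint8)
```

Saving the exact floats (rather than only an encoded image) matters, because an image round-trip would quantize away the small differences you are trying to find.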

Best Regards


You would need some image encoder to display your data in an image format.
(This is not recommended; any extra conversion step adds complexity.)

The simplest debugging method is to compare the values directly.
Could you use the same image and output the values after normalization (TensorFlow) and after your CUDA kernel (TensorRT)?
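A sketch of that comparison, assuming both tensors have been dumped to host memory in the same CxHxW layout (the function name and the diagnostic heuristics in the comments are ours):

```python
import numpy as np

def compare_tensors(tf_chw, trt_chw, atol=1e-3):
    """Compare the TensorFlow-side and TensorRT-side input tensors
    produced from the same source image."""
    a = np.asarray(tf_chw, dtype=np.float32)
    b = np.asarray(trt_chw, dtype=np.float32)
    diff = np.abs(a - b)
    print("max abs diff:", diff.max())
    # Where the largest disagreement sits is often diagnostic: a band of
    # wrong rows suggests a crop/ROI bug, while a whole wrong plane
    # suggests channel order (RGB vs BGR) or layout (HWC vs CHW).
    print("argmax (c, y, x):", np.unravel_index(diff.argmax(), a.shape))
    return bool(np.allclose(a, b, atol=atol))
```

If the two input tensors already differ, the network itself can be ruled out and the problem narrowed to preprocessing.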