TensorRT 4.0.1 for amd64 Vs. TensorRT 4.0.2 for TX2

I’m interested in knowing the differences between these two TensorRT versions from the following points of view:

  1. IEEE-754 standard usage
  2. FMA usage
  3. Fast mode usage
  4. FP Optimizations

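Item 2 in particular can produce bit-level differences on its own: an FMA computes a*b+c with a single rounding, while a separate multiply and add rounds twice. A minimal sketch of this effect in FP32 (this emulates FMA's single rounding by doing the exact product in float64, which is an assumption of the illustration, not how TensorRT computes anything):

```python
import numpy as np

# A value chosen so that the product 1 + 2**-11 + 2**-24 cannot be
# represented exactly in FP32 (ulp near 1.0 is 2**-23).
a = np.float32(1.0) + np.float32(2.0**-12)

# Two roundings (no FMA): the product is rounded to FP32 first,
# losing the 2**-24 bit, then 1 is subtracted.
no_fma = a * a - np.float32(1.0)          # 2**-11 exactly

# Single rounding (FMA-like): the product is exact in float64
# (24-bit x 24-bit fits in a 53-bit significand), and the result
# is rounded to FP32 only once at the end.
fma_like = np.float32(np.float64(a) * np.float64(a) - 1.0)  # 2**-11 + 2**-24

print(no_fma, fma_like, no_fma == fma_like)
```

The two results differ in the last bits, and such differences accumulate layer by layer through a CNN, so different FMA/fast-math behavior between builds is a plausible source of output divergence.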
I’m asking because I’m running the same CNN model, developed with TensorFlow, on both platforms:

  1. amd64 with Ubuntu 16.04, TensorRT 4.0.1, cuDNN 7.1.4, CUDA 9.0, GeForce 1080, display driver 410.72
  2. TX2, JetPack 3.2.1 with TensorRT 4.0.2, cuDNN 7.1.5, CUDA 9.0

Although both run properly and the final detection parameters are equal, when I compare the model binary outputs (32-bit FP) I see a significant accuracy gap between them.

I am also able to run the model without TensorRT, directly via the TensorFlow C++ API and cuDNN.
When I compare the TensorFlow binary outputs generated on both platforms, I do not see this accuracy gap at all.
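For reference, the kind of comparison described above could be sketched like this (the filenames and the raw-FP32-dump format are hypothetical; substitute however you actually export the layer outputs):

```python
import numpy as np

def compare_outputs(a, b, eps=1e-12):
    """Return (max absolute diff, max relative diff) between two FP32 tensors."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    abs_diff = np.abs(a - b)
    rel_diff = abs_diff / (np.abs(a) + eps)  # eps avoids division by zero
    return float(abs_diff.max()), float(rel_diff.max())

# Hypothetical raw FP32 dumps of the final layer output from each platform:
# amd64_out = np.fromfile("trt_amd64_output.bin", dtype=np.float32)
# tx2_out   = np.fromfile("trt_tx2_output.bin", dtype=np.float32)
# print(compare_outputs(amd64_out, tx2_out))
```

Reporting both the absolute and relative maxima helps distinguish ordinary last-ulp rounding noise from a genuine accuracy gap.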

Please advise.


This is not expected. To help us debug, can you please share a small repro containing how you converted the TF CNN model to TRT, the inference code, the CNN model, and inference test cases that demonstrate the accuracy gap you are seeing?

NVIDIA Enterprise Support.