I’m interested in knowing the differences between these two TensorRT versions from the following points of view:
- IEEE-754 standard compliance
- FMA usage (see the sketch after this list)
- Fast-math mode usage
- FP optimizations
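To illustrate the kind of difference I mean by FMA usage: a fused multiply-add rounds a*b + c once, while a separate multiply and add rounds twice, so the two FP32 results can legitimately differ in the last bits even though both are IEEE-754 compliant. A toy sketch of my own (not TensorRT code):

```cpp
// Toy demonstration, not TensorRT code: fmaf rounds a*b + c once,
// while a separate multiply and add rounds twice, so results differ.
// Build with e.g. g++ -O2 -ffp-contract=off so the compiler does not
// fuse the "separate" line into an FMA by itself.
#include <cmath>
#include <cstdio>

int main() {
    const float a = 1.0f + 0x1p-23f;  // 1 + 2^-23
    const float b = 1.0f - 0x1p-23f;  // 1 - 2^-23; exact product is 1 - 2^-46
    const float c = -1.0f;

    float separate = a * b + c;          // product rounds to 1.0f, sum is 0
    float fused    = std::fmaf(a, b, c); // single rounding keeps -2^-46

    std::printf("separate = %g\n", separate); // prints 0
    std::printf("fused    = %g\n", fused);    // prints about -1.42109e-14
    return 0;
}
```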
I’m asking because I’m running the same CNN model, developed with TensorFlow, on both of these platforms:
- amd64 with Ubuntu 16.04, TensorRT 4.0.1, cuDNN 7.1.4, CUDA 9.0, GeForce 1080, display driver 410.72
- Jetson TX2, JetPack 3.2.1 with TensorRT 4.0.2, cuDNN 7.1.5, CUDA 9.0
Although both of them work properly and the final detection parameters are equal, when I compare the raw model outputs (32-bit FP) from the two platforms I see a significant accuracy gap between them.
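For concreteness, my comparison boils down to something like the following sketch (the dump file names are placeholders; I assume both platforms dump the same output tensor as raw float32 of equal length):

```cpp
// Sketch for quantifying the gap between two raw FP32 output dumps.
// The file names below are placeholders for the actual dump files.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <fstream>
#include <vector>

static std::vector<float> loadDump(const char* path) {
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    std::vector<float> v(static_cast<size_t>(f.tellg()) / sizeof(float));
    f.seekg(0);
    f.read(reinterpret_cast<char*>(v.data()), v.size() * sizeof(float));
    return v;
}

int main() {
    std::vector<float> a = loadDump("output_amd64.bin"); // placeholder name
    std::vector<float> b = loadDump("output_tx2.bin");   // placeholder name
    double maxAbs = 0.0, maxRel = 0.0;
    for (size_t i = 0; i < std::min(a.size(), b.size()); ++i) {
        double absDiff = std::fabs(double(a[i]) - double(b[i]));
        maxAbs = std::max(maxAbs, absDiff);
        if (std::fabs(a[i]) > 1e-12)
            maxRel = std::max(maxRel, absDiff / std::fabs(double(a[i])));
    }
    std::printf("max abs diff = %g\nmax rel diff = %g\n", maxAbs, maxRel);
    return 0;
}
```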
I also have the ability to run the model without TensorRT, directly via the TensorFlow C++ API and cuDNN.
When I compare the TensorFlow outputs generated on both platforms, I cannot see this accuracy gap at all.