Model accuracy penalty with TensorRT on Jetson TX2


I'm using TensorRT to run inference on a Keras model on a Jetson TX2 device.
The TensorRT outputs differ from the original Keras model's outputs, leading to lower accuracy.

CUDA version: 9.0.252
TensorRT version: 4.1.3

(on host) model is taken from:
(on host) Keras to TensorFlow .pb conversion is done with:
.pb to UFF conversion is done with: (adapted from an NVIDIA sample)
engine creation and inference code: (adapted from the NVIDIA UFF MNIST sample)

I suspected that the data ordering (CHW vs. HWC) was the source of the issue, but since I'm using only grayscale (single-channel) images the two layouts should be identical in memory, so in the end I'm not sure what the issue is.
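To double-check that assumption, here is a quick NumPy sketch (hypothetical shapes, not my actual preprocessing code) showing that for a single-channel image the HWC and CHW layouts hold the same values in the same memory order:

```python
import numpy as np

# Hypothetical 4x3 grayscale image in HWC layout: shape (H, W, C) with C = 1.
hwc = np.arange(12, dtype=np.float32).reshape(4, 3, 1)

# The same data reinterpreted as CHW: shape (C, H, W).
chw = hwc.transpose(2, 0, 1)

# With a single channel, the flattened element order is identical,
# so a CHW/HWC mix-up alone cannot change the network's input values.
print(np.array_equal(hwc.ravel(), chw.ravel()))  # True
```

So unless there is a resize or normalization difference elsewhere in the pipeline, the layout by itself shouldn't explain the accuracy drop.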
Could TensorRT be applying optimizations (e.g. reduced precision) that impact the accuracy?
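One way I could narrow this down (a sketch, assuming the Keras and TensorRT outputs for the same input are available as NumPy arrays) is to compare the raw outputs numerically instead of only looking at final accuracy, to see whether the discrepancy is small rounding noise or something larger:

```python
import numpy as np

def compare_outputs(keras_out, trt_out, atol=1e-3):
    """Report the element-wise discrepancy between two model outputs."""
    keras_out = np.asarray(keras_out, dtype=np.float32)
    trt_out = np.asarray(trt_out, dtype=np.float32)
    max_abs = float(np.max(np.abs(keras_out - trt_out)))
    close = bool(np.allclose(keras_out, trt_out, atol=atol))
    return max_abs, close

# Hypothetical softmax outputs for one sample: a tiny difference is
# expected from FP32 rounding; a large gap points at a conversion bug.
keras_out = [0.10, 0.85, 0.05]
trt_out = [0.1002, 0.8499, 0.0499]
max_abs, close = compare_outputs(keras_out, trt_out)
print(max_abs <= 1e-3, close)  # True True
```

If the per-element difference turns out to be large, that would suggest a problem in the .pb/UFF conversion or preprocessing rather than a precision-related optimization.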