I am comparing the inference time of a Keras model with the same model optimized by TensorRT 5. However, TensorRT gives a speedup of only about 1.2×.
I’m using the Python API of TensorRT 5 on an AWS p3.2xlarge instance (Tesla V100 GPU) with the Ubuntu Deep Learning Base AMI. My model is similar to DnCNN (https://github.com/husqin/DnCNN-keras), but without the residual connection and in NCHW data format.
Using 420 images of 512 by 512 pixels and a batch size of 16, I get the following results:
Keras: 14.97 seconds of total inference time
TRT FP32: 12.3 seconds of total inference time
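For reference, here is roughly how I measure the total inference time. This is a simplified sketch, not my exact benchmark code: `infer_fn` stands in for either `model.predict` on a batch or the TensorRT execution call, and I run a warm-up pass first so one-time setup (CUDA context creation, cuDNN autotuning) is excluded. Note that with an asynchronous TensorRT execution (`execute_async`) one would also need to synchronize the stream before stopping the clock.

```python
import time
import numpy as np

def benchmark(infer_fn, batches, warmup=2):
    """Time a batched inference callable; returns total seconds.

    infer_fn is a placeholder for the framework's inference call
    (e.g. a Keras predict-on-batch or a TensorRT execute).
    """
    # Warm-up passes: exclude one-time initialization from the timing.
    for _ in range(warmup):
        infer_fn(batches[0])
    start = time.perf_counter()
    for batch in batches:
        infer_fn(batch)
    # For async GPU execution, a stream/device synchronize would be
    # required here before reading the clock.
    return time.perf_counter() - start

# Dummy data shaped like the workload in the question:
# ~420 single-channel 512x512 images in NCHW, batch size 16.
batches = [np.zeros((16, 1, 512, 512), dtype=np.float32)
           for _ in range(420 // 16)]

# Dummy stand-in for the real inference call, just to show usage.
total = benchmark(lambda b: b * 2.0, batches)
print(f"total: {total:.2f} s")
```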
Is this an expected speedup, or is something wrong?