I have made a custom CNN+BiLSTM model with a custom CTC loss. On GPU, inference was taking 50-60 ms, but after TensorRT conversion it takes 600 ms on the Jetson Nano. I'm using TensorFlow 1.15. What could be the issue, and what else can I do to decrease the inference time?
I am a little confused: what's the inference time on the GPU after TRT conversion? And what's the inference time on the Jetson Nano before TRT conversion?
Performance normally varies based on the device used.
We usually consider only the latency and throughput of the network inference itself, excluding the data pre- and post-processing overhead.
Could you please check whether the inference time measured in this case excludes the pre- and post-processing?
See, before TRT conversion the inference time was 50-60 ms on the GPU.
On the Jetson Nano I didn't check the inference time before TRT conversion.
After TRT conversion, on the Jetson Nano, the inference time was 600 ms.
Is this info sufficient?
I would suggest you check the inference time before/after conversion on the same platform/device to get a correct measure of the improvement.
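For such a comparison, a common approach is to warm up the model first (the first runs include CUDA context creation and kernel selection overhead) and then time only the inference call itself, not the pre/post-processing. A minimal sketch of such a harness (the `benchmark` helper and the `infer_fn` stand-in are illustrative, not from this thread):

```python
import time
import statistics

def benchmark(infer_fn, warmup=10, iters=50):
    """Run warm-up iterations, then return the median latency (ms)
    of timing only the inference call itself."""
    for _ in range(warmup):
        infer_fn()  # discard: first runs pay one-time setup costs
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        infer_fn()  # replace with e.g. sess.run(output, feed_dict=...) in TF 1.15
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)

# Stand-in workload; swap in the real TF session or TRT engine call.
latency_ms = benchmark(lambda: sum(range(10000)))
print("median latency: %.2f ms" % latency_ms)
```

Running the same harness around the original TF graph and the TRT-converted one, on the same device, isolates the conversion effect from the device difference.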
Also, TRT plans are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version) and must be re-targeted to the specific GPU if you want to run them on a different GPU.