Inference with FP16 precision gives no performance gain over FP32 on Jetson Nano

I am able to run the deeplabv3+ model on Jetson Nano. My model is trained in TensorFlow and exported as a TensorFlow frozen graph. Later on I convert it into a TensorRT-optimized graph with precision mode FP16, as shown below:

import tensorflow.contrib.tensorrt as trt

trt_graph_def = trt.create_inference_graph(
        input_graph_def=frozen_graph_def,   # frozen TensorFlow GraphDef
        outputs=output_node_names,          # list of output node names
        max_workspace_size_bytes=1 << 26,
        precision_mode='FP16',
        is_dynamic_op=True)

I get around 6 FPS for an image size of 320x240x3.
I apply the same procedure to create a TensorRT-optimized graph with precision mode FP32.
However, I do not see any performance difference; with FP32 I also get 6 FPS.
What might be the reason behind this behaviour?

If I look at the benchmarks for precision modes, I see only the Turing and Volta architectures mentioned.
If I look at the benchmarks for Jetson Nano, I see only FP16 results for different models, but no comparison between FP32 and FP16.


It’s recommended to check how many layers inside your model actually run with TensorRT.

In TF-TRT, layers that TensorRT does not support automatically fall back to TensorFlow.
For the layers that run with the TensorFlow implementation, the benefit of low precision is quite limited.
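
One way to check this is to count how many nodes of the converted graph were replaced by TRTEngineOp nodes. A minimal sketch, assuming trt_graph_def is the GraphDef returned by create_inference_graph above:

# Count how much of the converted graph runs inside TensorRT.
trt_engine_nodes = [n for n in trt_graph_def.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes: %d' % len(trt_engine_nodes))  # segments handled by TensorRT
print('Total nodes: %d' % len(trt_graph_def.node))      # the rest runs in TensorFlow

If only a few TRTEngineOp nodes show up, most of the model is still executed by TensorFlow, which would explain why FP16 and FP32 give the same FPS.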

In general, if your model is fully supported by our TensorRT library, it’s recommended to use pure TensorRT directly for better performance.
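
For reference, here is a minimal sketch of building a pure TensorRT FP16 engine from a frozen graph via the UFF parser. The file name, input/output node names, and input shape below are placeholders; replace them with your model's actual values, and note that DeepLab's pre/post-processing ops may need to be handled outside the engine.

# Minimal sketch: pure TensorRT FP16 engine built from a frozen graph via UFF.
# 'frozen_graph.pb', 'ImageTensor' and 'SemanticPredictions' are placeholder names.
import tensorrt as trt
import uff

uff_model = uff.from_tensorflow_frozen_model('frozen_graph.pb',
                                             output_nodes=['SemanticPredictions'])

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.UffParser() as parser:
    builder.max_workspace_size = 1 << 26
    builder.fp16_mode = True                  # FP16 kernels run natively inside TensorRT
    parser.register_input('ImageTensor', (3, 240, 320))
    parser.register_output('SemanticPredictions')
    parser.parse_buffer(uff_model, network)
    engine = builder.build_cuda_engine(network)

with open('deeplab_fp16.engine', 'wb') as f:
    f.write(engine.serialize())               # reload later with trt.Runtime for inference

With a pure TensorRT engine, every layer runs inside TensorRT, so the FP16 vs FP32 difference should become visible.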