Inference with FP16 precision gives no performance gain over FP32 on Jetson Nano

I am able to run the DeepLabv3+ model on a Jetson Nano. The model is trained in TensorFlow and exported as a TensorFlow frozen graph. I then convert it into a TensorRT-optimized graph with precision mode FP16 as shown below:

# TF-TRT API in TF 1.14+; in TF <= 1.13 it is available as tensorflow.contrib.tensorrt
from tensorflow.python.compiler.tensorrt import trt_convert as trt

trt_graph_def = trt.create_inference_graph(
        input_graph_def=frozen_graph_def,   # frozen TensorFlow GraphDef
        outputs=['SemanticPredictions'],    # DeepLab output node
        max_batch_size=1,
        max_workspace_size_bytes=1 << 26,
        precision_mode='FP16',
        is_dynamic_op=True,                 # build TRT engines at runtime
        maximum_cached_engines=1,
        minimum_segment_size=10
    )

I get around 6 FPS for an image size of 320x240x3.
The same procedure is used to create a TensorRT-optimized graph with precision mode FP32.
However, I do not see any performance difference: with FP32 I also get 6 FPS.
What might be the reason behind this behaviour?
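
For reference, the FPS numbers above are measured roughly like this (a minimal sketch; the 'ImageTensor' input name is taken from the standard DeepLab export script and may differ in your graph, and the dummy image and 100-iteration loop are just for illustration):

import time
import numpy as np
import tensorflow as tf

# Load the converted GraphDef and look up the input/output tensors.
with tf.Graph().as_default() as graph:
    tf.import_graph_def(trt_graph_def, name='')
    input_t = graph.get_tensor_by_name('ImageTensor:0')
    output_t = graph.get_tensor_by_name('SemanticPredictions:0')

with tf.Session(graph=graph) as sess:
    image = np.zeros((1, 240, 320, 3), dtype=np.uint8)
    # Warm-up run so the dynamic TRT engines are built before timing.
    sess.run(output_t, feed_dict={input_t: image})
    start = time.time()
    for _ in range(100):
        sess.run(output_t, feed_dict={input_t: image})
    print('FPS: %.1f' % (100.0 / (time.time() - start)))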

References:
The TF-TRT performance benchmarks for the different precision modes only mention the Turing and Volta architectures: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#performance
The Jetson Nano benchmarks show only FP16 results for various models, not a comparison between FP32 and FP16: https://developer.nvidia.com/embedded/jetson-nano-dl-inference-benchmarks

Hi,

It’s recommended to check how many layers of your model actually run with TensorRT.

TF-TRT automatically falls back to the TensorFlow implementation for layers that TensorRT does not support.
For layers that run with the TensorFlow implementation, the benefit of lower precision is quite limited.
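
A quick way to check this is to count how many TRTEngineOp nodes the conversion produced relative to the total node count (a minimal sketch, using the trt_graph_def from the conversion above):

# Each TRTEngineOp node is a segment of the graph that runs inside TensorRT;
# everything else still runs as ordinary TensorFlow ops.
trt_engine_ops = [n.name for n in trt_graph_def.node if n.op == 'TRTEngineOp']
print('TRTEngineOp segments: %d' % len(trt_engine_ops))
print('Total nodes in converted graph: %d' % len(trt_graph_def.node))

If only a small part of the graph is covered by TRTEngineOp segments, most of the inference time is spent in TensorFlow, which would explain why FP16 and FP32 give the same FPS.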

In general, if your model is fully supported by our TensorRT library, it’s recommended to use pure TensorRT directly for better performance.
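
Below is a minimal sketch of the pure-TensorRT path, assuming the frozen graph has first been exported to ONNX (for example with tf2onnx) and that your TensorRT version supports the ONNX parser with an explicit-batch network (6.0+); the file names are placeholders:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

# Parse the ONNX model and build a standalone FP16 TensorRT engine.
with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(explicit_batch) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_workspace_size = 1 << 28
    builder.fp16_mode = True
    with open('deeplabv3plus.onnx', 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
    engine = builder.build_cuda_engine(network)
    with open('deeplabv3plus_fp16.engine', 'wb') as f:
        f.write(engine.serialize())

The saved engine can then be deserialized with trt.Runtime and run without TensorFlow in the loop.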

Thanks.