I am able to run the DeepLab v3+ model on a Jetson Nano. The model is trained in TensorFlow and exported as a TensorFlow frozen graph. I then convert it into a TensorRT-optimized graph with FP16 precision as shown below:
```python
import tensorflow.contrib.tensorrt as trt  # TF 1.x TF-TRT module

trt_graph_def = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=['SemanticPredictions'],
    max_batch_size=1,
    max_workspace_size_bytes=1 << 26,  # 64 MB workspace
    precision_mode='FP16',
    is_dynamic_op=True,
    maximum_cached_engines=1,
    minimum_segment_size=10
)
```
I get around 6 FPS for an image size of 320x240x3.
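For context, a rough sketch of how I measure FPS. Here `run_inference` is a hypothetical stand-in for the actual `sess.run(...)` call on the TF-TRT graph; the warm-up run is there because with `is_dynamic_op=True` the TensorRT engine is built on the first run, which should not be counted:

```python
import time

# Hypothetical stand-in for the real sess.run(...) call on the
# TF-TRT graph; replace with the actual TensorFlow session call.
def run_inference(frame):
    time.sleep(0.001)  # placeholder for model execution
    return frame

def measure_fps(n_frames=50):
    frame = [[[0] * 3] * 240] * 320  # dummy 320x240x3 image

    # Warm-up: with is_dynamic_op=True the first run builds the
    # TensorRT engine, so it is excluded from the timing loop.
    run_inference(frame)

    start = time.time()
    for _ in range(n_frames):
        run_inference(frame)
    return n_frames / (time.time() - start)

print('FPS: %.1f' % measure_fps())
```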
The same procedure is applied to create a TensorRT-optimized graph with precision mode FP32. However, I see no performance difference: with FP32 I also get around 6 FPS.
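To be explicit, the two conversions share every parameter of the `create_inference_graph` call above except `precision_mode` (a plain-Python sketch of the two configurations, not the conversion itself):

```python
# Shared TF-TRT conversion parameters (values from the call above)
common_kwargs = dict(
    max_batch_size=1,
    max_workspace_size_bytes=1 << 26,
    is_dynamic_op=True,
    maximum_cached_engines=1,
    minimum_segment_size=10,
)

fp16_kwargs = dict(common_kwargs, precision_mode='FP16')
fp32_kwargs = dict(common_kwargs, precision_mode='FP32')

# The only differing key between the two configurations
diff = {k for k in fp16_kwargs if fp16_kwargs[k] != fp32_kwargs[k]}
print(diff)  # {'precision_mode'}
```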
What might be the reason behind this behaviour?
Looking at the benchmarks for the different precision modes, I see only Turing and Volta architectures mentioned: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#performance
And in the Jetson Nano benchmarks I can see only FP16 results for different models, not a comparison between FP32 and FP16: https://developer.nvidia.com/embedded/jetson-nano-dl-inference-benchmarks