Inference with FP16 precision gives no performance gain over FP32 on Jetson Nano

I am able to run the deeplabv3+ model on Jetson Nano. My model is trained in TensorFlow and exported as a TensorFlow frozen graph. Later on I convert it into a TensorRT-optimized graph with precision mode FP16, as shown below:

import tensorflow.contrib.tensorrt as trt

trt_graph_def = trt.create_inference_graph(
        input_graph_def=frozen_graph_def,   # frozen TensorFlow GraphDef
        outputs=output_node_names,          # list of output node names
        max_workspace_size_bytes=1 << 26,
        precision_mode='FP16',
        is_dynamic_op=True)

I get around 6 FPS for an image size of 320x240x3.
I apply the same procedure to create a TensorRT-optimized graph with precision mode FP32.
However, I do not see any performance difference; with FP32 I also get 6 FPS.
What might be the reason behind this behaviour?

If I look at the benchmarks for precision modes, I see only the Turing and Volta architectures mentioned.
If I look at the benchmarks for Jetson Nano, I see only FP16 results for different models, but no comparison between FP32 and FP16.


It’s recommended to check how many layers inside your model actually run with TensorRT.

In TF-TRT, layers that TensorRT does not support automatically fall back to TensorFlow.
For the layers that run with the TensorFlow implementation, the benefit of low precision is quite limited.
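
One way to check this is to count how many nodes of the converted graph were replaced by TRTEngineOp nodes. A minimal sketch, assuming trt_graph_def is the GraphDef returned by create_inference_graph above:

# Count how much of the converted graph runs inside TensorRT.
trt_engine_nodes = [n for n in trt_graph_def.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes: %d' % len(trt_engine_nodes))  # segments handled by TensorRT
print('Total nodes: %d' % len(trt_graph_def.node))      # the rest runs in TensorFlow

If only a few TRTEngineOp nodes show up, most of the model is still executed by TensorFlow, which would explain why FP16 and FP32 give the same FPS.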

In general, if your model is fully supported by our TensorRT library, it’s recommended to use pure TensorRT directly for better performance.
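
For reference, here is a minimal sketch of building a pure TensorRT FP16 engine from a frozen graph via the UFF parser. The file name, input/output node names, and input shape below are placeholders; replace them with your model's actual values, and note that DeepLab's pre/post-processing ops may need to be handled outside the engine.

# Minimal sketch: pure TensorRT FP16 engine built from a frozen graph via UFF.
# 'frozen_graph.pb', 'ImageTensor' and 'SemanticPredictions' are placeholder names.
import tensorrt as trt
import uff

uff_model = uff.from_tensorflow_frozen_model('frozen_graph.pb',
                                             output_nodes=['SemanticPredictions'])

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.UffParser() as parser:
    builder.max_workspace_size = 1 << 26
    builder.fp16_mode = True                  # FP16 kernels run natively inside TensorRT
    parser.register_input('ImageTensor', (3, 240, 320))
    parser.register_output('SemanticPredictions')
    parser.parse_buffer(uff_model, network)
    engine = builder.build_cuda_engine(network)

with open('deeplab_fp16.engine', 'wb') as f:
    f.write(engine.serialize())               # reload later with trt.Runtime for inference

With a pure TensorRT engine, every layer runs inside TensorRT, so the FP16 vs FP32 difference should become visible.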