After optimizing a graph with this function:

    graphdef_trt = tensorflow.contrib.tensorrt.create_inference_graph(
        input_graph_def=graphdef_frozen,
        outputs=[OUTPUT_NODE1, OUTPUT_NODE2],
        max_batch_size=1,
        max_workspace_size_bytes=1 << 32,
        precision_mode='FP32')

the model's predictions changed slightly.
Is this normal behavior?
If yes, could you please provide a link explaining why this happens?
I used the nvcr.io/nvidia/tensorflow:18.10-py3 Docker image.
I tried 'FP16': predictions are the same as with 'FP32', but different from the original graph.
I tried 'INT8': predictions are the same as with the original graph, but inference time increased from 6 ms to 320 ms ('FP16' and 'FP32' inference time is 4 ms).
Something strange is happening
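For reference, here is the kind of comparison behind "predictions changed slightly" (a minimal sketch using only NumPy; the arrays are hypothetical stand-ins for the outputs of the original and the TRT-optimized graph):

```python
import numpy as np

# Hypothetical outputs of the same input run through the original
# frozen graph and the TRT-optimized graph (placeholder values).
preds_original = np.array([0.101, 0.899, 0.0005], dtype=np.float32)
preds_trt = np.array([0.1012, 0.8988, 0.0004], dtype=np.float32)

# Maximum absolute difference between the two sets of predictions.
max_abs_diff = float(np.max(np.abs(preds_original - preds_trt)))
print(max_abs_diff)

# Differences of this order can come from reordered/fused floating-point
# operations, so a small tolerance is used rather than exact equality.
print(np.allclose(preds_original, preds_trt, atol=1e-3))
```

With 'FP32' the differences I see are small but nonzero in this sense, i.e. the outputs fail exact equality but pass a loose tolerance check.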