I am using TF-TRT to accelerate inference of my Inception-ResNet-v2 (.pb) model. When I time the session.run() call on the same test image, the results are as follows (a sketch of how this was measured is included below the timings):
TF model: 24 ms;
TF-TRT + FP32: 13 ms;
TF-TRT + FP16: 7 ms;
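For reference, this is roughly how the standalone numbers were gathered: a minimal sketch, assuming a frozen .pb, hypothetical tensor names ("input:0", "InceptionResnetV2/Logits/Predictions:0"), and the TF 1.x contrib TF-TRT import path that ships in the 19.06 container (newer TF versions expose it under tensorflow.python.compiler.tensorrt instead).

```python
import time
import numpy as np
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF 1.x contrib path (assumption for this container)

# Load the frozen Inception-ResNet-v2 graph (file and node names are placeholders).
with tf.gfile.GFile('inception_resnet_v2_frozen.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Convert with TF-TRT (FP16 here; 'FP32' for the other run).
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['InceptionResnetV2/Logits/Predictions'],
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16')

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(trt_graph, name='')

image = np.random.rand(1, 299, 299, 3).astype(np.float32)  # stand-in for the real test image
with tf.Session(graph=graph) as sess:
    inp = graph.get_tensor_by_name('input:0')
    out = graph.get_tensor_by_name('InceptionResnetV2/Logits/Predictions:0')
    # Warm-up so engine build / CUDA initialization is not counted in the timing.
    for _ in range(10):
        sess.run(out, feed_dict={inp: image})
    start = time.time()
    for _ in range(100):
        sess.run(out, feed_dict={inp: image})
    print('mean session.run() time: %.1f ms' % ((time.time() - start) * 10))
```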
So far the results look normal. However, when I wrap the model in a Flask HTTP service and use the wrk tool (https://github.com/wg/wrk) to test the server QPS, the session.run() times become (a sketch of the service is included after these timings):
TF model: 28 ms;
TF-TRT + FP32: 28 ms;
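For context, this is roughly what the Flask wrapper looks like: a minimal sketch, not the actual service code; the .pb file name, tensor names, endpoint path, and port are assumptions, and the request decoding is replaced with a fixed array to keep it self-contained.

```python
import time
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify

app = Flask(__name__)

# Build the graph and session once at startup (tensor and file names are placeholders).
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('inception_resnet_v2_trt_fp32.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')
sess = tf.Session(graph=graph)
inp = graph.get_tensor_by_name('input:0')
out = graph.get_tensor_by_name('InceptionResnetV2/Logits/Predictions:0')

@app.route('/predict', methods=['GET', 'POST'])
def predict():
    # The real service decodes the image from the request body; a fixed array keeps this runnable.
    image = np.random.rand(1, 299, 299, 3).astype(np.float32)
    start = time.time()
    preds = sess.run(out, feed_dict={inp: image})
    elapsed_ms = (time.time() - start) * 1000.0
    return jsonify({'session_run_ms': elapsed_ms, 'top1': int(np.argmax(preds))})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8500, threaded=False)
```

The server is then loaded with wrk, roughly like this (thread count, connections, and duration are illustrative): wrk -t4 -c8 -d30s http://127.0.0.1:8500/predict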
I have checked my code carefully, so the question is: why does the TF-TRT speedup disappear once the model is served behind Flask?
The environment is as follows:
Docker image (from NGC): nvidia-tensorflow-19.06-py3
Hardware: NVIDIA T4
I am looking forward to your reply. Thank you!