Duplicate post, please ignore this.


I am using TF-TRT to accelerate inference for my inception_resnet_v2 (.pb) model. When I run the session with the same test image, the results are as follows:

TF model: 24ms;
TF-TRT + FP32: 13ms;
TF-TRT + FP16: 7ms;
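
For reference, single-image numbers like these are typically collected by timing the session call in a loop after a few warm-up runs. The following is only a minimal sketch under that assumption: `time_inference` and `fake_session_run` are hypothetical names, and `fake_session_run` is a stand-in for the real `sess.run(...)` call, not the actual benchmark code.

```python
import time


def time_inference(run_fn, warmup=5, iters=20):
    """Return the mean latency of run_fn in milliseconds, excluding warm-up calls."""
    # Warm-up calls are discarded because the first runs are often slower
    # than steady state (assumption: one-time initialization happens here).
    for _ in range(warmup):
        run_fn()
    start = time.perf_counter()
    for _ in range(iters):
        run_fn()
    return (time.perf_counter() - start) * 1000.0 / iters


def fake_session_run():
    # Hypothetical stand-in for sess.run(output_tensor, feed_dict={...}).
    time.sleep(0.002)


mean_ms = time_inference(fake_session_run)
print(f"mean latency: {mean_ms:.1f} ms")
```

Replacing `fake_session_run` with a closure around the real session call gives one number per model variant.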

These results look normal. However, when I wrap the model in a Flask HTTP service and use the wrk tool to test the server QPS, the session run times are as follows:

TF model: 28ms;
TF-TRT + FP32: 28ms;
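
To make clear what "session run time" means under load, here is a minimal sketch of timing only the inference call inside each request while several requests run concurrently. It uses a thread pool in place of Flask and wrk so that it stays self-contained. `handle_request` and `fake_session_run` are hypothetical names, and the worker and request counts are arbitrary assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

latencies_ms = []
lock = threading.Lock()


def fake_session_run():
    # Hypothetical stand-in for the real sess.run(...) call.
    time.sleep(0.002)


def handle_request():
    # Time only the inference call, not request parsing or response encoding.
    start = time.perf_counter()
    fake_session_run()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    with lock:
        latencies_ms.append(elapsed_ms)


# Simulate 40 requests served by 4 concurrent workers (arbitrary numbers).
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(40):
        pool.submit(handle_request)
# Leaving the "with" block waits for all submitted requests to finish.

mean_ms = sum(latencies_ms) / len(latencies_ms)
print(f"{len(latencies_ms)} requests, mean session run time: {mean_ms:.1f} ms")
```

Comparing this per-request figure with the single-image figure shows whether the slowdown is inside the session call itself or elsewhere in the service.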

I have checked my code carefully, so my question is: why does the TF-TRT speedup disappear inside the service?

The environment is as follows:

docker image from NGC: nvidia-tensorflow-19.06-py3
hardware: NVIDIA T4

I look forward to your reply. Thank you!