repeat post, please ignore this

yulifu_123 · July 12, 2019, 6:49am

Hi,

I am using TF-TRT to accelerate inference for my inception_resnet_v2 (.pb) model. When I time the session.run() call on the same test image, the results are as follows:

TF model: 24ms;
TF-TRT + FP32: 13ms;
TF-TRT + FP16: 7ms;

These results look normal. However, when I wrap the model in a Flask HTTP service and use the wrk tool (GitHub - wg/wrk: Modern HTTP benchmarking tool) to test the server's QPS, the session.run() times are as follows:

TF model: 28ms;
TF-TRT + FP32: 28ms;
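
To make the two sets of numbers comparable, it may help to time the same call both serially and from several threads, outside the HTTP layer. The following is only a minimal sketch: `fake_inference` is a hypothetical stand-in for the real `sess.run(...)` call, and `measure` and `time_call_ms` are helper names introduced here for illustration. It does not use TensorFlow or Flask, and it says nothing about the cause of the slowdown.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def time_call_ms(fn):
    """Return the wall-clock duration of one call to fn, in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def measure(fn, n_calls=50, n_threads=1, warmup=5):
    """Time fn n_calls times, spread over n_threads worker threads.

    The warm-up calls run first and are not timed, so one-time
    initialization cost does not skew the reported latencies.
    Returns (median_ms, max_ms).
    """
    for _ in range(warmup):
        fn()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        latencies = list(pool.map(lambda _: time_call_ms(fn), range(n_calls)))
    return statistics.median(latencies), max(latencies)


def fake_inference():
    # Hypothetical placeholder: replace the body with the real
    # sess.run(output_tensor, feed_dict={...}) call.
    time.sleep(0.005)


if __name__ == "__main__":
    for threads in (1, 4):
        median_ms, worst_ms = measure(fake_inference, n_threads=threads)
        print(f"threads={threads}: median {median_ms:.1f} ms, max {worst_ms:.1f} ms")
```

Running this once with `n_threads=1` and once with the number of concurrent connections wrk opens would show whether the per-call time already changes under concurrency, before any HTTP handling is involved.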

I have checked my code carefully, so my question is: why does TF-TRT give no speedup once the model is served behind Flask?

My environment is as follows:

docker image from NGC: nvidia-tensorflow-19.06-py3
hardware: NVIDIA T4

I look forward to your reply. Thank you!