@NVES_R thanks for your reply, and sorry to bother you again. I have two more questions.
I used the SavedModel from https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz and optimized it with TensorRT following the steps in https://medium.com/tensorflow/optimizing-tensorflow-serving-performance-with-nvidia-tensorrt-6d8a2347869a. Then I modified resnet_client_grpc.py (https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/resnet_client_grpc.py) to measure the inference time:
import time
import numpy as np

# stub and request are created earlier in resnet_client_grpc.py
request.model_spec.name = 'resnet'
request.model_spec.signature_name = 'predict'
times = []
for count in range(1000):
    start = time.time()
    result = stub.Predict(request, 30.0)  # 30 secs timeout
    end = time.time()
    times.append(end - start)
avg_time = np.mean(times[100:]) * 1000  # in ms; don't include the first 100 runs
print("predict image cat.jpg avg_time: %f ms\n" % avg_time)
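The timing loop above can be factored into a small reusable helper. The sketch below is hypothetical (`benchmark` is not part of resnet_client_grpc.py). It accepts any zero-argument callable, so it can wrap `stub.Predict(request, 30.0)`, and it discards the warm-up runs just as the loop does. It also reports the median and an approximate 99th percentile, because a mean alone can hide occasional slow requests. The `clock` parameter is an assumption added only so the helper can be exercised without a running server.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(call: Callable[[], object], n_runs: int = 1000, n_warmup: int = 100,
              clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Time `call` n_runs times and summarize per-call latency in milliseconds.

    The first n_warmup runs are dropped, because connection setup and
    first-use initialization tend to make them slower than steady state.
    """
    if n_runs <= n_warmup:
        raise ValueError("n_runs must be larger than n_warmup")
    times_ms: List[float] = []
    for _ in range(n_runs):
        start = clock()
        call()
        times_ms.append((clock() - start) * 1000.0)
    steady = sorted(times_ms[n_warmup:])
    return {
        "mean_ms": statistics.mean(steady),
        "median_ms": statistics.median(steady),
        # Nearest-rank style approximation of the 99th percentile.
        "p99_ms": steady[min(len(steady) - 1, int(0.99 * len(steady)))],
    }
```

In the client this would be called as `stats = benchmark(lambda: stub.Predict(request, 30.0))`, where `stub` and `request` come from the existing script.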
My avg_time results (three test runs each) are:
Original TensorFlow SavedModel: 10.38 / 10.49 / 10.39 ms
TF SavedModel optimized with TensorRT FP32: 7.72 / 7.76 / 7.79 ms
TF SavedModel optimized with TensorRT FP16: 7.19 / 7.16 / 7.21 ms
So TensorRT makes inference considerably faster than the original TensorFlow model: the average latency is about 26% lower with FP32 and about 31% lower with FP16.
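The percentages can be checked directly from the three runs listed above. This small sketch compares mean latencies. `latency_reduction` is a hypothetical helper name, and the input values are the measured numbers copied from the results.

```python
from statistics import mean


def latency_reduction(baseline_runs, optimized_runs):
    """Relative drop in mean latency; 0.25 means 25% less time per request."""
    return 1.0 - mean(optimized_runs) / mean(baseline_runs)


baseline = [10.38, 10.49, 10.39]  # original TensorFlow SavedModel (ms)
fp32 = [7.72, 7.76, 7.79]         # TensorRT FP32 (ms)
fp16 = [7.19, 7.16, 7.21]         # TensorRT FP16 (ms)

print(f"FP32: {latency_reduction(baseline, fp32):.1%} lower latency")  # FP32: 25.6% lower latency
print(f"FP16: {latency_reduction(baseline, fp16):.1%} lower latency")  # FP16: 31.0% lower latency
```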
Next, I tested my own SavedModel, which is based on ResNet-50 but adds some up-sampling ops. You can get it from https://github.com/IvyGongoogle/trt_of_tf. I optimized it with TensorRT FP32/FP16, served it with TensorFlow Serving, and measured the inference time with the same modified resnet_client_grpc.py. The avg_time results are:
Original TensorFlow SavedModel: 48.65 / 48.74 / 48.96 ms
TF SavedModel optimized with TensorRT FP32: 47.46 / 47.33 / 47.19 ms
TF SavedModel optimized with TensorRT FP16: 43.85 / 43.77 / 43.84 ms
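One way to read these numbers is Amdahl's law: if only a fraction p of the request time runs inside TensorRT segments that are sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below inverts that formula under a strong, explicitly hypothetical assumption: the converted parts of my model speed up by the same factor as the whole stock ResNet-50 request (about 1.34x with FP32). The helper names are made up for illustration.

```python
from statistics import mean


def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the runtime is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def implied_fraction(overall: float, s: float) -> float:
    """Invert Amdahl's law: which fraction p must run at speedup s to give `overall`?"""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


# Mean latencies (ms) of the three runs reported for each configuration.
resnet_tf, resnet_trt32 = mean([10.38, 10.49, 10.39]), mean([7.72, 7.76, 7.79])
own_tf, own_trt32 = mean([48.65, 48.74, 48.96]), mean([47.46, 47.33, 47.19])

# ASSUMPTION: TRT-converted parts of the own model speed up like the whole
# stock ResNet-50 request did. That figure also includes gRPC overhead, so
# the real per-segment speedup is probably higher and p probably lower.
s = resnet_tf / resnet_trt32
overall = own_tf / own_trt32
p = implied_fraction(overall, s)
print(f"overall FP32 speedup {overall:.3f}x -> accelerated fraction ~{p:.0%}")
```

With the measurements above, this gives an implied accelerated fraction of roughly 12% of the request time. That figure is only suggestive, but it is consistent with most of the time being spent outside TensorRT.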
First question: for my model, the TensorRT FP32/FP16 versions are unfortunately barely faster than the original TensorFlow model (about 3% and 10% lower latency). What causes this? Is it because only a few ops in my model can be converted by TensorRT, even though the model is based on ResNet-50, where TensorRT clearly helps as shown above?
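One way to check how much of the optimized graph TensorRT actually took over is to count op types in the converted graph. In TF-TRT, each converted segment becomes one `TRTEngineOp` node, and every other compute node still runs in TensorFlow. The sketch below assumes you have already extracted the op types from the optimized GraphDef (for example `[n.op for n in graph_def.node]`). The helper name and the set of "bookkeeping" ops it ignores are my own assumptions. The `ResizeBilinear` entries in the usage example are hypothetical placeholders for whatever up-sampling op the model really uses.

```python
from collections import Counter
from typing import Dict, Iterable

# ASSUMPTION: cheap graph plumbing that says little about where time is spent.
BOOKKEEPING_OPS = {"Const", "Identity", "Placeholder", "NoOp"}


def trt_conversion_summary(op_types: Iterable[str]) -> Dict[str, object]:
    """Count TensorRT segments versus compute nodes left in TensorFlow."""
    counts = Counter(op for op in op_types if op not in BOOKKEEPING_OPS)
    n_engines = counts.pop("TRTEngineOp", 0)
    return {
        "trt_engine_ops": n_engines,                  # number of TensorRT segments
        "remaining_tf_nodes": sum(counts.values()),   # nodes still run by TensorFlow
        "top_remaining_ops": counts.most_common(5),   # likely unconverted hot spots
    }


# Hypothetical op list, only to show the output shape.
example_ops = ["Placeholder", "TRTEngineOp", "ResizeBilinear", "TRTEngineOp",
               "Const", "ResizeBilinear", "ConcatV2"]
print(trt_conversion_summary(example_ops))
```

Many `TRTEngineOp` nodes separated by unconverted ops would suggest that the graph is split into small segments, with data passing back and forth between TensorRT and TensorFlow.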
Second question: for both the original ResNet-50 and my own model, TensorRT FP16 is only slightly faster than TensorRT FP32 (about 8%), although I expected FP16 to be close to two times faster in theory. Why?
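The FP16-over-FP32 ratio computed from the measured runs is almost the same for both models. A plausible, unverified reading is that a theoretical ~2x FP16 gain would apply only to work inside TensorRT engines, and only on GPUs with fast FP16 paths. End-to-end latency measured through TensorFlow Serving also includes the gRPC round trip, request serialization, ops left in TensorFlow, and possibly JPEG decoding (if it runs inside the served graph, as the `_jpg` model name suggests). This sketch only does the arithmetic. `fp16_over_fp32` is a hypothetical helper name.

```python
from statistics import mean

# Measured latencies (ms) copied from the results above.
RUNS_MS = {
    "resnet50": {"fp32": [7.72, 7.76, 7.79], "fp16": [7.19, 7.16, 7.21]},
    "own_model": {"fp32": [47.46, 47.33, 47.19], "fp16": [43.85, 43.77, 43.84]},
}


def fp16_over_fp32(fp32_runs, fp16_runs):
    """End-to-end speedup of the FP16 engine relative to the FP32 engine."""
    return mean(fp32_runs) / mean(fp16_runs)


ratios = {name: fp16_over_fp32(r["fp32"], r["fp16"]) for name, r in RUNS_MS.items()}
for name, ratio in ratios.items():
    print(f"{name}: FP16 is {ratio:.2f}x faster than FP32")  # 1.08x for both models
```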
Could you give me some advice? Looking forward to your reply.