We found that the first 2 inference requests with the Inception v3 model take much more time, while subsequent requests are fast. How can we address this with the model warmup option? What options do we need to set, or is there a configuration for this?
The 1.9.0 version of the inference server (container version 19.12) has a warmup option. See https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#model-warmup
We are using TRT IS version 19.11, will this work with that? Secondly, can you share an example configuration file that would help us? I am not finding enough documentation on this.
Warmup was added in 19.12 so you will have to update to use it.
Note that this same issue has been discussed on GitHub: https://github.com/NVIDIA/tensorrt-inference-server/issues/708
There is an L0_warmup CI test that uses the warmup parameters. They are patched into existing config.pbtxt files as here: https://github.com/NVIDIA/tensorrt-inference-server/blob/master/qa/L0_warmup/test.sh#L66.
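As a starting point, a warmup stanza can be added to a model's config.pbtxt. The sketch below is an assumption-laden example, not a verified Inception v3 config: the input tensor name ("input") and the dims (3x299x299, a common Inception v3 input shape) must be replaced with the actual values from your model's configuration.

```protobuf
# Hypothetical warmup sketch for an Inception-v3-style model.
# Tensor name and dims are assumptions -- match them to your model.
model_warmup [
  {
    name: "inception_warmup_sample"
    batch_size: 1
    inputs {
      key: "input"           # replace with your model's input tensor name
      value: {
        data_type: TYPE_FP32
        dims: [ 3, 299, 299 ]  # replace with your model's input dims
        zero_data: true        # send all-zero data for warmup requests
      }
    }
  }
]
```

With this in place, the server runs the warmup request(s) when the model loads, so the one-time initialization cost is paid before the first client request arrives.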