Hi, I want to know if I am doing this correctly…
I am using Inference server container 19.05. I want to test inference performance for resnet50 int8.
- I have saved a new resnet50 int8 model.plan by running trtexec on resnet50.caffemodel with the --int8 flag (the rough command I used is shown below this list).
- git cloned the inference_server repo
- I have placed the resnet50 int8 model.plan in inference_server/docs/examples/model_repository/resnet50_netdef/1/
- edited the config.pbtxt, changing the input and output data types to TYPE_INT8.
- When I start the server, I get an error saying that the input and output expect FP32 but are set to INT8.
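For reference, this is roughly the trtexec command I used to build the plan (the prototxt/caffemodel paths, output blob name and batch size are placeholders for my actual files):

trtexec --deploy=resnet50.prototxt \
        --model=resnet50.caffemodel \
        --output=prob \
        --int8 \
        --batch=8 \
        --saveEngine=model.plan

I did not pass a calibration cache, so as far as I understand trtexec just uses dummy int8 scales, which should be fine since I only care about performance and not accuracy here.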
How do I perform int8 inference in the inference server?
Thanks.
When you create an int8 model, it doesn’t necessarily change the datatype of the input/output tensors. Assuming the name of your model is resnet50, try placing it in a directory like this (note there is no config.pbtxt):
models/
  resnet50/
    1/
      model.plan
Then run trtserver with the --strict-model-config=false flag. This will cause the server to generate a config.pbtxt for you. You should then be able to use the model. See https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_configuration.html#generated-model-configuration
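For example, assuming your model repository is mounted at /models inside the container, something along these lines should work (the flag and endpoint names below are taken from the 19.05 release, so double-check them against your version):

trtserver --model-store=/models --strict-model-config=false

# once the model shows as ready, the generated configuration
# can be inspected through the status endpoint:
curl localhost:8000/api/status/resnet50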
Note that to get the best performance you will likely need to manually create a config.pbtxt that specifies multiple model instances via instance_group. For resnet50 you typically want something like:
instance_group [
  {
    count: 4
    kind: KIND_GPU
  }
]
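For reference, a hand-written config.pbtxt for a TensorRT plan would look something like the sketch below. The input/output names, dims and max_batch_size are placeholders, so copy the actual values from the config the server generated for you (as noted above, the tensor datatypes typically remain TYPE_FP32 even though the engine runs int8 internally):

name: "resnet50"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "data"    # placeholder, use the name from the generated config
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "prob"    # placeholder, use the name from the generated config
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
instance_group [
  {
    count: 4
    kind: KIND_GPU
  }
]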