Best way to do pre/post-processing for TensorRT Inference Server

I have a TensorRT Inference Server which hosts my model, and I want to create a Python “client” application which sends requests to the Server. However, I want to do the pre/post-processing of the data on the Server itself, not in the “client” application.

Having seen the approach in these slides on page 35/48 (the NLP TensorFlow example):

I’m wondering: is the best approach to create a Flask application on the Server which does this pre/post-processing and returns the response to my “client” application, which would then only need to send and receive requests? Or is there another way?

Hi,

This isn’t really a TensorRT-specific question, and would probably be better suited for https://github.com/tensorrt-inference-server/issues.

However, just to give my “2 cents”: I believe your setup would work fine if your goal is a simple “client”, though you’re really just creating a middleman, which may add communication overhead depending on your goals.
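
If you do go the Flask route, a minimal sketch of that middleman might look something like the following. To be clear, this is just an illustration under assumptions: “my_model”, the “INPUT”/“OUTPUT” tensor names, and the preprocess/postprocess logic are placeholders, and the exact tensorrtserver Python client API can differ between TRTIS releases.

```python
# Sketch only: model name, tensor names, and pre/post logic are placeholders,
# and the tensorrtserver client API below may vary with your TRTIS version.
import numpy as np
from flask import Flask, jsonify, request
from tensorrtserver.api import InferContext, ProtocolType

app = Flask(__name__)

# One client context for the process; TRTIS serves HTTP on port 8000 by default.
# -1 requests the latest version of the model.
ctx = InferContext("localhost:8000", ProtocolType.from_str("http"), "my_model", -1)

def preprocess(payload):
    # Placeholder: convert the caller's JSON payload into the model's input tensor.
    return np.asarray(payload, dtype=np.float32)

def postprocess(output):
    # Placeholder: convert the raw output tensor into something JSON-serializable.
    return output.tolist()

@app.route("/infer", methods=["POST"])
def infer():
    tensor = preprocess(request.get_json()["data"])
    # Keys must match the input/output names in the model's config.pbtxt.
    results = ctx.run({"INPUT": [tensor]},
                      {"OUTPUT": InferContext.ResultFormat.RAW},
                      1)  # batch size 1
    return jsonify({"result": postprocess(results["OUTPUT"][0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Your “client” would then only POST JSON to /infer, but note that every request now makes two network hops (client → Flask → TRTIS), which is the overhead I mentioned.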

I think the more elegant solution with TRTIS might be to use its “Ensemble” model feature, which works more like a pipeline where you could have three “models” served by TRTIS running preprocess → infer → postprocess: https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#ensemble-models
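
As a rough illustration of how that pipeline gets wired together, an ensemble’s config.pbtxt could look roughly like the sketch below. All model and tensor names here are made up, and the “preprocess” and “postprocess” steps would each have to exist as their own model in the model repository (e.g. implemented as custom backends); see the docs page above for the authoritative format.

```
# Hypothetical "preprocess -> infer -> postprocess" ensemble. Names are placeholders.
name: "my_ensemble"
platform: "ensemble"
max_batch_size: 1
input [
  { name: "RAW_INPUT", data_type: TYPE_UINT8, dims: [ -1 ] }
]
output [
  { name: "FINAL_OUTPUT", data_type: TYPE_FP32, dims: [ 10 ] }
]
ensemble_scheduling {
  step [
    {
      # Step 1: a "preprocess" model turns the raw bytes into the model's input.
      model_name: "preprocess"
      model_version: -1
      input_map  { key: "PRE_IN"   value: "RAW_INPUT" }
      output_map { key: "PRE_OUT"  value: "preprocessed_tensor" }
    },
    {
      # Step 2: the actual inference model.
      model_name: "my_model"
      model_version: -1
      input_map  { key: "MODEL_IN"  value: "preprocessed_tensor" }
      output_map { key: "MODEL_OUT" value: "raw_scores" }
    },
    {
      # Step 3: a "postprocess" model maps raw scores to the ensemble's output.
      model_name: "postprocess"
      model_version: -1
      input_map  { key: "POST_IN"  value: "raw_scores" }
      output_map { key: "POST_OUT" value: "FINAL_OUTPUT" }
    }
  ]
}
```

From the client’s point of view the ensemble is just another model: it sends RAW_INPUT and gets FINAL_OUTPUT back in a single request, with all intermediate tensors staying inside TRTIS.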

Also, FYI: it seems there was a directory change in the TRTIS repo recently, so some of the “src/clients/python/*” links on that page may be temporarily broken; I’ve pointed it out to the team. You can find an example Python ensemble image client here: https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/clients/python/examples/ensemble_image_client.py