I have a TensorRT Inference Server that hosts my model, and I want to create a Python “client” application that sends requests to the Server. However, I want to do the pre/post-processing of the data on the Server itself, not in the “client” application.
Having seen the NLP TensorFlow example on page 35/48 of these slides, I’m wondering whether the best approach is to create a Flask application on the Server that does this pre/post-processing and returns the response to my “client” application, which would then only need to send and receive requests, or whether there is another way?
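For context, here is roughly what I had in mind (a minimal sketch only: the endpoint URL, model name, payload format, and the preprocess/postprocess helpers are placeholders, and in practice the request to the inference server would be built with the official client library):

```python
# Minimal sketch of a Flask “middleman” running on the Server machine that does
# the pre/post-processing and forwards inference requests to the TensorRT
# Inference Server. The URL, model name, payload layout, and the
# preprocess/postprocess helpers below are hypothetical placeholders.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder endpoint of the TensorRT Inference Server on the same host.
TRTIS_URL = "http://localhost:8000/api/infer/my_model"  # hypothetical URL


def preprocess(raw):
    # Hypothetical pre-processing step, e.g. tokenization for an NLP model.
    return raw


def postprocess(result):
    # Hypothetical post-processing step, e.g. mapping logits back to labels.
    return result


@app.route("/predict", methods=["POST"])
def predict():
    raw = request.get_json()
    model_input = preprocess(raw)

    # Forward the prepared input to the inference server; the payload format
    # here is only illustrative, not the server's actual request schema.
    resp = requests.post(TRTIS_URL, json=model_input)
    resp.raise_for_status()

    return jsonify(postprocess(resp.json()))


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

With something like this, the “client” application would only need to POST its raw input to /predict on the Server and receive the post-processed result back.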
However, just to give my “2 cents”: I believe your setup would work fine if your goal is a simple “client”, but you’re really just adding a middleman, which might introduce too much communication overhead depending on your goals.