TensorRT Inference Server - where is preprocessing supposed to fit?

In our model, clients send more or less “raw” data, which has to be preprocessed before being fed into a TRT model. Where is this “preprocessing” code supposed to be hooked into a TRT Inference Server model?


Once the server is running, you can use a “client” application to send inference requests to it, and that client application can preprocess the raw input. Depending on how frequently inputs arrive, you can either batch them or send a single input per request.

This is not a TRTIS-specific question but a general data pipeline question.
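
As a rough illustration of the client-side approach, here is a minimal Python sketch. It assumes the raw input is a sequence of 8-bit samples and that preprocessing is a simple scaling step; `preprocess`, `batched`, `run_client` and the `infer_fn` callable are hypothetical names, and `infer_fn` stands in for whatever call your client library uses to send a batch to the server.

```python
from typing import Callable, Iterable, Iterator, List

Sample = List[float]


def preprocess(raw: bytes) -> Sample:
    """Scale raw 8-bit values to floats in [0.0, 1.0] (assumed preprocessing step)."""
    return [b / 255.0 for b in raw]


def batched(items: Iterable[Sample], batch_size: int) -> Iterator[List[Sample]]:
    """Group preprocessed samples into lists of at most batch_size items."""
    batch: List[Sample] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def run_client(
    raw_inputs: Iterable[bytes],
    infer_fn: Callable[[List[Sample]], list],
    batch_size: int = 1,
) -> list:
    """Preprocess on the client, then pass each batch to infer_fn.

    infer_fn is a placeholder for the real request to the inference server;
    it must return one result per sample in the batch.
    """
    results: list = []
    for batch in batched((preprocess(r) for r in raw_inputs), batch_size):
        results.extend(infer_fn(batch))
    return results
```

With `batch_size=1` every input goes out as its own request; a larger value trades some latency for fewer round trips.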

Thanks for the reply.

The model you proposed is understood. Unfortunately, it means that to work with “thin” clients we need to build and maintain a separate “frontend” server, which will have to serialise the data once again to push it to a “backend” TRTIS.
At the same time, exposing some (say, GStreamer-like) interface for custom “filters” in TRTIS would let us concentrate on flow-specific logic, reduce latency, and benefit from the TRTIS network infrastructure and resource management.
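
To make the idea concrete, here is a minimal Python sketch of what such a filter chain could look like. Everything in it is hypothetical (`FilterPipeline`, `add_filter`, the example filters and the `model` callable); it is not an existing TRTIS interface, only an illustration of the kind of hook being asked for.

```python
from typing import Any, Callable, List

Filter = Callable[[Any], Any]


class FilterPipeline:
    """Hypothetical server-side chain: raw request -> filters -> model."""

    def __init__(self, model: Filter) -> None:
        self._filters: List[Filter] = []
        self._model = model  # stand-in for the actual TRT model execution

    def add_filter(self, f: Filter) -> "FilterPipeline":
        """Append a preprocessing step; returns self so calls can be chained."""
        self._filters.append(f)
        return self

    def __call__(self, raw: Any) -> Any:
        data = raw
        for f in self._filters:  # apply filters in the order they were added
            data = f(data)
        return self._model(data)


def decode(raw: bytes) -> List[int]:
    """Example filter: turn a raw payload into a list of integer samples."""
    return list(raw)


def normalize(samples: List[int]) -> List[float]:
    """Example filter: scale 8-bit samples to [0.0, 1.0]."""
    return [s / 255.0 for s in samples]


# Usage: a thin client sends raw bytes; the server runs the filters, then the model.
pipeline = FilterPipeline(model=sum).add_filter(decode).add_filter(normalize)
```

Here `sum` is only a placeholder for the model. The point is that the raw payload is serialised once and all flow-specific steps run next to the model.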