Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): T4, RTX4000
• DeepStream Version: 6.1.1
• JetPack Version (valid for Jetson only):
• TensorRT Version: 8.1.2
• NVIDIA GPU Driver Version (valid for GPU only): 515.65.01
• Issue Type (questions, new requirements, bugs): questions
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details needed to reproduce the issue.)
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or sample application, and a description of the function.)
Hello,
I am using Triton Inference Server on a single-GPU server, and I cannot fully utilise the GPU because one of my models is slow.
Is there a way to scale an individual model in Triton Server?
It is a Python pre-processing model, and I want to run multiple instances of the same model. If this is supported, does Triton Server manage how many requests are routed to each instance?
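To make the question concrete, this is the kind of configuration I am hoping is possible: replicating a single model via instance_group in its config.pbtxt. This is only a sketch; the model name, tensor names, shapes, and instance count below are placeholders for my Python pre-process model.

name: "preprocess"
backend: "python"
max_batch_size: 8

input [
  {
    name: "RAW_IMAGE"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "PREPROCESSED"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]

# Hypothetical: load 4 copies of this model so requests can be processed in
# parallel instead of queuing behind a single slow instance.
instance_group [
  {
    count: 4
    kind: KIND_CPU
  }
]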
There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks
The PyTorch backend supports dynamic batching, so you can build the model with a maximum batch size and send requests to the Triton server / PyTorch backend with any batch size up to that maximum.
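As a hedged illustration (the model name, tensor names, shapes, and timing values are placeholders, not taken from this topic), a PyTorch-backend model built with a maximum batch size could be configured roughly like this so that the dynamic batcher can merge smaller client requests:

name: "my_pytorch_model"
backend: "pytorch"
max_batch_size: 32

input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

# Hypothetical settings: let Triton group incoming requests into batches of up
# to 32, waiting at most 100 microseconds to form a larger batch.
dynamic_batching {
  max_queue_delay_microseconds: 100
}

Clients can then send requests with any batch size up to 32, and Triton schedules them across the loaded model instances.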