I would like to use different versions of the GPT model for different requests. Is this possible on a single VSS instance, or do I need to set up a separate VSS instance for each version? Specifically, I would like to use one model for processing video streams and another for files.
Thanks!
Try setting the VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME environment variable before calling the /summarize API, then specify the api_type in the request parameters.
This usually works. If it does not, try starting multiple instances, one per model.
Refer to these links.
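In case it helps, here is a rough Python sketch of the multiple-instance fallback, under several assumptions that are not confirmed in this thread: that each VSS instance reads VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME from its own environment at startup, and that /summarize accepts a JSON body with `id` and `api_type` fields. The hosts, ports, deployment names, and the helper names `instance_env` and `build_summarize_request` are all made up for illustration. The request is only built, not sent.

```python
import json
import os
import urllib.request

# Hypothetical mapping: one VSS instance per model, chosen by media kind.
# Hosts, ports, and deployment names are placeholders, not values from this thread.
INSTANCES = {
    "stream": {"base_url": "http://localhost:8100", "deployment": "gpt-4o"},
    "file": {"base_url": "http://localhost:8200", "deployment": "gpt-4o-mini"},
}


def instance_env(kind):
    """Return a copy of the current environment with the model deployment
    name set for the instance that serves `kind` ("stream" or "file")."""
    env = dict(os.environ)
    env["VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME"] = INSTANCES[kind]["deployment"]
    return env


def build_summarize_request(kind, media_id, api_type):
    """Build (without sending) a POST to /summarize on the matching instance.

    The body field names are assumptions; check them against the VSS API docs.
    """
    body = {"id": media_id, "api_type": api_type}
    return urllib.request.Request(
        INSTANCES[kind]["base_url"] + "/summarize",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

You would pass `instance_env("stream")` as the environment when launching the instance meant for streams (for example via `subprocess.Popen(..., env=...)`), and send the request from `build_summarize_request` with `urllib.request.urlopen` once the instance is up.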
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.