How to get model configuration from HTTP API without first loading the model in EXPLICIT mode?

scotrraaj.gopal · April 29, 2025, 3:38pm

Description

(Disclaimer: this is a duplicate of issue #8153 on the TIS GitHub, reposted here because it has had no response since it was posted two weeks ago.)

In my experience with the Triton Server Python client, I haven’t found a way to get a model’s configuration without loading the model first when the server is running in EXPLICIT mode. I need the configuration because my logic for loading a specific version of a model updates the runtime model config at load time. As a workaround, I use a context manager that loads and unloads the model to perform this operation.

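# _model_config_context is my own helper: it loads and unloads the model
# so that its configuration can be read.
with self._model_config_context(model_name) as model_cfg:
    # Serve either the latest version only or one specific version.
    version_policy = (
        {"latest": {"num_versions": 1}}
        if model_version == "latest"
        else {"specific": {"versions": [model_version]}}
    )

    model_cfg.update({"version_policy": version_policy})
    if init_params:
        # Pass init parameters to the model as string-valued config parameters.
        model_cfg["parameters"].update(
            {k: {"string_value": str(v)} for k, v in init_params.items()}
        )
    self.logger.debug(f"Model config: {model_cfg}")

    # Reload the model with the overridden configuration.
    self.client.load_model(model_name, config=json.dumps(model_cfg))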
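For readers following along, the override step in the snippet above could be factored into a standalone function. This is only a sketch: `build_load_config` is a hypothetical name, not part of `tritonclient`, and it assumes the base configuration is already available as a plain dict. It does not solve the problem of obtaining that dict without loading the model. It uses a deep copy so the caller's dict is left untouched, and `setdefault` so a config without a `parameters` entry does not raise `KeyError`.

```python
import copy
import json


def build_load_config(base_cfg, model_version="latest", init_params=None):
    """Return a JSON string with version policy and parameters overridden.

    Hypothetical helper: base_cfg is assumed to be the model config as a dict.
    """
    # Deep copy so nested dicts in the caller's config are not mutated.
    cfg = copy.deepcopy(base_cfg)

    if model_version == "latest":
        cfg["version_policy"] = {"latest": {"num_versions": 1}}
    else:
        cfg["version_policy"] = {"specific": {"versions": [model_version]}}

    if init_params:
        # Create the "parameters" section if the base config has none.
        params = cfg.setdefault("parameters", {})
        params.update({k: {"string_value": str(v)} for k, v in init_params.items()})

    return json.dumps(cfg)
```

The result would then be passed to the client in the same way as in the original snippet, for example `self.client.load_model(model_name, config=build_load_config(model_cfg, model_version, init_params))`.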

While loading and unloading an ONNX model can be quite fast, doing the same for Python backend models adds a lot of latency. If there is an API or method for simply reading the model configuration in EXPLICIT mode that I don’t know about, I’d be happy to hear the details. Any other opinions or suggestions are welcome too.

Thank you in advance.
Scot

AakankshaS · April 30, 2025, 5:12pm

Hi @scotrraaj.gopal ,
Thank you for highlighting this.
I will escalate this to the respective team so it gets proper attention.
