Wrong model versioning when using gRPC

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.2.5
• NVIDIA GPU Driver Version (valid for GPU only) 510.73
• Issue Type( questions, new requirements, bugs) bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
I was testing the Python example deepstream-ssd-parser inside the docker image nvcr.io/nvidia/deepstream:6.1-triton with my own model (which should not affect the behavior described below).

After switching to an external Triton server, nvcr.io/nvidia/tritonserver:22.06-py3, via gRPC, model versioning no longer works properly. Originally, the models are stored in the following folder structure:

models
|- ssd
|  |- 1
|  |  |- model.pt
|  |- config.pbtxt

where the folder 1 corresponds to the first model version. I have now added another model version, resulting in a folder structure like this:

models
|- ssd
|  |- 1
|  |  |- model.pt
|  |- 2
|  |  |- model.pt
|  |- config.pbtxt

After restarting the external Triton server, it correctly detects the new version and loads only the current version 2. However, when I try to run inference on the external server through the pipeline, I get the error:

ERROR: infer_grpc_client.cpp:342 inference failed with error: Request for unknown model: 'ssd' version 1 is not found

Changing backend.triton.version in the config file of the pgie pipeline element to either -1 or 2 does not change anything; the error persists. The relevant part under infer_config then looks like:

backend {
  triton {
    model_name: "ssd"
    version: -1
    grpc {
      url: "0.0.0.0:8001"
    }
  }
}

How can I ensure that the pipeline requests the correct model version (i.e., the one set via the version parameter in infer_config) when running inference on the external Triton server?

I believe you need to change the version policy in config.pbtxt to all or specific in order to have both model versions available; by default, only the latest model version is available for inferencing. https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#version-policy
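
For reference, a minimal config.pbtxt sketch of those two policies (the version numbers are the ones from this thread; everything else is illustrative):

# Hypothetical config.pbtxt snippet: expose every version in the repository.
version_policy: { all: { } }

# Or expose only explicitly listed versions:
# version_policy: { specific: { versions: [ 1, 2 ] } }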

Thanks for the tip. However, I don't want to access all model versions; I just want to access the latest one.
That works totally fine on the external Triton inference server side. The issue is that, no matter how I change the version in the inference config for the pgie pipeline step, the pipeline always requests model version 1 from the external server.
So my goal is not to make this request for version 1 work, but to make sure that the pgie step of the pipeline only requests the latest model version (or whichever version I specify) from the inference server, which in this case would be version 2, not always version 1.

  • Latest: Only the latest ‘n’ versions of the model in the repository are available for inferencing. The latest versions of the model are the numerically greatest version numbers. version_policy: { latest: { num_versions: 2}}

Please try this.
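
In config.pbtxt that would look like this (a sketch; only the policy line is relevant, the rest of the file stays as in the original setup):

# Keep the two numerically greatest versions (here 1 and 2) loaded.
version_policy: { latest: { num_versions: 2 } }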

I have tried this, and as I already explained in my comment above: the code runs, but it does not solve my problem! It solves a completely different problem. To make it clearer, let me spell out the difference between my problem and the problem you are providing a solution for:

In general, there are two places where the model version is relevant:

  1. The model versions the external Triton server loads and can run inference on. These are controlled by config.pbtxt.
  2. The model version requested by the pipeline, specifically by the pgie nvinferserver element: pgie = Gst.ElementFactory.make("nvinferserver", "pgie"). This should be controlled by the config file config_infer.txt, which is loaded into the pipeline with pgie.set_property("config-file-path", "config_infer.txt") (see the sketch after this list).
    This config_infer.txt file in my application is equivalent to config_triton_grpc_infer_primary_peoplenet.txt from the deepstream-test3 example.
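
The sketch mentioned in point 2, for completeness (minimal and assuming a standard DeepStream Python app; only the element name and config file path come from this thread):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Create the nvinferserver element that performs inference via the
# external Triton server.
pgie = Gst.ElementFactory.make("nvinferserver", "pgie")
if not pgie:
    raise RuntimeError("Unable to create nvinferserver element")

# The model name, requested version and gRPC url are all read from this
# config file (the infer_config { backend { triton { ... } } } block above).
pgie.set_property("config-file-path", "config_infer.txt")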

The problem your solution addresses:
I have multiple models, I want to switch between them in the pipeline, but I can only access one of them from the Triton server. In that case, loading the latest n (in your example n = 2) versions into the Triton server would help. But this is NOT the problem I described!

My problem:
I have multiple model versions, but I only want to load the latest one. Loading the latest one into the external Triton server (point 1 above) works totally fine. No matter what value <nv> takes in version_policy: { latest: { num_versions: <nv> } }, the latest model version is always loaded!
Point 2 is the issue: no matter how I change the value of <version> in

infer_config {
  backend {
    triton {
      version: <version>
    }
  }
}

the nvinferserver element always requests version 1 from the external Triton inference server! Why does changing the value of <version> not change the requested model version, and how can this be fixed? Neither <version> = 2 nor <version> = -1 changed the behavior.
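
For anyone wanting to verify that the server side really is fine, the loaded versions can be queried directly over gRPC, independently of the pipeline. A sketch using the tritonclient package (pip install tritonclient[grpc]; model name and url taken from the config above):

import tritonclient.grpc as grpcclient

# Connect to the same gRPC endpoint the nvinferserver element uses.
client = grpcclient.InferenceServerClient(url="0.0.0.0:8001")

# With the default "latest" policy, only version 2 should be ready.
print("version 1 ready:", client.is_model_ready("ssd", "1"))  # expect False
print("version 2 ready:", client.is_model_ready("ssd", "2"))  # expect True

# The metadata response also lists all versions the server has loaded.
print(client.get_model_metadata("ssd"))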

Thanks for pointing it out; we will fix it in our upcoming release.

Thank you @Amycao for the update and support. Do you have a rough time frame for the next release?

Sorry, I do not have one.

Hi @fabian.vogel
It should be in this or next month.