Failed to load 'sid-minibert-onnx', no version is available

I ran into an issue while following this quick-start guide for Morpheus:
https://docs.nvidia.com/morpheus/morpheus_quickstart_guide.html

When running the following step:

(mlflow) root@mlflow-6d98:/mlflow# mlflow deployments create -t triton \
      --flavor triton \
      --name sid-minibert-onnx \
      -m models:/sid-minibert-onnx/1 \
      -C "version=1"

I got this error:


Copied /mlflow/artifacts/0/559ba3818c4a418dacce5c425fb74f25/artifacts/triton/sid-minibert-onnx to /common/triton-model-repo/sid-minibert-onnx
Saved mlflow-meta.json to /common/triton-model-repo/sid-minibert-onnx
Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/mlflow_triton/deployments.py", line 109, in create_deployment
    self.triton_client.load_model(name)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 622, in load_model
    _raise_if_error(response)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 64, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: failed to load 'sid-minibert-onnx', no version is available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/mlflow/deployments/cli.py", line 133, in create_deployment
    deployment = client.create_deployment(name, model_uri, flavor, config=config_dict)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/mlflow_triton/deployments.py", line 111, in create_deployment
    raise MlflowException(str(ex))
mlflow.exceptions.MlflowException: failed to load 'sid-minibert-onnx', no version is available

The fix was to run helm uninstall on every Morpheus-related release and install them again. After that, the deployment succeeded.

To uninstall, list the releases in the namespace and remove each one:
root@k8master01:~# helm list  --namespace $NAMESPACE
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
NAME            NAMESPACE               REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
helper          davindermorpheus        1               2022-09-18 22:19:50.155588261 +0000 UTC deployed        morpheus-sdk-client-22.06       22.06
morpheus-mlflow davindermorpheus        1               2022-09-18 04:32:29.413680882 +0000 UTC deployed        morpheus-mlflow-22.06           22.06
morpheus1       davindermorpheus        1               2022-09-17 04:59:47.852987934 +0000 UTC deployed        morpheus-ai-engine-22.06        22.06
helm uninstall morpheus1  --namespace $NAMESPACE
helm uninstall morpheus-mlflow  --namespace $NAMESPACE
helm uninstall helper  --namespace $NAMESPACE

After re-installing, the same command succeeded:

(mlflow) root@mlflow-7889bfd95f-jw4s9:/mlflow#  mlflow deployments create -t triton \
>       --flavor triton \
>       --name sid-minibert-onnx \
>       -m models:/sid-minibert-onnx/1 \
>       -C "version=1"
Copied /mlflow/artifacts/0/254ae8198afd4452b9e366a116c1c694/artifacts/triton/sid-minibert-onnx to /common/triton-model-repo/sid-minibert-onnx
Saved mlflow-meta.json to /common/triton-model-repo/sid-minibert-onnx

triton deployment sid-minibert-onnx is created
(mlflow) root@mlflow-7889bfd95f-jw4s9:/mlflow#

One more note: while I had this issue, the output of kubectl -n $NAMESPACE logs deploy/ai-engine did not contain any of the TRITONBACKEND-related log lines.

Example of those lines from the working deployment:

I0918 23:23:05.136036 1 grpc_server.cc:4587] Started GRPCInferenceService at 0.0.0.0:8001
I0918 23:23:05.136197 1 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0918 23:23:05.177142 1 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
I0918 23:25:48.075752 1 model_repository_manager.cc:1191] loading: sid-minibert-onnx:1
I0918 23:25:48.180726 1 onnxruntime.cc:2466] TRITONBACKEND_Initialize: onnxruntime
I0918 23:25:48.180744 1 onnxruntime.cc:2476] Triton TRITONBACKEND API version: 1.10
I0918 23:25:48.180814 1 onnxruntime.cc:2482] 'onnxruntime' TRITONBACKEND API version: 1.10
I0918 23:25:48.180818 1 onnxruntime.cc:2512] backend configuration:
{"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0918 23:25:48.191147 1 onnxruntime.cc:2568] TRITONBACKEND_ModelInitialize: sid-minibert-onnx (version 1)
I0918 23:25:48.193377 1 onnxruntime.cc:2611] TRITONBACKEND_ModelInstanceInitialize: sid-minibert-onnx (GPU device 0)
2022-09-18 23:25:49.338904987 [W:onnxruntime:log, tensorrt_execution_provider.h:59 log] [2022-09-18 23:25:49 WARNING] external/onnx-tensorrt/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2022-09-18 23:25:49.470422138 [W:onnxruntime:log, tensorrt_execution_provider.h:59 log] [2022-09-18 23:25:49 WARNING] Output type must be INT32 for shape outputs
2022-09-18 23:25:49.967791996 [W:onnxruntime:log, tensorrt_execution_provider.h:59 log] [2022-09-18 23:25:49 WARNING] Output type must be INT32 for shape outputs
I0918 23:25:49.972195 1 model_repository_manager.cc:1345] successfully loaded 'sid-minibert-onnx' version 1
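
The check described above can be scripted. This is only a rough sketch: has_backend_logs is a hypothetical helper name, and it assumes that a TRITONBACKEND_ModelInitialize line (as in the log excerpt above) is a good enough sign that a model load was attempted.

```shell
# Read a Triton log on stdin and report whether any backend model
# initialization line is present (hypothetical helper, not part of Morpheus).
has_backend_logs() {
    if grep -q 'TRITONBACKEND_ModelInitialize'; then
        echo "backend logs found"
    else
        echo "no TRITONBACKEND lines found"
        return 1
    fi
}

# Usage (assumes NAMESPACE is set as in the quick-start guide):
#   kubectl -n "$NAMESPACE" logs deploy/ai-engine | has_backend_logs
```

A non-zero exit status means the log contains no such line, which matches what I saw while the deployment was failing.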

Thanks

Hi Davinder,

Thanks for posting. The ai-engine Helm chart deploys Triton in EXPLICIT mode by default, which means that each time the Triton pod restarts, the model needs to be published to it again.
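
As a rough illustration of what re-publishing can look like, here is a minimal sketch that uses Triton's HTTP readiness and model repository endpoints. The TRITON_HTTP_URL variable, its default value and the function names are assumptions made for this example, not something the Helm chart defines. The load request can only succeed if the model directory is still present in the model repository that Triton reads; otherwise the mlflow deployments step has to be run again.

```shell
# Assumption: Triton's HTTP endpoint is reachable at this address from
# wherever the script runs. Adjust it for your cluster.
TRITON_HTTP_URL="${TRITON_HTTP_URL:-http://ai-engine:8000}"

# Build the model repository "load" endpoint for a model.
triton_load_url() {
    printf '%s/v2/repository/models/%s/load' "$1" "$2"
}

# Build the readiness endpoint for a model.
triton_ready_url() {
    printf '%s/v2/models/%s/ready' "$1" "$2"
}

# Ask Triton to load the model only if it does not already report ready.
ensure_model_loaded() {
    model="$1"
    if curl -sf "$(triton_ready_url "$TRITON_HTTP_URL" "$model")" >/dev/null; then
        echo "$model is already loaded"
    else
        curl -sf -X POST "$(triton_load_url "$TRITON_HTTP_URL" "$model")" \
            && echo "$model load requested"
    fi
}

# Usage: ensure_model_loaded sid-minibert-onnx
```

Running ensure_model_loaded after each ai-engine pod restart would avoid having to uninstall and re-install the Helm releases.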

Thanks,
Pete