NVIDIA Morpheus runtime error: Model is not ready

An exception occurs in the pipeline (morpheus/pipeline/pipeline.py):

[2025-05-12 15:58:02,163] {morpheus.pipeline.pipeline:407} ERROR - Exception occurred in pipeline. Rethrowing
Traceback (most recent call last):
File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 405, in post_start
await executor.join_async()
File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
return (yield from awaitable.__await__())
RuntimeError: Model is not ready

How do I ensure that the model on ai-engine is ready?
I am using Morpheus 24.03.02.

Does the Morpheus API require models to be stored in /common/models/?

Currently my models sit (on the Triton server) at /common/triton-model-repo/.

I have verified that the models are loaded and ready. Running tritonserver --model-repository=/common/triton-model-repo to list model status gives:

+--------------------------------+---------+--------+
| Model                          | Version | Status |
+--------------------------------+---------+--------+
| mymodel1                       | 1       | READY  |
| mymodel2                       | 1       | READY  |
| ... etc                        | 1       | READY  |
+--------------------------------+---------+--------+

Hi,

Thank you for reaching out and inquiring about the issue you’re observing.

I looked into the issue and discussed it with one of our developers. Could you please help us understand a few things:

  • Is Triton running in a Docker container and Morpheus in another container?
  • Have you checked if Morpheus can communicate with Triton? If not, you can verify their communication by running the following commands (replacing localhost:8000 with the URL of the Triton server):
    • curl -v "localhost:8000/v2/health/live"
    • curl "localhost:8000/v2/models/<model name>/config" (replace <model name> with the name of your model); a Python equivalent of these checks is sketched after this list
  • Is the configured URL for Triton correct, and does the model name match?
  • Does the pipeline you’re using include the Triton stage? The traceback doesn’t seem to be Triton-specific.
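A minimal Python sketch of the communication check mentioned above, assuming the tritonclient package is available; ai-engine:8000 and mymodel1 are placeholders, so substitute your own Triton URL and model name:

import tritonclient.http as triton_http

TRITON_URL = "ai-engine:8000"   # placeholder; your Triton server URL
MODEL_NAME = "mymodel1"         # placeholder; your model name

# Same checks as the curl commands above, via the Triton HTTP client.
client = triton_http.InferenceServerClient(url=TRITON_URL, verbose=False)
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready: ", client.is_model_ready(MODEL_NAME))

# Same payload as the /config curl command above.
print(client.get_model_config(MODEL_NAME))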

Triton is running on an EKS cluster. Triton and Morpheus are each in their own container.
Yes. Consistent with the output I posted above showing that the model is ready, I get "< HTTP/1.1 200 OK" and output like:
eshold":0,"eager_batching":false},"instance_group":[{"name":"nd_rf_severity_regressor_2.0.0","kind":"KIND_GPU","count":1,"gpus":[0,1,2,3],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"","cc_model_filenames":{},"metric_tags":{},"parameters":{"model_type":{"string_value":"treelite_checkpoint"},"output_class":{"string_value":"false"}},"model_warmup":[]}

after running curl ai-engine:8000/v2/models/mymodelname/config.

I will check if the Triton Stage is included.

Here are the lines in our entire code base with references to the Triton stage.

./morpheus_pipeline/morpheus_pipeline_builder.py:from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage
./morpheus_pipeline/morpheus_pipeline_builder.py: inference_stage = TritonInferenceStage(

Followed by:

./apps/morpheus_cybersphere/run-morpheus-cybersphere.py: ).add_inference_stage(
./apps/morpheus_cybersphere/run-morpheus-cybersphere.py: pipeline_builder.add_inference_stage(

In my log file, I see the following, which might indicate that we are loading the models:

2025-05-14 15:39:36,663 [INFO] - Added stage: <inference-18; TritonInferenceStage(model_name=ourmodelname, server_url=ai-engine:8000, force_convert_inputs=True, use_shared_memory=True, needs_logits=None, inout_mapping=None, input_mapping=None, output_mapping=None)>

Hi! Notice below that the issue stems from the InferenceClientStage while it is processing a message, right after loading a batch of data.

Pipeline Throughput: 0 events [00:12, ? events/s][2025-05-13 21:28:25,123] {morpheus_pipeline.stages.ad_data_loading_left_shift_stage:119} INFO - AD data loading runtime: 0 minutes, 1 seconds
WARNING: Logging before InitGoogleLogging() is written to STDERR
W20250513 21:28:25.165555 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.167411 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.271406 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.271649 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.486009 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.486114 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:13, ? events/s]W20250513 21:28:25.890973 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.891028 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:14, ? events/s]W20250513 21:28:26.694842 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:26.694903 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:15, ? events/s]W20250513 21:28:28.298821 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:28.298847 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:18, ? events/s]W20250513 21:28:31.503276 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:31.503441 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:22, ? events/s]W20250513 21:28:35.507351 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:35.507361 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:26, ? events/s]W20250513 21:28:39.511365 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:39.511382 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:30, ? events/s]W20250513 21:28:43.515393 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:43.515478 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:34, ? events/s]E20250513 21:28:47.522738 13788 runnable.hpp:112] /main/inference-5; rank: 0; size: 1; tid: 140020180534848 Unhandled exception occurred. Rethrowing
E20250513 21:28:47.522755 13789 runnable.hpp:112] /main/inference-6; rank: 0; size: 1; tid: 140020170044992 Unhandled exception occurred. Rethrowing
E20250513 21:28:47.522769 13788 context.cpp:124] /main/inference-5; rank: 0; size: 1; tid: 140020180534848: set_exception issued; issuing kill to current runnable. Exception msg: RuntimeError: Model is not ready

Thank you for the updates. Could you please check the following as well:

  • Double-check that you are running 24.03.02; this point release included a bug fix for the TritonInferenceStage.
  • Ensure that the config parameters being passed to the TritonInferenceStage match the model config, specifically the values of Config.model_max_batch_size and Config.feature_length.
  • Check whether the expected input type matches that of the input tensor (try setting force_convert_inputs=True for the TritonInferenceStage). Also double-check the inout_mapping to ensure the input and output tensors match what the model is expecting.
  • Once you see the errors being reported in the TritonInferenceStage, are there associated errors being emitted from the Triton Inference Server?
  • Try running with Config.num_threads=1; does this avoid the issue and/or does this result in better error messages?
  • If all else fails, try running in Python mode by setting morpheus.config.CppConfig.set_should_use_cpp(False) and/or setting the environment variable MORPHEUS_NO_CPP=1. This probably won't fix the problem but could result in a better error message. A combined sketch of these settings follows this list.
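To make these concrete, here is a minimal, hedged sketch of how these settings are wired up on the Config object and the TritonInferenceStage; the model name, URL, and values are placeholders taken from this thread, so adjust them to your pipeline:

from morpheus.config import Config, CppConfig
from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage

# Optional: fall back to the Python implementation for clearer error messages
# (or export MORPHEUS_NO_CPP=1 before starting the pipeline, as suggested above).
# CppConfig.set_should_use_cpp(False)

config = Config()
config.num_threads = 1              # single thread while debugging
config.model_max_batch_size = 1024  # should not exceed Triton's max_batch_size
config.feature_length = 32          # must match the model's input dims

inference_stage = TritonInferenceStage(
    config,
    model_name="ourmodelname",      # placeholder model name
    server_url="ai-engine:8000",    # placeholder Triton URL
    force_convert_inputs=True,
)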

Confirming:

  1. Yes, the version of Morpheus is 24.03.02:

Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import morpheus
>>> morpheus.__version__
'24.03.02'

  2. Regarding these variables: Config.model_max_batch_size and Config.feature_length
  • In our env file, we set the value of max_batch_size to 1024, but the .pbtxt file shows the default value of 524288, so this is not a match.

→ I can change the pbtxt file to show 1024 and test

  • feature_length is not set in the env file, but in the code a default value is 32

Is there another place where I should check?

I do have this variable set: "./morpheus_pipeline/morpheus_pipeline_builder.py: force_convert_inputs=True,"

Where can I set inout_mapping?

The code gets stuck after adding the first stages:

pipeline_builder.add_broadcast_stage(
    stage_name="broadcast-feature-engineering",
    branch_names=[ad_branch, nd_branch],
    output_type=LeftShiftMessageMeta,
).add_stage(
    stage=ADDataLoadingLeftShiftStage(pipeline_builder.config, suffix=suffix),
    monitor_description="AD Data Loading (Left shift) Throughput",
    stage_branch_name=ad_branch,
).add_stage(
    stage=NDDataLoadingLeftShiftStage(pipeline_builder.config),
    monitor_description="ND Data Loading Throughput",
    stage_branch_name=nd_branch,
)

And this is the output I get on the terminal:

WARNING: Logging before InitGoogleLogging() is written to STDERR
attempting retry.
W20250521 16:14:24.913244 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:07, ? events/s]W20250521 16:14:24.913529 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:25.717754 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:09, ? events/s]W20250521 16:14:27.322055 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:27.322139 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:12, ? events/s]W20250521 16:14:30.526607 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:30.526727 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:16, ? events/s]W20250521 16:14:34.531513 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:34.531800 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:20, ? events/s]W20250521 16:14:38.535614 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:38.535822 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:24, ? events/s]W20250521 16:14:42.540000 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:42.540230 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:28, ? events/s]E20250521 16:14:46.546599 41300 runnable.hpp:112] /main/inference-5; rank: 0; size: 1; tid: 140479834293824 Unhandled exception occurred. Rethrowing
E20250521 16:14:46.546630 41300 context.cpp:124] /main/inference-5; rank: 0; size: 1; tid: 140479834293824: set_exception issued; issuing kill to current runnable. 
Exception msg: RuntimeError: **Model is not ready**

I confirmed that we have >500 records of input data.

I confirmed that kubectl logs shows that the models are ready:

kubectl logs ai-engine-f599fbdd8-nv6nj -n csgdev -c morpheus-ai-engine | grep successfully
I0513 12:21:09.332542 1 model_lifecycle.cc:835] successfully loaded 'ad_fi_1.0.0'
I0513 12:21:43.072444 1 model_lifecycle.cc:835] successfully loaded 'nd_classifier_2.0.0'
I0513 12:21:58.483691 1 model_lifecycle.cc:835] successfully loaded 'nd_embedder_2.0.0'
I0513 12:22:32.098439 1 model_lifecycle.cc:835] successfully loaded 'nd_rf_severity_regressor_2.0.0'

  • In our env file, we set the value of max_batch_size to 1024, but the .pbtxt file shows the default value of 524288, so this is not a match. You will change the default value to 1024 to test.

As long as max_batch_size is less than the value in the pbtxt file, this should be fine.

  • feature_length is not set in the env file, but in the code a default value is 32. Is there another place you should check?

The feature_length needs to match the dimensions of the model inputs.

If you run:

curl "<triton url>/v2/models/<model name>/config"

Under the input key there should be a list of the inputs, each specifying a dims field; this is where the value should come from.
It is then set in the feature_length attribute of the morpheus.config.Config object.
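As a small sketch (assuming the Triton HTTP endpoint from this thread, ai-engine:8000, and using the requests library), this pulls the model config and prints the values that Config.feature_length and Config.model_max_batch_size need to agree with:

import requests

TRITON_URL = "ai-engine:8000"   # placeholder; your Triton URL
MODEL_NAME = "ad_fi_1.0.0"      # placeholder; your model name

cfg = requests.get(f"http://{TRITON_URL}/v2/models/{MODEL_NAME}/config").json()

# Each entry under "input" has a "dims" list; Config.feature_length should match it.
for inp in cfg["input"]:
    print(inp["name"], inp["data_type"], inp["dims"])

# Config.model_max_batch_size should not exceed this value.
print("max_batch_size:", cfg["max_batch_size"])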

Based on the statements above, it sounds like you have a custom pipeline which is mapping environment variables onto attributes in the Python API. We are not sure if you are using the TritonInferenceStage directly or if it is being wrapped somehow in your code (it's OK if it is; it just means that you might need to do some digging).

The inout_mapping is a constructor argument for the TritonInferenceStage; it is only needed if the tensor names in the Morpheus messages are different from the field names used by the model (obtained via the curl command above).
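A minimal sketch of what such a mapping might look like; the tensor names are illustrative only, and the exact key/value direction should be checked against the TritonInferenceStage docstring for your Morpheus version:

from morpheus.config import Config
from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage

config = Config()

# Hypothetical example: the model (per its Triton config) exposes an output
# tensor named "output__0", while downstream Morpheus stages expect "probs".
inference_stage = TritonInferenceStage(
    config,
    model_name="ad_fi_1.0.0",        # placeholder model name
    server_url="ai-engine:8000",     # placeholder Triton URL
    force_convert_inputs=True,
    inout_mapping={"output__0": "probs"},
)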

@saljain
How do I find the Triton URL?

When I run hostname, I get ai-engine-f599fbdd8-nv6nj,

which matches the pod name from this command:

kubectl get pods -n csgdev
NAME                             READY   STATUS    RESTARTS   AGE
ai-engine-f599fbdd8-nv6nj        2/2     Running   0          38d

I want to get at least the DenseNet model working. The problem is that in this ai-engine pod, I don't have the 'image_client' tool.

I tried downloading it from here, but the binaries don't work.


If I had the Triton URL, I could use this from a Docker container that has the image_client tool, but I need to set the -u flag: image_client -u <URL for inference service> -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg


Again, on the pod I can get the following with this command: tritonserver --model-repository=/common/triton-model-repo/

I0611 21:22:00.778785 15869 server.cc:677] 
+--------------------------------+---------+-----------------------------------------------------------------------------------------------------+
| Model                          | Version | Status                                                                                              |
+--------------------------------+---------+-----------------------------------------------------------------------------------------------------+
| LM                             | 1       | READY                                                                                               |
| ad_fi_1.0.0                    | 1       | READY                                                                                               |
| ad_full_1.0.0                  | 1       | READY                                                                                               |
| ad_raad_1.0.0                  | 1       | READY                                                                                               |
| nd_classifier_2.0.0            | 1       | READY                                                                                               |
| nd_embedder_2.0.0              | 1       | READY                                                                                               |
| nd_rf_severity_regressor_2.0.0 | 1       | READY                                                                                               |
| rware_binclass                 | 1       | UNAVAILABLE: Invalid argument: unexpected inference output 'output__0', allowed outputs are: output |
| rware_multiclass               | 1       | UNAVAILABLE: Invalid argument: unexpected inference output 'output__0', allowed outputs are: output |
+--------------------------------+---------+-----------------------------------------------------------------------------------------------------+

I0611 21:22:00.856269 15869 metrics.cc:877] Collecting metrics for GPU 0: Tesla T4
I0611 21:22:00.856304 15869 metrics.cc:877] Collecting metrics for GPU 1: Tesla T4
I0611 21:22:00.856320 15869 metrics.cc:877] Collecting metrics for GPU 2: Tesla T4
I0611 21:22:00.856336 15869 metrics.cc:877] Collecting metrics for GPU 3: Tesla T4
I0611 21:22:00.874144 15869 metrics.cc:770] Collecting CPU metrics
I0611 21:22:00.874312 15869 tritonserver.cc:2538] 

But from the Bastion host, this is all I get:

kubectl logs ai-engine-f599fbdd8-nv6nj -n csgdev -c morpheus-ai-engine | grep successfully

I0513 12:21:09.332542 1 model_lifecycle.cc:835] successfully loaded 'ad_fi_1.0.0'
I0513 12:21:43.072444 1 model_lifecycle.cc:835] successfully loaded 'nd_classifier_2.0.0'
I0513 12:21:58.483691 1 model_lifecycle.cc:835] successfully loaded 'nd_embedder_2.0.0'
I0513 12:22:32.098439 1 model_lifecycle.cc:835] successfully loaded 'nd_rf_severity_regressor_2.0.0'

Only 4 models are ready.
Is it because the Triton server only has 4 GPUs?

Also, within the ai-engine pod, this command fails:
root@ai-engine-f599fbdd8-nv6nj:/opt/tritonserver# curl localhost:8000/v2/models/LM/config
{"error":"Request for unknown model: 'LM' is not found"}

On the other hand, for one of the successfully loaded models, this is what I get:

{"error":"Not Found"}root@ai-engine-f599fbdd8-nv6nj:/common/triton-model-repo# curl localhost:8000/v2/models/ad_fi_1.0.0/config
{"name":"ad_fi_1.0.0","platform":"onnxruntime_onnx","backend":"onnxruntime","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":524288,"input":[{"name":"input__0","data_type":"TYPE_FP32","format":"FORMAT_NONE","dims":[23],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false}],"output":[{"name":"output__0","data_type":"TYPE_FP32","dims":[17],"label_filename":"","is_shape_tensor":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"instance_group":[{"name":"ad_fi_1.0.0","kind":"KIND_GPU","count":1,"gpus":[0,1,2,3],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.onnx","cc_model_filenames":{},"metric_tags":{},"parameters":{},"model_warmup":[]}root@ai-engine-f599fbdd8-nv6nj:/common/triton-model-repo# 

Here is the content of this folder (/common/triton-model-repo# ls):

LM  ad_fi_1.0.0  ad_full_1.0.0  ad_fulls_1.0.0  ad_raad_1.0.0  nd_classifier_2.0.0  nd_embedder_2.0.0  nd_rf_severity_regressor_2.0.0  rware_binclass  rware_multiclass  s3workinglol.txt

Any idea what else I can do? Is 'LM' the wrong name?

The issue seems to be that the Triton Inference Server isn't loading all of your models. The key indicator is:

UNAVAILABLE: Invalid argument: unexpected inference output 'output__0', allowed outputs are: output

This is preventing the rware_binclass and rware_multiclass models from loading. This is not an error in Morpheus, as Morpheus correctly reports that the model is not ready.

Regarding the error coming from Triton ("unexpected inference output 'output__0', allowed outputs are: output"), it seems like the model isn't matching up with the contents of the config.pbtxt file. We have reached out to the Triton team to get a better understanding.
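If the failing models are ONNX like ad_fi_1.0.0 above (if they use a different backend, such as FIL/treelite, the equivalent check differs), a quick way to list the output names the model graph actually exposes, for comparison against the output section of config.pbtxt, is the onnx package. This is only a sketch and the path below is a placeholder:

import onnx

# Placeholder path to one of the models that fails to load.
model = onnx.load("/common/triton-model-repo/rware_binclass/1/model.onnx")

# These names should match the "input"/"output" entries in config.pbtxt.
print("inputs: ", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])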

@saaguinaga Please let us know which version of the Triton Inference Server you are using.

Saljain:

Thank you.
Two points:

  1. See below an answer to your question. Let me know if that is okay.
  2. I want to have a baseline: an out-of-the-box (NVIDIA) model that I can test and get working, like the 3 models Deloitte is trying to deploy to Morpheus. Assume I know nothing; where do I start? Here: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/morpheus/containers/morpheus, then go to the git repo, clone it, and deploy one of these proven models first? Your thoughts?

@saljain The version of the Triton Inference Server is 2.44.0.
Let me know what else to try.