NVIDIA Morpheus runtime error: Model is not ready

An exception occurs in the pipeline (morpheus/pipeline/pipeline.py):

[2025-05-12 15:58:02,163] {morpheus.pipeline.pipeline:407} ERROR - Exception occurred in pipeline. Rethrowing
Traceback (most recent call last):
File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 405, in post_start
await executor.join_async()
File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
return (yield from awaitable.__await__())
RuntimeError: Model is not ready

How do I ensure that the model on ai-engine is ready?
I am using Morpheus 24.03.02.

Does the Morpheus API require models to be stored in /common/models/?

Currently my models sit (on the Triton server) at /common/triton-model-repo/.

I have verified that the models are loaded and ready. Running tritonserver --model-repository=/common/triton-model-repo to list model status gives:

+--------------------------------+---------+--------+
| Model                          | Version | Status |
+--------------------------------+---------+--------+
| mymodel1                       | 1       | READY  |
| mymodel2                       | 1       | READY  |
| ... etc                        | 1       | READY  |
+--------------------------------+---------+--------+

Hi,

Thank you for reaching out and inquiring about the issue you’re observing.

I looked into the issue and discussed it with one of our developers. Could you please help us understand a few things:

  • Is Triton running in a Docker container and Morpheus in another container?
  • Have you checked if Morpheus can communicate with Triton? If not, you can verify their communication by running the following commands (replacing localhost:8000 with the URL of the Triton server):
    • curl -v "localhost:8000/v2/health/live"
    • curl "localhost:8000/v2/models/<model name>/config" (replace <model name> with the name of your model); a Python equivalent of these checks is sketched after this list
  • Is the configured URL for Triton correct, and does the model name match?
  • Does the pipeline you’re using include the Triton stage? The traceback doesn’t seem to be Triton-specific.
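A minimal Python sketch of the communication check mentioned above, assuming the tritonclient package is available; ai-engine:8000 and mymodel1 are placeholders, so substitute your own Triton URL and model name:

import tritonclient.http as triton_http

TRITON_URL = "ai-engine:8000"   # placeholder; your Triton server URL
MODEL_NAME = "mymodel1"         # placeholder; your model name

# Same checks as the curl commands above, via the Triton HTTP client.
client = triton_http.InferenceServerClient(url=TRITON_URL, verbose=False)
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready: ", client.is_model_ready(MODEL_NAME))

# Same payload as the /config curl command above.
print(client.get_model_config(MODEL_NAME))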

Triton is running on an EKS cluster. Triton and Morpheus are each in their own container.
Yes. Consistent with the output I posted above showing that the model is ready, I get "< HTTP/1.1 200 OK" and output like:
eshold":0,"eager_batching":false},"instance_group":[{"name":"nd_rf_severity_regressor_2.0.0","kind":"KIND_GPU","count":1,"gpus":[0,1,2,3],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"","cc_model_filenames":{},"metric_tags":{},"parameters":{"model_type":{"string_value":"treelite_checkpoint"},"output_class":{"string_value":"false"}},"model_warmup":[]}

after running curl ai-engine:8000/v2/models/mymodelname/config.

I will check if the Triton Stage is included.

Here are the lines in our entire code base with references to the Triton stage.

./morpheus_pipeline/morpheus_pipeline_builder.py:from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage
./morpheus_pipeline/morpheus_pipeline_builder.py: inference_stage = TritonInferenceStage(

Followed by:

./apps/morpheus_cybersphere/run-morpheus-cybersphere.py: ).add_inference_stage(
./apps/morpheus_cybersphere/run-morpheus-cybersphere.py: pipeline_builder.add_inference_stage(

In my log file, I see the following, which might indicate that we are loading the models:

2025-05-14 15:39:36,663 [INFO] - Added stage: <inference-18; TritonInferenceStage(model_name=ourmodelname, server_url=ai-engine:8000, force_convert_inputs=True, use_shared_memory=True, needs_logits=None, inout_mapping=None, input_mapping=None, output_mapping=None)>

Hi! Notice below that the issue stems from the InferenceClientStage while it is processing a message, right after loading a batch of data.

Pipeline Throughput: 0 events [00:12, ? events/s][2025-05-13 21:28:25,123] {morpheus_pipeline.stages.ad_data_loading_left_shift_stage:119} INFO - AD data loading runtime: 0 minutes, 1 seconds
WARNING: Logging before InitGoogleLogging() is written to STDERR
W20250513 21:28:25.165555 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.167411 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.271406 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.271649 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.486009 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.486114 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:13, ? events/s]W20250513 21:28:25.890973 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:25.891028 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:14, ? events/s]W20250513 21:28:26.694842 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:26.694903 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:15, ? events/s]W20250513 21:28:28.298821 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:28.298847 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:18, ? events/s]W20250513 21:28:31.503276 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:31.503441 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:22, ? events/s]W20250513 21:28:35.507351 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:35.507361 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:26, ? events/s]W20250513 21:28:39.511365 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:39.511382 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:30, ? events/s]W20250513 21:28:43.515393 13788 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250513 21:28:43.515478 13789 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [00:34, ? events/s]E20250513 21:28:47.522738 13788 runnable.hpp:112] /main/inference-5; rank: 0; size: 1; tid: 140020180534848 Unhandled exception occurred. Rethrowing
E20250513 21:28:47.522755 13789 runnable.hpp:112] /main/inference-6; rank: 0; size: 1; tid: 140020170044992 Unhandled exception occurred. Rethrowing
E20250513 21:28:47.522769 13788 context.cpp:124] /main/inference-5; rank: 0; size: 1; tid: 140020180534848: set_exception issued; issuing kill to current runnable. Exception msg: RuntimeError: Model is not ready

Thank you for the updates. Could you please check the following as well:

  • Double-check that you are running 24.03.02; this point release included a bug fix for the TritonInferenceStage.
  • Ensure that the config parameters being passed to the TritonInferenceStage match the model config, specifically the values of Config.model_max_batch_size and Config.feature_length.
  • Check whether the expected input type matches that of the input tensor (try setting force_convert_inputs=True for the TritonInferenceStage). Also double-check the inout_mapping to ensure the input and output tensors match what the model is expecting.
  • Once you see the errors being reported in the TritonInferenceStage, are there associated errors being emitted from the Triton Inference Server?
  • Try running with Config.num_threads=1; does this avoid the issue and/or does this result in better error messages?
  • If all else fails, try running in Python mode by setting morpheus.config.CppConfig.set_should_use_cpp(False) and/or setting the environment variable MORPHEUS_NO_CPP=1. This probably won't fix the problem but could result in a better error message. A combined sketch of these settings follows this list.
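To make these concrete, here is a minimal, hedged sketch of how these settings are wired up on the Config object and the TritonInferenceStage; the model name, URL, and values are placeholders taken from this thread, so adjust them to your pipeline:

from morpheus.config import Config, CppConfig
from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage

# Optional: fall back to the Python implementation for clearer error messages
# (or export MORPHEUS_NO_CPP=1 before starting the pipeline, as suggested above).
# CppConfig.set_should_use_cpp(False)

config = Config()
config.num_threads = 1              # single thread while debugging
config.model_max_batch_size = 1024  # should not exceed Triton's max_batch_size
config.feature_length = 32          # must match the model's input dims

inference_stage = TritonInferenceStage(
    config,
    model_name="ourmodelname",      # placeholder model name
    server_url="ai-engine:8000",    # placeholder Triton URL
    force_convert_inputs=True,
)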

Confirming:

  1. Yes, the version of Morpheus is 24.03.02:

Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import morpheus
>>> morpheus.__version__
'24.03.02'

  2. Regarding these variables: Config.model_max_batch_size and Config.feature_length
  • In our env file, we set the value of max_batch_size to 1024, but the .pbtxt file shows the default value of 524288, so this is not a match.

→ I can change the pbtxt file to show 1024 and test

  • feature_length is not set in the env file, but in the code a default value is 32

Is there another place where I should check?

I do have this variable set: "./morpheus_pipeline/morpheus_pipeline_builder.py: force_convert_inputs=True,"

Where can I set inout_mapping?

The code gets stuck after adding the first stages:

pipeline_builder.add_broadcast_stage(
    stage_name="broadcast-feature-engineering",
    branch_names=[ad_branch, nd_branch],
    output_type=LeftShiftMessageMeta,
).add_stage(
    stage=ADDataLoadingLeftShiftStage(pipeline_builder.config, suffix=suffix),
    monitor_description="AD Data Loading (Left shift) Throughput",
    stage_branch_name=ad_branch,
).add_stage(
    stage=NDDataLoadingLeftShiftStage(pipeline_builder.config),
    monitor_description="ND Data Loading Throughput",
    stage_branch_name=nd_branch,
)

And this is the output I get on the terminal:

WARNING: Logging before InitGoogleLogging() is written to STDERR
attempting retry.
W20250521 16:14:24.913244 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:07, ? events/s]W20250521 16:14:24.913529 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:25.717754 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:09, ? events/s]W20250521 16:14:27.322055 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:27.322139 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:12, ? events/s]W20250521 16:14:30.526607 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:30.526727 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:16, ? events/s]W20250521 16:14:34.531513 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:34.531800 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:20, ? events/s]W20250521 16:14:38.535614 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:38.535822 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:24, ? events/s]W20250521 16:14:42.540000 41301 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
W20250521 16:14:42.540230 41300 inference_client_stage.cpp:255] Exception while processing message for InferenceClientStage, attempting retry.
Pipeline Throughput: 0 events [04:28, ? events/s]E20250521 16:14:46.546599 41300 runnable.hpp:112] /main/inference-5; rank: 0; size: 1; tid: 140479834293824 Unhandled exception occurred. Rethrowing
E20250521 16:14:46.546630 41300 context.cpp:124] /main/inference-5; rank: 0; size: 1; tid: 140479834293824: set_exception issued; issuing kill to current runnable. 
Exception msg: RuntimeError: **Model is not ready**

I confirmed that we have >500 records of input data.

I confirmed that kubectl logs shows that the models are ready:

kubectl logs ai-engine-f599fbdd8-nv6nj -n csgdev -c morpheus-ai-engine | grep successfully
I0513 12:21:09.332542 1 model_lifecycle.cc:835] successfully loaded 'ad_fi_1.0.0'
I0513 12:21:43.072444 1 model_lifecycle.cc:835] successfully loaded 'nd_classifier_2.0.0'
I0513 12:21:58.483691 1 model_lifecycle.cc:835] successfully loaded 'nd_embedder_2.0.0'
I0513 12:22:32.098439 1 model_lifecycle.cc:835] successfully loaded 'nd_rf_severity_regressor_2.0.0'

  • In our env file, we set the value of max_batch_size to 1024, but the .pbtxt file shows the default value of 524288, so this is not a match. You will change the default value to 1024 to test.

As long as max_batch_size is less than the value in the pbtxt file, this should be fine.

  • feature_length is not set in the env file, but in the code a default value is 32. Is there another place you should check?

The feature_length needs to match the dimensions of the model inputs.

If you run:

curl "<triton url>/v2/models/<model name>/config"

Under the input key there should be a list of the inputs, each specifying a dims field; this is where the value should come from.
It is then set in the feature_length attribute of the morpheus.config.Config object.
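As a small sketch (assuming the Triton HTTP endpoint from this thread, ai-engine:8000, and using the requests library), this pulls the model config and prints the values that Config.feature_length and Config.model_max_batch_size need to agree with:

import requests

TRITON_URL = "ai-engine:8000"   # placeholder; your Triton URL
MODEL_NAME = "ad_fi_1.0.0"      # placeholder; your model name

cfg = requests.get(f"http://{TRITON_URL}/v2/models/{MODEL_NAME}/config").json()

# Each entry under "input" has a "dims" list; Config.feature_length should match it.
for inp in cfg["input"]:
    print(inp["name"], inp["data_type"], inp["dims"])

# Config.model_max_batch_size should not exceed this value.
print("max_batch_size:", cfg["max_batch_size"])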

Based on the statements above, it sounds like you have a custom pipeline which is mapping environment variables onto attributes in the Python API. We are not sure if you are using the TritonInferenceStage directly or if it is being wrapped somehow in your code (it's OK if it is; it just means that you might need to do some digging).

The inout_mapping is a constructor argument for the TritonInferenceStage; it is only needed if the tensor names in the Morpheus messages are different from the field names used by the model (obtained via the curl command above).
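A minimal sketch of what such a mapping might look like; the tensor names are illustrative only, and the exact key/value direction should be checked against the TritonInferenceStage docstring for your Morpheus version:

from morpheus.config import Config
from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage

config = Config()

# Hypothetical example: the model (per its Triton config) exposes an output
# tensor named "output__0", while downstream Morpheus stages expect "probs".
inference_stage = TritonInferenceStage(
    config,
    model_name="ad_fi_1.0.0",        # placeholder model name
    server_url="ai-engine:8000",     # placeholder Triton URL
    force_convert_inputs=True,
    inout_mapping={"output__0": "probs"},
)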

@saljain
How do I find the Triton URL?

When I run hostname, I get ai-engine-f599fbdd8-nv6nj,

which matches the pod name from this command:

kubectl get pods -n csgdev
NAME                             READY   STATUS    RESTARTS   AGE
ai-engine-f599fbdd8-nv6nj        2/2     Running   0          38d

I want to get at least the DenseNet model working. The problem is that in this ai-engine pod, I don't have the 'image_client' tool.

I tried downloading it from here, but the binaries don't work.


If I had the Triton URL, I could use this from a Docker container that has the image_client tool, but I need to set the -u flag: image_client -u <URL for inference service> -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg


Again, on the pod I can get the following with this command: tritonserver --model-repository=/common/triton-model-repo/

I0611 21:22:00.778785 15869 server.cc:677] 
+--------------------------------+---------+-----------------------------------------------------------------------------------------------------+
| Model                          | Version | Status                                                                                              |
+--------------------------------+---------+-----------------------------------------------------------------------------------------------------+
| LM                             | 1       | READY                                                                                               |
| ad_fi_1.0.0                    | 1       | READY                                                                                               |
| ad_full_1.0.0                  | 1       | READY                                                                                               |
| ad_raad_1.0.0                  | 1       | READY                                                                                               |
| nd_classifier_2.0.0            | 1       | READY                                                                                               |
| nd_embedder_2.0.0              | 1       | READY                                                                                               |
| nd_rf_severity_regressor_2.0.0 | 1       | READY                                                                                               |
| rware_binclass                 | 1       | UNAVAILABLE: Invalid argument: unexpected inference output 'output__0', allowed outputs are: output |
| rware_multiclass               | 1       | UNAVAILABLE: Invalid argument: unexpected inference output 'output__0', allowed outputs are: output |
+--------------------------------+---------+-----------------------------------------------------------------------------------------------------+

I0611 21:22:00.856269 15869 metrics.cc:877] Collecting metrics for GPU 0: Tesla T4
I0611 21:22:00.856304 15869 metrics.cc:877] Collecting metrics for GPU 1: Tesla T4
I0611 21:22:00.856320 15869 metrics.cc:877] Collecting metrics for GPU 2: Tesla T4
I0611 21:22:00.856336 15869 metrics.cc:877] Collecting metrics for GPU 3: Tesla T4
I0611 21:22:00.874144 15869 metrics.cc:770] Collecting CPU metrics
I0611 21:22:00.874312 15869 tritonserver.cc:2538] 

But from the Bastion host, this is all I get:

kubectl logs ai-engine-f599fbdd8-nv6nj -n csgdev -c morpheus-ai-engine | grep successfully

I0513 12:21:09.332542 1 model_lifecycle.cc:835] successfully loaded 'ad_fi_1.0.0'
I0513 12:21:43.072444 1 model_lifecycle.cc:835] successfully loaded 'nd_classifier_2.0.0'
I0513 12:21:58.483691 1 model_lifecycle.cc:835] successfully loaded 'nd_embedder_2.0.0'
I0513 12:22:32.098439 1 model_lifecycle.cc:835] successfully loaded 'nd_rf_severity_regressor_2.0.0'

Only 4 models are ready.
Is it because the Triton server only has 4 GPUs?

Also, within the ai-engine pod, this command fails:
root@ai-engine-f599fbdd8-nv6nj:/opt/tritonserver# curl localhost:8000/v2/models/LM/config
{"error":"Request for unknown model: 'LM' is not found"}

On the other hand, for one of the successfully loaded models, this is what I get:

{"error":"Not Found"}root@ai-engine-f599fbdd8-nv6nj:/common/triton-model-repo# curl localhost:8000/v2/models/ad_fi_1.0.0/config
{"name":"ad_fi_1.0.0","platform":"onnxruntime_onnx","backend":"onnxruntime","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":524288,"input":[{"name":"input__0","data_type":"TYPE_FP32","format":"FORMAT_NONE","dims":[23],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false}],"output":[{"name":"output__0","data_type":"TYPE_FP32","dims":[17],"label_filename":"","is_shape_tensor":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"instance_group":[{"name":"ad_fi_1.0.0","kind":"KIND_GPU","count":1,"gpus":[0,1,2,3],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.onnx","cc_model_filenames":{},"metric_tags":{},"parameters":{},"model_warmup":[]}root@ai-engine-f599fbdd8-nv6nj:/common/triton-model-repo# 

Here is the content of this folder (/common/triton-model-repo# ls):

LM  ad_fi_1.0.0  ad_full_1.0.0  ad_fulls_1.0.0  ad_raad_1.0.0  nd_classifier_2.0.0  nd_embedder_2.0.0  nd_rf_severity_regressor_2.0.0  rware_binclass  rware_multiclass  s3workinglol.txt

Any idea what else I can do? Is 'LM' the wrong name?

The issue seems to be that the Triton Inference Server isn't loading all of your models. The key indicator is:

UNAVAILABLE: Invalid argument: unexpected inference output 'output__0', allowed outputs are: output

This is preventing the rware_binclass and rware_multiclass models from loading. This is not an error in Morpheus, as Morpheus correctly reports that the model is not ready.

Regarding the error coming from Triton ("unexpected inference output 'output__0', allowed outputs are: output"), it seems like the model isn't matching up with the contents of the config.pbtxt file. We have reached out to the Triton team to get a better understanding.
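If the failing models are ONNX like ad_fi_1.0.0 above (if they use a different backend, such as FIL/treelite, the equivalent check differs), a quick way to list the output names the model graph actually exposes, for comparison against the output section of config.pbtxt, is the onnx package. This is only a sketch and the path below is a placeholder:

import onnx

# Placeholder path to one of the models that fails to load.
model = onnx.load("/common/triton-model-repo/rware_binclass/1/model.onnx")

# These names should match the "input"/"output" entries in config.pbtxt.
print("inputs: ", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])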

@saaguinaga Please let us know which version of the Triton Inference Server you are using.

Saljain:

Thank you.
Two points:

  1. See below an answer to your question. Let me know if that is okay.
  2. I want to have a baseline: an out-of-the-box (NVIDIA) model that I can test and get working, like the 3 models Deloitte is trying to deploy to Morpheus. Assume I know nothing; where do I start? Here: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/morpheus/containers/morpheus, then go to the git repo, clone it, and deploy one of these proven models first? Your thoughts?

@saljain The version of the Triton Inference Server is 2.44.0.
Let me know what else to try.