With Morpheus and NVIDIA Triton, I am running a model from the sdk-cli-helper pod (in Kubernetes), and by the time data is passed to the Triton server … I get an error that is hard to decipher.
See below
Undecipherable error messages:
File Source Rate for Athena Source stage: 0 messages [00:00, ? messages/s]
2025-06-13 17:06:10,763 [INFO] - ====Building Segment Complete!====
py inferencerate: 0 messages [00:00, ? messages/s]
2025-06-13 17:06:10,780 [INFO] - LM, ? inferences/s]
WARNING: Logging before InitGoogleLogging() is written to STDERR
E20250613 17:06:10.781409 11627 operators.cpp:78] Python occurred during full node subscription. Error: SystemExit: None
RAAD rate: 0 messages [00:00, ? messages/s]
Writer rate: 0 messages [00:00, ? messages/s]
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
E20250613 17:06:10.781551 11627 context.cpp:124] /linear_segment_0/inference-6; rank: 0; size: 1; tid: 139822190556736: set_exception issued; issuing kill to current runnable. Exception msg: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
2025-06-13 17:06:10,780 [INFO] - left shift data extraction
2025-06-13 17:06:10,812 [INFO] - Current time :'2025-06-13 16:06:10.811'
2025-06-13 17:06:10,812 [INFO] - 10 minutes lookback window :'2025-06-13 15:56:10.811' to '2025-06-13 16:06:10.811'
2025-06-13 17:06:10,812 [INFO] - 2 days lookback window :'2025-06-11 16:06:10.811' to '2025-06-13 16:06:10.811'
2025-06-13 17:06:10,812 [INFO] - Failed time window :'2025-06-06 15:56:10.811' to '2025-06-13 16:06:10.811'
2025-06-13 17:06:10,813 [INFO] - Fetching feature engineered data for ingestion
**********
fetch lookup data)
2025-06-13 17:06:10,813 [INFO] - Querying database to fetch the relevant data
File Source Rate for Athena Source stage: 0 messages
2025-06-13 17:06:15,134 [INFO] - Created CTAS table "cs-analytics-pipeline"."temp_table_212b023fc19b4648afa1bba9bb231c97"
File Source Rate for Athena Source stage: 0 messages [00:05, ? messages/s]
2025-06-13 17:06:16,350 [INFO] - engineered features: (12, 44)
2025-06-13 17:06:16,381 [INFO] - Total events in the given time range to analyze : 12
2025-06-13 17:06:16,386 [INFO] - Data extracted with shape: (12, 39)
msg meta with features data frames [00:00, ? messages/s]s]
W20250613 17:06:16.396265 11625 meta.cpp:223] Dataframe is not a cudf dataframe, converting to cudf dataframe
E20250613 17:06:16.627050 11625 segment.cpp:240] /linear_segment_0/data-source-stage-0; rank: 0; size: 1; tid: 139822484141632: Error occurred in source. Error msg: AssertionError: message meta test failed
At:
/common/models/lateral-movement-detection-global/morpheus_pipeline/morpheus_data_source_stage.py(137): _generate_frames
File Source Rate for Athena Source stage[Complete]: 0 messages
E20250613 17:06:16.719980 11598 runner.cpp:189] Runner::await_join - an exception was caught while awaiting on one or more contexts/instances - rethrowing
E20250613 17:06:16.763722 11598 segment_instance.cpp:273] segment::SegmentInstance - an exception was caught while awaiting on one or more nodes - rethrowing
E20250613 17:06:16.764112 11598 service.cpp:224] Service[segment::SegmentInstance]: caught exception in service_await_join: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
E20250613 17:06:16.764423 11598 pipeline_instance.cpp:230] pipeline::PipelineInstance - an exception was caught while awaiting on segments - rethrowing
E20250613 17:06:16.764710 11598 service.cpp:224] Service[pipeline::PipelineInstance]: caught exception in service_await_join: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
E20250613 17:06:16.767711 11598 service.cpp:224] Service[pipeline::Manager]: caught exception in service_await_join: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
E20250613 17:06:16.768051 11598 service.cpp:224] Service[ExecutorDefinition]: caught exception in service_await_join: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
File Source Rate for Athena Source stage[Complete]: 0 messages [00:00, ? messages/s]
Deserialize rate[Complete]: 0 messages [00:00, ? messages/s]
Pre-process rate[Complete]: 0 messages [00:00, ? messages/s]
Inference rate[Complete]: 0 inferences [00:00, ? inferences/s]
AddScores LM[Complete]: 0 messages [00:00, ? messages/s]
Add class rate[Complete]: 0 messages [00:00, ? messages/s]
Add class rate[Complete]: 0 messages [00:00, ? messages/s]
RAAD rate[Complete]: 0 messages [00:00, ? messages/s]
Writer rate[Complete]: 0 messages [00:00, ? messages/s]
2025-06-13 17:06:16,783 [INFO] - ====Pipeline Complete====
2025-06-13 17:06:16,786 [INFO] - Remove client Client-b4053af2-4878-11f0-ad4e-c6bbb945e3f9
2025-06-13 17:06:16,789 [INFO] - Received 'close-stream' from tcp://127.0.0.1:47524; closing.
2025-06-13 17:06:16,790 [INFO] - Remove client Client-b4053af2-4878-11f0-ad4e-c6bbb945e3f9
2025-06-13 17:06:16,791 [INFO] - Close client connection: Client-b4053af2-4878-11f0-ad4e-c6bbb945e3f9
2025-06-13 17:06:16,797 [INFO] - Closing Nanny at 'tcp://127.0.0.1:39401'. Reason: nanny-close
2025-06-13 17:06:16,797 [INFO] - Nanny asking worker to close. Reason: nanny-close
2025-06-13 17:06:16,803 [INFO] - Stopping worker at tcp://127.0.0.1:43765. Reason: nanny-close
2025-06-13 17:06:16,807 [INFO] - Received 'close-stream' from tcp://127.0.0.1:47510; closing.
2025-06-13 17:06:16,808 [INFO] - Connection to tcp://127.0.0.1:45811 has been closed.
2025-06-13 17:06:16,808 [INFO] - Remove worker <WorkerState 'tcp://127.0.0.1:43765', name: 0, status: closing, memory: 0, processing: 0> (stimulus_id='handle-worker-cleanup-1749834376.8087003')
2025-06-13 17:06:16,809 [INFO] - Lost all workers
2025-06-13 17:06:17,763 [INFO] - Scheduler closing due to unknown reason...
2025-06-13 17:06:17,765 [INFO] - Scheduler closing all comms
Please provide the following information when requesting support.
Hardware - GPU: Tesla T4 (see nvidia-smi output below)
Hardware - CPU
Operating System - Amazon Linux 2; uname -a: Linux sdk-cli-sdk-helper-csgdev-ltm 5.10.234-225.910.amzn2.x86_64 #1 SMP Fri Feb 14 16:52:40 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Riva Version
TLT Version (if relevant)
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here)
On my EC2 bastion:
$ kubectl get pods -n csgdev
NAME                             READY   STATUS    RESTARTS   AGE
ai-engine-f599fbdd8-nv6nj        2/2     Running   0          40d
broker-64768b4984-8hqcp          1/1     Running   0          66d
sdk-cli-sdk-helper-csgdev-ltm    2/2     Running   0          51d
sdk-cli-sdk-helper-csgdev-rsmw   2/2     Running   0          51d
sdk-cli-sdk-helper-csgdev-zdt    2/2     Running   0          51d
I use this pod to run code:
sdk-cli-sdk-helper-csgdev-ltm
NVIDIA GPU info:
nvidia-smi
Fri Jun 13 17:14:17 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
| N/A 34C P8 11W / 70W | 3MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
Is there a way to communicate with the inference server (ai-engine) prior to running the pipeline?
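Something like the following is the kind of pre-flight check I have in mind. This is just a rough sketch, not something that exists in my pipeline today; the host name ai-engine, the HTTP port 8000, and the model name "lateral-movement-detection" are assumptions from my own deployment and would need to be adjusted:

# Rough sketch of a pre-flight check against the Triton server before launching the pipeline.
import tritonclient.http as httpclient

TRITON_URL = "ai-engine:8000"               # assumption: ai-engine service name, Triton HTTP port
MODEL_NAME = "lateral-movement-detection"   # assumption: model name the pipeline points at

client = httpclient.InferenceServerClient(url=TRITON_URL)

# Is the server itself up and ready?
print("server live :", client.is_server_live())
print("server ready:", client.is_server_ready())

# Is the specific model loaded and ready to serve requests?
print("model ready :", client.is_model_ready(MODEL_NAME))

# Inspect the model's declared inputs/outputs to compare against what the
# Morpheus TritonInferenceStage is configured to send.
print(client.get_model_metadata(MODEL_NAME))

# List everything currently in the model repository.
print(client.get_model_repository_index())

I could also just curl the server's /v2/health/ready endpoint (for example via kubectl port-forward from the bastion), but I am not sure what the recommended approach is from inside the sdk-cli-helper pod.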