With Morpheus and NVIDIA Triton, I am running a model from the sdk-cli-helper pod (in Kubernetes), and by the time data is passed to the Triton server … I get an error that is hard to decipher.
See below
Undecipherable error messages:
File Source Rate for Athena Source stage: 0 messages [00:00, ? messages/s]
2025-06-13 17:06:10,763 [INFO] - ====Building Segment Complete!====
py inferencerate: 0 messages [00:00, ? messages/s]
2025-06-13 17:06:10,780 [INFO] - LM, ? inferences/s]
WARNING: Logging before InitGoogleLogging() is written to STDERR
E20250613 17:06:10.781409 11627 operators.cpp:78] Python occurred during full node subscription. Error: SystemExit: None
RAAD rate: 0 messages [00:00, ? messages/s]
Writer rate: 0 messages [00:00, ? messages/s]
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
E20250613 17:06:10.781551 11627 context.cpp:124] /linear_segment_0/inference-6; rank: 0; size: 1; tid: 139822190556736: set_exception issued; issuing kill to current runnable. Exception msg: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
2025-06-13 17:06:10,780 [INFO] - left shift data extraction
2025-06-13 17:06:10,812 [INFO] - Current time :'2025-06-13 16:06:10.811'
2025-06-13 17:06:10,812 [INFO] - 10 minutes lookback window :'2025-06-13 15:56:10.811' to '2025-06-13 16:06:10.811'
2025-06-13 17:06:10,812 [INFO] - 2 days lookback window :'2025-06-11 16:06:10.811' to '2025-06-13 16:06:10.811'
2025-06-13 17:06:10,812 [INFO] - Failed time window :'2025-06-06 15:56:10.811' to '2025-06-13 16:06:10.811'
2025-06-13 17:06:10,813 [INFO] - Fetching feature engineered data for ingestion
**********
fetch lookup data)
2025-06-13 17:06:10,813 [INFO] - Querying database to fetch the relevant data
File Source Rate for Athena Source stage: 0 messages
2025-06-13 17:06:15,134 [INFO] - Created CTAS table "cs-analytics-pipeline"."temp_table_212b023fc19b4648afa1bba9bb231c97"
File Source Rate for Athena Source stage: 0 messages [00:05, ? messages/s]
2025-06-13 17:06:16,350 [INFO] - engineered features: (12, 44)
2025-06-13 17:06:16,381 [INFO] - Total events in the given time range to analyze : 12
2025-06-13 17:06:16,386 [INFO] - Data extracted with shape: (12, 39)
msg meta with features data frames [00:00, ? messages/s]s]
W20250613 17:06:16.396265 11625 meta.cpp:223] Dataframe is not a cudf dataframe, converting to cudf dataframe
E20250613 17:06:16.627050 11625 segment.cpp:240] /linear_segment_0/data-source-stage-0; rank: 0; size: 1; tid: 139822484141632: Error occurred in source. Error msg: AssertionError: message meta test failed
At:
/common/models/lateral-movement-detection-global/morpheus_pipeline/morpheus_data_source_stage.py(137): _generate_frames
File Source Rate for Athena Source stage[Complete]: 0 messages
E20250613 17:06:16.719980 11598 runner.cpp:189] Runner::await_join - an exception was caught while awaiting on one or more contexts/instances - rethrowing
E20250613 17:06:16.763722 11598 segment_instance.cpp:273] segment::SegmentInstance - an exception was caught while awaiting on one or more nodes - rethrowing
E20250613 17:06:16.764112 11598 service.cpp:224] Service[segment::SegmentInstance]: caught exception in service_await_join: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
E20250613 17:06:16.764423 11598 pipeline_instance.cpp:230] pipeline::PipelineInstance - an exception was caught while awaiting on segments - rethrowing
E20250613 17:06:16.764710 11598 service.cpp:224] Service[pipeline::PipelineInstance]: caught exception in service_await_join: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
E20250613 17:06:16.767711 11598 service.cpp:224] Service[pipeline::Manager]: caught exception in service_await_join: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
E20250613 17:06:16.768051 11598 service.cpp:224] Service[ExecutorDefinition]: caught exception in service_await_join: SystemExit: None
At:
/opt/conda/envs/morpheus/lib/python3.10/_sitebuiltins.py(26): __call__
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/triton_inference_stage.py(483): init
/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/inference/inference_stage.py(234): py_inference_fn
File Source Rate for Athena Source stage[Complete]: 0 messages [00:00, ? messages/s]
Deserialize rate[Complete]: 0 messages [00:00, ? messages/s]
Pre-process rate[Complete]: 0 messages [00:00, ? messages/s]
Inference rate[Complete]: 0 inferences [00:00, ? inferences/s]
AddScores LM[Complete]: 0 messages [00:00, ? messages/s]
Add class rate[Complete]: 0 messages [00:00, ? messages/s]
Add class rate[Complete]: 0 messages [00:00, ? messages/s]
RAAD rate[Complete]: 0 messages [00:00, ? messages/s]
Writer rate[Complete]: 0 messages [00:00, ? messages/s]
2025-06-13 17:06:16,783 [INFO] - ====Pipeline Complete====
2025-06-13 17:06:16,786 [INFO] - Remove client Client-b4053af2-4878-11f0-ad4e-c6bbb945e3f9
2025-06-13 17:06:16,789 [INFO] - Received 'close-stream' from tcp://127.0.0.1:47524; closing.
2025-06-13 17:06:16,790 [INFO] - Remove client Client-b4053af2-4878-11f0-ad4e-c6bbb945e3f9
2025-06-13 17:06:16,791 [INFO] - Close client connection: Client-b4053af2-4878-11f0-ad4e-c6bbb945e3f9
2025-06-13 17:06:16,797 [INFO] - Closing Nanny at 'tcp://127.0.0.1:39401'. Reason: nanny-close
2025-06-13 17:06:16,797 [INFO] - Nanny asking worker to close. Reason: nanny-close
2025-06-13 17:06:16,803 [INFO] - Stopping worker at tcp://127.0.0.1:43765. Reason: nanny-close
2025-06-13 17:06:16,807 [INFO] - Received 'close-stream' from tcp://127.0.0.1:47510; closing.
2025-06-13 17:06:16,808 [INFO] - Connection to tcp://127.0.0.1:45811 has been closed.
2025-06-13 17:06:16,808 [INFO] - Remove worker <WorkerState 'tcp://127.0.0.1:43765', name: 0, status: closing, memory: 0, processing: 0> (stimulus_id='handle-worker-cleanup-1749834376.8087003')
2025-06-13 17:06:16,809 [INFO] - Lost all workers
2025-06-13 17:06:17,763 [INFO] - Scheduler closing due to unknown reason...
2025-06-13 17:06:17,765 [INFO] - Scheduler closing all comms
Please provide the following information when requesting support.
Hardware - GPU: Tesla T4 (see nvidia-smi output below)
Hardware - CPU
Operating System - Amazon Linux 2; uname -a: Linux sdk-cli-sdk-helper-csgdev-ltm 5.10.234-225.910.amzn2.x86_64 #1 SMP Fri Feb 14 16:52:40 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Riva Version
TLT Version (if relevant)
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here)
On my EC2 bastion:
$ kubectl get pods -n csgdev
NAME                             READY   STATUS    RESTARTS   AGE
ai-engine-f599fbdd8-nv6nj        2/2     Running   0          40d
broker-64768b4984-8hqcp          1/1     Running   0          66d
sdk-cli-sdk-helper-csgdev-ltm    2/2     Running   0          51d
sdk-cli-sdk-helper-csgdev-rsmw   2/2     Running   0          51d
sdk-cli-sdk-helper-csgdev-zdt    2/2     Running   0          51d
I use this pod to run code:
sdk-cli-sdk-helper-csgdev-ltm
NVIDIA GPU info:
nvidia-smi
Fri Jun 13 17:14:17 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
| N/A 34C P8 11W / 70W | 3MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
Is there a way to communicate with the inference server (ai-engine) prior to running the pipeline?
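Something like the following is the kind of pre-flight check I have in mind. This is just a rough sketch, not something that exists in my pipeline today; the host name ai-engine, the HTTP port 8000, and the model name "lateral-movement-detection" are assumptions from my own deployment and would need to be adjusted:

# Rough sketch of a pre-flight check against the Triton server before launching the pipeline.
import tritonclient.http as httpclient

TRITON_URL = "ai-engine:8000"               # assumption: ai-engine service name, Triton HTTP port
MODEL_NAME = "lateral-movement-detection"   # assumption: model name the pipeline points at

client = httpclient.InferenceServerClient(url=TRITON_URL)

# Is the server itself up and ready?
print("server live :", client.is_server_live())
print("server ready:", client.is_server_ready())

# Is the specific model loaded and ready to serve requests?
print("model ready :", client.is_model_ready(MODEL_NAME))

# Inspect the model's declared inputs/outputs to compare against what the
# Morpheus TritonInferenceStage is configured to send.
print(client.get_model_metadata(MODEL_NAME))

# List everything currently in the model repository.
print(client.get_model_repository_index())

I could also just curl the server's /v2/health/ready endpoint (for example via kubectl port-forward from the bastion), but I am not sure what the recommended approach is from inside the sdk-cli-helper pod.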