Unexpected deadlock

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Tesla T4 GPU (aws)
• DeepStream Version 6.1.1-triton
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.4.1
• NVIDIA GPU Driver Version (valid for GPU only) 515.65.01
• Issue Type( questions, new requirements, bugs) bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hello,

We have developed a DeepStream pipeline in combination with Triton Inference Server. It works well, but after some random amount of time it crashes with the following error:

ERROR: infer_grpc_client.cpp:427 inference failed with error: in ensemble 'classification_model', [request id: 594] unexpected deadlock, at least one output is not set while no more ensemble steps can be made
0:00:35.993151751 106951 0x7f8e8808e750 WARN nvinferserver gstnvinferserver.cpp:531:gst_nvinfer_server_push_buffer: error: inference failed with unique-id:191
Error: gst-library-error-quark: inference failed with unique-id:191 (1): gstnvinferserver.cpp(531): gst_nvinfer_server_push_buffer (): /GstPipeline:pipeline0/GstNvInferServer:primary-nvinference-engine1
Exiting app

Sometimes the pipeline crashes after 2-10 minutes, and sometimes it runs for hours, but it eventually crashes.

We developed the DeepStream pipeline with the Python bindings, and it consists of two models:

  1. Primary detector
  2. Secondary classifier that runs on the detected objects using the PROCESS_MODE_CLIP_OBJECTS parameter (a sketch of a typical configuration for this follows below).
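For reference, a minimal sketch of what an nvinferserver config for such a secondary classifier over gRPC typically looks like. The field names follow the DeepStream sample configs; the model name, gRPC URL, batch size, and GIE IDs below are placeholders, not our actual values:

  infer_config {
    unique_id: 2                                # placeholder; must differ from the primary GIE's unique-id
    gpu_ids: [0]
    max_batch_size: 16                          # placeholder
    backend {
      triton {
        model_name: "classification_model"      # placeholder ensemble name on the Triton server
        version: -1
        grpc {
          url: "localhost:8001"                 # placeholder Triton gRPC endpoint
        }
      }
    }
  }
  input_control {
    process_mode: PROCESS_MODE_CLIP_OBJECTS     # run on objects detected by the primary GIE
    operate_on_gie_id: 1                        # placeholder unique-id of the primary detector
  }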

The pipeline produces an RTSP stream as output. The output stream can be viewed in VLC media player, with detected objects drawn and classification labels shown, for as long as the pipeline is running.
Both models are deployed on the Triton server and use the gRPC configuration for inference. Both are ensemble models with three steps:

  1. Preprocessing using DALI
  2. Model inference using ONNX model
  3. Post processing using python backend
The Triton server does not give any error or warning on its side, even after enabling verbose logging using --log-verbose.
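For context, the ensemble config.pbtxt roughly follows this shape. The step model names, tensor names, and dims below are placeholders; the error above is what Triton raises when an ensemble output is never produced by any of the remaining steps:

  name: "classification_model"          # placeholder ensemble name
  platform: "ensemble"
  max_batch_size: 16
  input [ { name: "RAW_IMAGE", data_type: TYPE_UINT8, dims: [ -1 ] } ]
  output [ { name: "CLASS_PROBS", data_type: TYPE_FP32, dims: [ 10 ] } ]
  ensemble_scheduling {
    step [
      {
        model_name: "dali_preprocess"          # step 1: DALI preprocessing
        model_version: -1
        input_map  { key: "DALI_INPUT"  value: "RAW_IMAGE" }
        output_map { key: "DALI_OUTPUT" value: "preprocessed" }
      },
      {
        model_name: "onnx_classifier"          # step 2: ONNX model inference
        model_version: -1
        input_map  { key: "input"  value: "preprocessed" }
        output_map { key: "output" value: "raw_scores" }
      },
      {
        model_name: "python_postprocess"       # step 3: python-backend post-processing
        model_version: -1
        input_map  { key: "POST_INPUT"  value: "raw_scores" }
        output_map { key: "POST_OUTPUT" value: "CLASS_PROBS" }  # must map back to the ensemble output
      }
    ]
  }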

Could you please help me understand this strange behavior and how it can be resolved?
Thanks and Regards.

Can you capture more logs with "export GST_DEBUG=nvinferserver:7"?

I have run it with export GST_DEBUG=nvinferserver:7, but I am not able to capture any more logs from nvinferserver.
I am running it inside the nvcr.io/nvidia/deepstream:6.1.1-triton container.

Here are the logs:

root# export GST_DEBUG=nvinferserver:7
root# gst-launch-1.0 uridecodebin uri=rtsp... ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvinferserver config-file-path = /chooch/infer_configs/0f58e413-d876-473d-9ed0-9cb246fd411a.txt ! nvvideoconvert ! fakesink 
Setting pipeline to PAUSED ...
INFO: infer_grpc_backend.cpp:169 TritonGrpcBackend id:6 initialized for model: 0f58e413-d876-473d-9ed0-9cb246fd411a
Pipeline is live and does not need PREROLL ...
Progress: (open) Opening Stream
Progress: (connect) Connecting to rtsp...
Progress: (open) Retrieving server options
Progress: (open) Retrieving media info
Progress: (request) SETUP stream 0
Progress: (request) SETUP stream 0
Progress: (request) SETUP stream 0
Progress: (request) SETUP stream 1
Progress: (open) Opened Stream
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Progress: (request) Sending PLAY request
Progress: (request) Sending PLAY request
Progress: (request) Sent PLAY request
Missing element: MPEG-4 AAC decoder
WARNING: from element /GstPipeline:pipeline0/GstURIDecodeBin:uridecodebin0: No decoder available for type 'audio/mpeg, mpegversion=(int)4, stream-format=(string)raw, codec_data=(buffer)121056e500'.
Additional debug info:
gsturidecodebin.c(920): unknown_type_cb (): /GstPipeline:pipeline0/GstURIDecodeBin:uridecodebin0
ERROR: infer_grpc_client.cpp:427 inference failed with error: in ensemble '0f58e413-d876-473d-9ed0-9cb246fd411a', [request id: 7710] unexpected deadlock, at least one output is not set while no more ensemble steps can be made
0:11:53.181605541 316776 0x7f4554078330 WARN           nvinferserver gstnvinferserver.cpp:531:gst_nvinfer_server_push_buffer:<nvinferserver0> error: inference failed with unique-id:6
ERROR: from element /GstPipeline:pipeline0/GstNvInferServer:nvinferserver0: inference failed with unique-id:6
Additional debug info:
gstnvinferserver.cpp(531): gst_nvinfer_server_push_buffer (): /GstPipeline:pipeline0/GstNvInferServer:nvinferserver0
Execution ended after 0:11:50.503783295
Setting pipeline to NULL ...
Freeing pipeline ...

Can you share the config.pbtxt for 'classification_model'? Can you share the nvinferserver config files?

I have shared the config.pbtxt files we are using for the Triton server via private message.

Which model is the one shown in the error log?

In your case, a single pipeline works, while multiple pipelines fail with the above errors. We need more logs to analyze the issue. You can use "export NVDSINFERSERVER_LOG_LEVEL=5" to enable the nvinferserver plugin logs.

Please find the attached log file captured with "export NVDSINFERSERVER_LOG_LEVEL=5".
dc9be40b-e29e-4961-a0d3-d4bd4934b571 represents the primary detector model and 19ed5ed6-be74-429d-9636-812de1d0996d represents the secondary classifier, which works on crops.
logs.txt (97.4 KB)

The issue is resolved. The ensemble model is complicated, so the Triton server configuration needs to be set up correctly for the case of multiple concurrent clients.
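The thread does not spell out the exact change that fixed it. For ensembles serving several concurrent clients, the usual knobs are on the composing models' config.pbtxt rather than on the ensemble itself; a minimal sketch, with placeholder values, of the kind of settings typically meant:

  # config.pbtxt of a composing model (e.g. the python post-processing step); values are placeholders
  instance_group [
    {
      count: 2            # more than one instance so concurrent requests do not serialize on a single instance
      kind: KIND_CPU      # use KIND_GPU for the GPU-backed steps
    }
  ]
  dynamic_batching {
    max_queue_delay_microseconds: 100   # allow Triton to batch requests from multiple clients
  }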

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.