Unexpected deadlock

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Tesla T4 GPU (aws)
• DeepStream Version 6.1.1-triton
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.4.1
• NVIDIA GPU Driver Version (valid for GPU only) 515.65.01
• Issue Type( questions, new requirements, bugs) bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hello,

We have developed a DeepStream pipeline in combination with Triton Inference Server. It works well, but after some random amount of time it crashes with the following error:

ERROR: infer_grpc_client.cpp:427 inference failed with error: in ensemble 'classification_model', [request id: 594] unexpected deadlock, at least one output is not set while no more ensemble steps can be made
0:00:35.993151751 106951 0x7f8e8808e750 WARN nvinferserver gstnvinferserver.cpp:531:gst_nvinfer_server_push_buffer: error: inference failed with unique-id:191
Error: gst-library-error-quark: inference failed with unique-id:191 (1): gstnvinferserver.cpp(531): gst_nvinfer_server_push_buffer (): /GstPipeline:pipeline0/GstNvInferServer:primary-nvinference-engine1
Exiting app

Sometimes the pipeline crashes after 2-10 minutes, and sometimes it runs for hours, but it eventually crashes.

We developed the DeepStream pipeline with the Python bindings, and it consists of two models:

  1. Primary detector
  2. Secondary classifier that runs on the detected objects using the PROCESS_MODE_CLIP_OBJECTS parameter (a sketch of a typical configuration for this follows below).
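For reference, a minimal sketch of what an nvinferserver config for such a secondary classifier over gRPC typically looks like. The field names follow the DeepStream sample configs; the model name, gRPC URL, batch size, and GIE IDs below are placeholders, not our actual values:

  infer_config {
    unique_id: 2                                # placeholder; must differ from the primary GIE's unique-id
    gpu_ids: [0]
    max_batch_size: 16                          # placeholder
    backend {
      triton {
        model_name: "classification_model"      # placeholder ensemble name on the Triton server
        version: -1
        grpc {
          url: "localhost:8001"                 # placeholder Triton gRPC endpoint
        }
      }
    }
  }
  input_control {
    process_mode: PROCESS_MODE_CLIP_OBJECTS     # run on objects detected by the primary GIE
    operate_on_gie_id: 1                        # placeholder unique-id of the primary detector
  }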

The pipeline produces an RTSP stream as output. The output stream can be viewed in VLC media player, with detected objects drawn and classification labels shown, for as long as the pipeline is running.
Both models are deployed on the Triton server and use the gRPC configuration for inference. Both are ensemble models with three steps:

  1. Preprocessing using DALI
  2. Model inference using ONNX model
  3. Post processing using python backend
The Triton server does not give any error or warning on its side, even after enabling verbose logging using --log-verbose.
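For context, the ensemble config.pbtxt roughly follows this shape. The step model names, tensor names, and dims below are placeholders; the error above is what Triton raises when an ensemble output is never produced by any of the remaining steps:

  name: "classification_model"          # placeholder ensemble name
  platform: "ensemble"
  max_batch_size: 16
  input [ { name: "RAW_IMAGE", data_type: TYPE_UINT8, dims: [ -1 ] } ]
  output [ { name: "CLASS_PROBS", data_type: TYPE_FP32, dims: [ 10 ] } ]
  ensemble_scheduling {
    step [
      {
        model_name: "dali_preprocess"          # step 1: DALI preprocessing
        model_version: -1
        input_map  { key: "DALI_INPUT"  value: "RAW_IMAGE" }
        output_map { key: "DALI_OUTPUT" value: "preprocessed" }
      },
      {
        model_name: "onnx_classifier"          # step 2: ONNX model inference
        model_version: -1
        input_map  { key: "input"  value: "preprocessed" }
        output_map { key: "output" value: "raw_scores" }
      },
      {
        model_name: "python_postprocess"       # step 3: python-backend post-processing
        model_version: -1
        input_map  { key: "POST_INPUT"  value: "raw_scores" }
        output_map { key: "POST_OUTPUT" value: "CLASS_PROBS" }  # must map back to the ensemble output
      }
    ]
  }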

Could you please help me understand this strange behavior and how it can be resolved?
Thanks and Regards.

Can you capture more logs with "export GST_DEBUG=nvinferserver:7"?

I have run it with export GST_DEBUG=nvinferserver:7, but I am not able to capture any more logs from nvinferserver.
I am running it inside the nvcr.io/nvidia/deepstream:6.1.1-triton container.

Here are the logs:

root# export GST_DEBUG=nvinferserver:7
root# gst-launch-1.0 uridecodebin uri=rtsp... ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvinferserver config-file-path = /chooch/infer_configs/0f58e413-d876-473d-9ed0-9cb246fd411a.txt ! nvvideoconvert ! fakesink 
Setting pipeline to PAUSED ...
INFO: infer_grpc_backend.cpp:169 TritonGrpcBackend id:6 initialized for model: 0f58e413-d876-473d-9ed0-9cb246fd411a
Pipeline is live and does not need PREROLL ...
Progress: (open) Opening Stream
Progress: (connect) Connecting to rtsp...
Progress: (open) Retrieving server options
Progress: (open) Retrieving media info
Progress: (request) SETUP stream 0
Progress: (request) SETUP stream 0
Progress: (request) SETUP stream 0
Progress: (request) SETUP stream 1
Progress: (open) Opened Stream
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Progress: (request) Sending PLAY request
Progress: (request) Sending PLAY request
Progress: (request) Sent PLAY request
Missing element: MPEG-4 AAC decoder
WARNING: from element /GstPipeline:pipeline0/GstURIDecodeBin:uridecodebin0: No decoder available for type 'audio/mpeg, mpegversion=(int)4, stream-format=(string)raw, codec_data=(buffer)121056e500'.
Additional debug info:
gsturidecodebin.c(920): unknown_type_cb (): /GstPipeline:pipeline0/GstURIDecodeBin:uridecodebin0
ERROR: infer_grpc_client.cpp:427 inference failed with error: in ensemble '0f58e413-d876-473d-9ed0-9cb246fd411a', [request id: 7710] unexpected deadlock, at least one output is not set while no more ensemble steps can be made
0:11:53.181605541 316776 0x7f4554078330 WARN           nvinferserver gstnvinferserver.cpp:531:gst_nvinfer_server_push_buffer:<nvinferserver0> error: inference failed with unique-id:6
ERROR: from element /GstPipeline:pipeline0/GstNvInferServer:nvinferserver0: inference failed with unique-id:6
Additional debug info:
gstnvinferserver.cpp(531): gst_nvinfer_server_push_buffer (): /GstPipeline:pipeline0/GstNvInferServer:nvinferserver0
Execution ended after 0:11:50.503783295
Setting pipeline to NULL ...
Freeing pipeline ...

Can you share the config.pbtxt for 'classification_model'? Can you share the nvinferserver config files?

I have shared the config.pbtxt files we are using for the Triton server via private message.

Which model is the one shown in the error log?

In your case, a single pipeline works, while multiple pipelines fail with the above errors. We need more logs to analyze the issue. You can use "export NVDSINFERSERVER_LOG_LEVEL=5" to enable the nvinferserver plugin logs.

Please find the attached log file captured with "export NVDSINFERSERVER_LOG_LEVEL=5".
dc9be40b-e29e-4961-a0d3-d4bd4934b571 represents the primary detector model and 19ed5ed6-be74-429d-9636-812de1d0996d represents the secondary classifier, which works on crops.
logs.txt (97.4 KB)

The issue is resolved. The ensemble model is complicated, so the Triton server configuration needs to be set up correctly for the case of multiple concurrent clients.
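The thread does not spell out the exact change that fixed it. For ensembles serving several concurrent clients, the usual knobs are on the composing models' config.pbtxt rather than on the ensemble itself; a minimal sketch, with placeholder values, of the kind of settings typically meant:

  # config.pbtxt of a composing model (e.g. the python post-processing step); values are placeholders
  instance_group [
    {
      count: 2            # more than one instance so concurrent requests do not serialize on a single instance
      kind: KIND_CPU      # use KIND_GPU for the GPU-backed steps
    }
  ]
  dynamic_batching {
    max_queue_delay_microseconds: 100   # allow Triton to batch requests from multiple clients
  }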

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.