DeepStream Python works with gRPC but gets stuck when using model_repo

Please provide complete information as applicable to your setup.

• Hardware Platform: GPU
• DeepStream Version: 6.2
• TensorRT Version: 8.5.2-1+cuda11.8
• NVIDIA GPU Driver Version (valid for GPU only): 525.105.17
• Issue Type (questions, new requirements, bugs): questions/bugs

I have a DeepStream Python app (with a pgie and one sgie) that works well when the config is:

backend {
  triton {
    model_name: "yolov8_nms_tensorrt"
    version: -1
    grpc {
      url: "127.0.0.1:8001"
      enable_cuda_buffer_sharing: true
    }
  }
}

but freezes after a few frames when the config is:

backend {
  triton {
    model_name: "yolov8_nms_tensorrt"
    version: -1
    model_repo {
      root: "/opt/nvidia/deepstream/deepstream-6.2/people-app/model_repo"
      log_level: 2
      strict_model_config: 1
    }
  }
}

Am I doing something wrong here, or missing something?
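(For context: in model_repo/Triton CAPI mode with strict_model_config: 1, Triton expects an explicit config.pbtxt under <root>/yolov8_nms_tensorrt/, with the engine at <root>/yolov8_nms_tensorrt/1/model.plan. A minimal sketch matching the tensor names and shapes that appear in the Triton logs later in this thread; the actual file contents here are an assumption, not the poster's configuration:

name: "yolov8_nms_tensorrt"
platform: "tensorrt_plan"
max_batch_size: 1
input [
  { name: "images", data_type: TYPE_FP32, dims: [ 3, 640, 640 ] }
]
output [
  { name: "num_dets", data_type: TYPE_INT32, dims: [ 1 ] },
  { name: "bboxes", data_type: TYPE_FP32, dims: [ 100, 4 ] },
  { name: "scores", data_type: TYPE_FP32, dims: [ 100 ] },
  { name: "labels", data_type: TYPE_INT32, dims: [ 100 ] }
]
)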

To narrow down this issue, can you do the following checks:

  1. Can the application run fine using only the pgie?
  2. About "but freezes after a few frames": can you see the output result?
  3. Could you share more logs? Please run "export GST_DEBUG=6" first to raise GStreamer's log level, then run again and redirect the logs to a file (see the sketch after this list).
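A minimal sketch of capturing that trace (the entry point deepstream_app.py is an assumption; substitute your actual launch command):

export GST_DEBUG=6
python3 deepstream_app.py > gst_debug.log 2>&1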

@fanzh

  1. No, it cannot.
  2. I set log_level to 3 in:

backend {
  triton {
    model_name: "yolov8_nms_tensorrt"
    version: -1
    model_repo {
      root: "/opt/nvidia/deepstream/deepstream-6.2/people-app/model_repo"
      log_level: 3
      strict_model_config: 1
    }
  }
}

The logs are as follows:

I0530 06:21:06.526065 52 tensorrt.cc:5711] model yolov8_nms_tensorrt, instance yolov8_nms_tensorrt, executing 1 requests
I0530 06:21:06.526095 52 tensorrt.cc:1736] TRITONBACKEND_ModelExecute: Issuing yolov8_nms_tensorrt with 1 requests
I0530 06:21:06.526107 52 tensorrt.cc:1795] TRITONBACKEND_ModelExecute: Running yolov8_nms_tensorrt with 1 requests
I0530 06:21:06.526132 52 tensorrt.cc:2925] Optimization profile default [0] is selected for yolov8_nms_tensorrt
I0530 06:21:06.526210 52 tensorrt.cc:2299] Context with profile default [0] is being executed for yolov8_nms_tensorrt
I0530 06:21:06.527609 52 infer_response.cc:167] add response output: output: num_dets, type: INT32, shape: [1,1]
I0530 06:21:06.527662 52 infer_response.cc:167] add response output: output: bboxes, type: FP32, shape: [1,100,4]
I0530 06:21:06.527691 52 infer_response.cc:167] add response output: output: scores, type: FP32, shape: [1,100]
I0530 06:21:06.527716 52 infer_response.cc:167] add response output: output: labels, type: INT32, shape: [1,100]
I0530 06:21:06.529633 52 tensorrt.cc:2782] TRITONBACKEND_ModelExecute: model yolov8_nms_tensorrt released 1 requests
I0530 06:21:06.567000 52 infer_request.cc:713] [request id: 1444] prepared: [0x0x7efbdc00a960] request id: 1444, model: yolov8_nms_tensorrt, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7efbdc009118] input: images, type: FP32, original shape: [1,3,640,640], batch + shape: [1,3,640,640], shape: [3,640,640]
override inputs:
inputs:
[0x0x7efbdc009118] input: images, type: FP32, original shape: [1,3,640,640], batch + shape: [1,3,640,640], shape: [3,640,640]
original requested outputs:
bboxes
labels
num_dets
scores
requested outputs:
bboxes
labels
num_dets
scores

I0530 06:21:06.567060 52 tensorrt.cc:5711] model yolov8_nms_tensorrt, instance yolov8_nms_tensorrt, executing 1 requests
I0530 06:21:06.567083 52 tensorrt.cc:1736] TRITONBACKEND_ModelExecute: Issuing yolov8_nms_tensorrt with 1 requests
I0530 06:21:06.567095 52 tensorrt.cc:1795] TRITONBACKEND_ModelExecute: Running yolov8_nms_tensorrt with 1 requests
I0530 06:21:06.567110 52 tensorrt.cc:2925] Optimization profile default [0] is selected for yolov8_nms_tensorrt
I0530 06:21:06.567170 52 tensorrt.cc:2299] Context with profile default [0] is being executed for yolov8_nms_tensorrt
I0530 06:21:06.570395 52 infer_response.cc:167] add response output: output: num_dets, type: INT32, shape: [1,1]
I0530 06:21:06.570442 52 infer_response.cc:167] add response output: output: bboxes, type: FP32, shape: [1,100,4]
I0530 06:21:06.570469 52 infer_response.cc:167] add response output: output: scores, type: FP32, shape: [1,100]
I0530 06:21:06.570488 52 infer_response.cc:167] add response output: output: labels, type: INT32, shape: [1,100]

**PERF: {'stream0': 5.19}

**PERF: {'stream0': 0.0}

**PERF: {'stream0': 0.0}

**PERF: {'stream0': 0.0}

**PERF: {'stream0': 0.0}

After this everything just freezes and I just repeatedly get **PERF: {'stream0': 0.0} (see the probe sketch after this list).

  3. Will share in a while.
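(To localize where the pipeline stalls once **PERF drops to 0.0, a buffer probe just downstream of the pgie can show whether frames are still moving. A minimal sketch for the Python app; the element name "primary-inference" is an assumption, so use whatever name the pipeline actually assigns:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

frame_count = {"n": 0}

def pgie_src_probe(pad, info, user_data):
    # Count buffers leaving the inference element; if this stops increasing
    # while **PERF reports 0.0, the stall is at or before this element.
    frame_count["n"] += 1
    if frame_count["n"] % 30 == 0:
        print("buffers past pgie src pad:", frame_count["n"])
    return Gst.PadProbeReturn.OK

# pgie = pipeline.get_by_name("primary-inference")  # element name is an assumption
# pgie.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, pgie_src_probe, None)
)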

@fanzh

  3. It generates a huge file, so I am unable to upload it here, but the last 20 lines are as follows. It seems like there is an issue with the RTSP output when using model_repo?

0:05:32.601294471 24 0x2f3eaa0 DEBUG rtspsrc gstrtspsrc.c:5617:gst_rtspsrc_loop_udp: timeout, sending keep-alive
0:05:32.601308870 24 0x2f3eaa0 DEBUG rtspsrc gstrtspsrc.c:5190:gst_rtspsrc_send_keep_alive: creating server keep-alive
0:05:32.601393168 24 0x2f3eaa0 DEBUG rtspsrc gstrtspsrc.c:5597:gst_rtspsrc_loop_udp: doing receive with timeout 54 seconds
0:05:32.626421092 24 0x2f3eaa0 DEBUG rtspsrc gstrtspsrc.c:5610:gst_rtspsrc_loop_udp: we received a server message
0:05:32.626442992 24 0x2f3eaa0 DEBUG rtspsrc gstrtspsrc.c:5653:gst_rtspsrc_loop_udp: ignoring response message
0:05:32.626449492 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9408:gst_rtspsrc_print_rtsp_message: --------------------------------------------
0:05:32.626456691 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9430:gst_rtspsrc_print_rtsp_message: RTSP response message 0x7faa0905cda0
0:05:32.626478791 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9431:gst_rtspsrc_print_rtsp_message: status line:
0:05:32.626485490 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9432:gst_rtspsrc_print_rtsp_message: code: '200'
0:05:32.626491090 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9433:gst_rtspsrc_print_rtsp_message: reason: 'OK'
0:05:32.626498490 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9434:gst_rtspsrc_print_rtsp_message: version: '1.0
0:05:32.626503090 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9436:gst_rtspsrc_print_rtsp_message: headers:
0:05:32.626511490 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9391:dump_key_value: key: 'CSeq', value: '10'
0:05:32.626518489 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9391:dump_key_value: key: 'Date', value: 'Tue, May 30 2023 11:45:52 GMT'
0:05:32.626524489 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9391:dump_key_value: key: 'Session', value: '1269BF1E'
0:05:32.626532289 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9391:dump_key_value: key: 'Content-Length', value: '10'
0:05:32.626537989 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9439:gst_rtspsrc_print_rtsp_message: body: length 11
0:05:32.626544189 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9442:gst_rtspsrc_print_rtsp_message: 2014.02.04(11)
0:05:32.626555288 24 0x2f3eaa0 LOG rtspsrc gstrtspsrc.c:9500:gst_rtspsrc_print_rtsp_message: --------------------------------------------
0:05:32.626561588 24 0x2f3eaa0 DEBUG rtspsrc gstrtspsrc.c:5597:gst_rtspsrc_loop_udp: doing receive with timeout 54 seconds

  1. I replaced my pgie with PeopleNet and still the same thing happens:

I0530 08:32:17.149744 439 tensorrt.cc:5711] model peoplenet_tao, instance peoplenet_tao, executing 1 requests
I0530 08:32:17.149782 439 tensorrt.cc:1736] TRITONBACKEND_ModelExecute: Issuing peoplenet_tao with 1 requests
I0530 08:32:17.149796 439 tensorrt.cc:1795] TRITONBACKEND_ModelExecute: Running peoplenet_tao with 1 requests
I0530 08:32:17.149820 439 tensorrt.cc:2925] Optimization profile default [0] is selected for peoplenet_tao
I0530 08:32:17.149897 439 tensorrt.cc:2299] Context with profile default [0] is being executed for peoplenet_tao
I0530 08:32:17.150334 439 infer_response.cc:167] add response output: output: output_bbox/BiasAdd, type: FP32, shape: [4,12,34,60]
I0530 08:32:17.150383 439 infer_response.cc:167] add response output: output: output_cov/Sigmoid, type: FP32, shape: [4,3,34,60]
I0530 08:32:17.154146 439 tensorrt.cc:2782] TRITONBACKEND_ModelExecute: model peoplenet_tao released 1 requests
I0530 08:32:17.276134 439 infer_request.cc:713] [request id: 305] prepared: [0x0x7fca74008ed0] request id: 305, model: peoplenet_tao, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 4, priority: 0, timeout (us): 0
original inputs:
[0x0x7fca740089f8] input: input_1, type: FP32, original shape: [4,3,544,960], batch + shape: [4,3,544,960], shape: [3,544,960]
override inputs:
inputs:
[0x0x7fca740089f8] input: input_1, type: FP32, original shape: [4,3,544,960], batch + shape: [4,3,544,960], shape: [3,544,960]
original requested outputs:
output_bbox/BiasAdd
output_cov/Sigmoid
requested outputs:
output_bbox/BiasAdd
output_cov/Sigmoid

I0530 08:32:17.276238 439 tensorrt.cc:5711] model peoplenet_tao, instance peoplenet_tao, executing 1 requests
I0530 08:32:17.276262 439 tensorrt.cc:1736] TRITONBACKEND_ModelExecute: Issuing peoplenet_tao with 1 requests
I0530 08:32:17.276273 439 tensorrt.cc:1795] TRITONBACKEND_ModelExecute: Running peoplenet_tao with 1 requests
I0530 08:32:17.276287 439 tensorrt.cc:2925] Optimization profile default [0] is selected for peoplenet_tao
I0530 08:32:17.276348 439 tensorrt.cc:2299] Context with profile default [0] is being executed for peoplenet_tao
I0530 08:32:17.280725 439 infer_response.cc:167] add response output: output: output_bbox/BiasAdd, type: FP32, shape: [4,12,34,60]

There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

  1. Can you reproduce the hang using the DeepStream sample source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt without modification?
  2. The log shows no error output; please share the whole logs. You can upload a zip file.
  3. Can you use gdb to get a full call stack? (A sketch follows this list.)
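For point 3, a minimal sketch of dumping every thread's stack from the hung process (the pgrep pattern deepstream_app.py is an assumption; use the actual PID of your app):

gdb -batch -p $(pgrep -f deepstream_app.py) \
    -ex "set pagination off" \
    -ex "thread apply all bt" > gdb_full_bt.log 2>&1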

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.