DeepStream blocks when running resnet50

Please provide complete information as applicable to your setup.

• Hardware Platform: GPU (T4)
• DeepStream Version: 5.1
• TensorRT Version: 7.2.2-1+cuda11.1
• NVIDIA GPU Driver Version: 450.80.02
• Issue Type (questions, new requirements, bugs): bug
• Issue description: when DeepStream runs resnet50, the process blocks after several seconds; printing the thread stack shows the infer thread blocked in the method queueInputBatch.

Infer config:


[property]
gpu-id=0
gie-unique-id=4
net-scale-factor=1
labelfile-path=/opt/nvidia/deepstream/deepstream-5.1/samples/configs/jingan-app/vehicletypes-labels.txt
onnx-file=/opt/nvidia/deepstream/deepstream-5.1/sources/apps/sample_apps/deepstream-infer-tensor-meta-test/resnet50-v2-7.onnx
#force-implicit-batch-dim=0
#batch-size=1
#infer-dims=3;256;128
model-color-format=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
process-mode=2
output-blob-names=resnetv24_dense0_fwd
classifier-async-mode=0
classifier-threshold=0.51
input-object-min-width=128
input-object-min-height=128
operate-on-gie-id=1
#operate-on-class-ids=0
#scaling-filter=0
#scaling-compute-hw=0
output-tensor-meta=1

network-type=100
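
For context on what output-tensor-meta=1 together with network-type=100 implies: nvinfer attaches its raw output tensors as NVDSINFER_TENSOR_OUTPUT_META user meta and leaves parsing to the application, typically in a pad probe as in deepstream-infer-tensor-meta-test. The following is only a minimal sketch of such a probe (the function name is illustrative and the standard DeepStream 5.1 headers are assumed), not this application's actual code:

#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

/* Sketch: walk the batch meta and pick up the tensor meta attached by the
 * SGIE. process-mode=2 means the meta hangs off each object, so we look at
 * obj_user_meta_list. Attach this probe on the SGIE's src pad. */
static GstPadProbeReturn
sgie_tensor_probe (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
{
  NvDsBatchMeta *batch_meta =
      gst_buffer_get_nvds_batch_meta (GST_BUFFER (info->data));
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList * l_frame = batch_meta->frame_meta_list; l_frame;
      l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList * l_obj = frame_meta->obj_meta_list; l_obj;
        l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
      for (NvDsMetaList * l_user = obj_meta->obj_user_meta_list; l_user;
          l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
          continue;
        NvDsInferTensorMeta *tmeta =
            (NvDsInferTensorMeta *) user_meta->user_meta_data;
        g_print ("tensor meta with %u output layer(s)\n",
            tmeta->num_output_layers);
        /* Parse tmeta->output_layers_info[0 .. tmeta->num_output_layers-1]
         * here, e.g. the resnetv24_dense0_fwd output. */
      }
    }
  }
  return GST_PAD_PROBE_OK;
}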
                      

The blocked thread stack:

Thread 8 (Thread 0x7f1da4ffd700 (LWP 8664)):
#0  0x00007f1e1d402ad3 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f1e1e6828bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007f1dbc77cee8 in void std::condition_variable::wait<nvdsinfer::GuardQueue<std::__cxx11::list<nvdsinfer::NvDsInferBatch*, std::allocator<nvdsinfer::NvDsInferBatch*> > >::pop()::{lambda()#1}>(std::unique_lock<std::mutex>&, nvdsinfer::GuardQueue<std::__cxx11::list<nvdsinfer::NvDsInferBatch*, std::allocator<nvdsinfer::NvDsInferBatch*> > >::pop()::{lambda()#1}) ()
    at /opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_infer.so
#3  0x00007f1dbc7768e6 in nvdsinfer::GuardQueue<std::__cxx11::list<nvdsinfer::NvDsInferBatch*, std::allocator<nvdsinfer::NvDsInferBatch*> > >::pop() ()
    at /opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_infer.so
#4  0x00007f1dbc76a78a in nvdsinfer::NvDsInferContextImpl::queueInputBatch(NvDsInferContextBatchInput&) () at /opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_infer.so
#5  0x00007f1dd01aeac1 in gst_nvinfer_input_queue_loop(void*) () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#6  0x00007f1e1f6d7175 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x00007f1e1d3fc6db in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007f1e1e0e371f in clone () at /lib/x86_64-linux-gnu/libc.so.6

What’s your pipeline? Also, can you export NVDSINFER_LOG_LEVEL=10 before launching the app to enable more logging?

In addition, how did you print the blocked thread stack?

I rebuilt the gst-nvinfer source code and added a print of the thread id like this:

  /* requires <sys/syscall.h> and <unistd.h> */
  pid_t t_id = syscall (__NR_gettid);
  gst_println ("infer(%d) queue length: %d, t_id=%d", nvinfer->unique_id, size, t_id);

With the above code I got the infer thread id. Then I used gdb to print all the thread stacks and finally found the infer thread's stack shown in the question details.

Yes, but I found no error or warning messages.

Cool, but how did you find that the block is caused by the infer thread? I still don’t get it; could you explain in more detail?

I printed the sizes of nvinfer->input_queue and nvinfer->process_queue: process_queue size = 0 and input_queue size > 0, yet the infer thread was blocked, so I guessed that the process is blocked because of the infer thread.

On further analysis, the infer thread is blocked in NvDsInferContextImpl's m_FreeBatchQueue.pop() method: if the queue is empty, pop() blocks until an element becomes available.

m_FreeBatchQueue stores NvDsInferBatch objects, which include the output tensors. Our application configures output-tensor-meta=1, so my guess is that the tensor objects are not released, and therefore the NvDsInferBatch objects are not recycled back to m_FreeBatchQueue.
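
To make the blocking behaviour concrete: frames #0–#3 of the backtrace above are a condition-variable wait inside GuardQueue::pop(). A minimal sketch of such a guarded queue (an illustration of the pattern, not the real nvdsinfer code) looks like this:

#include <condition_variable>
#include <list>
#include <mutex>

/* Illustrative sketch only: pop() blocks on a condition variable while the
 * queue is empty, matching the pthread_cond_wait frame in the backtrace.
 * queueInputBatch() pops a free batch from such a queue, so it can only
 * resume once a finished batch is pushed back (recycled) after its output
 * tensor meta has been released. */
template <typename Container>
class GuardQueue {
public:
  typedef typename Container::value_type T;

  void push (T item) {
    {
      std::lock_guard<std::mutex> lock (m_mutex);
      m_queue.push_back (item);
    }
    m_cond.notify_one ();       /* wakes a blocked pop() */
  }

  T pop () {
    std::unique_lock<std::mutex> lock (m_mutex);
    /* This is where the infer thread sits in the stack trace. */
    m_cond.wait (lock, [this] { return !m_queue.empty (); });
    T item = m_queue.front ();
    m_queue.pop_front ();
    return item;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  Container m_queue;
};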

I added a log to the release_tensor_output_meta method; the method is never called:

static void
release_tensor_output_meta (gpointer data, gpointer user_data)
{
    gst_println("release_tensor_output_meta run");
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  NvDsInferTensorMeta *meta = (NvDsInferTensorMeta *) user_meta->user_meta_data;
  gst_mini_object_unref (GST_MINI_OBJECT (meta->priv_data));
  delete[] meta->out_buf_ptrs_dev;
  delete[] meta->out_buf_ptrs_host;
  delete meta;
}

If I configure output-tensor-meta=0, the blocking problem no longer occurs.

So the underlying cause of the problem may be that the NvDsMeta is not destroyed.

In the source code of the attach_tensor_output_meta() method, I see:

  NvDsUserMeta *user_meta = nvds_acquire_user_meta_from_pool (batch_meta);
  user_meta->user_meta_data = meta;
  user_meta->base_meta.meta_type =
      (NvDsMetaType) NVDSINFER_TENSOR_OUTPUT_META;
  user_meta->base_meta.release_func = release_tensor_output_meta;
  user_meta->base_meta.copy_func = nullptr;
  user_meta->base_meta.batch_meta = batch_meta;

In this code, user_meta->base_meta.release_func is set to release_tensor_output_meta,
but I cannot find the source code that calls release_func. Is this part not open source?

The next question is: what is the user_meta recycling strategy?
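
My working assumption (a hypothetical sketch, since libnvds_meta is not shipped as source in DS 5.1) is that the user meta acquired from the pool travels downstream with the GstBuffer, and when the buffer's batch meta is destroyed the meta library walks the user meta lists, calls each registered release_func, and returns the NvDsUserMeta to its pool. Under that assumption the call would look roughly like this (the helper name is hypothetical, not NVIDIA's code):

#include "nvdsmeta.h"

/* Hypothetical sketch of the recycle path, NOT NVIDIA's implementation:
 * when the batch meta attached to a GstBuffer is destroyed at the end of
 * the pipeline, each NvDsUserMeta is handed back to its pool and its
 * registered release_func is invoked with the user meta as first argument,
 * which matches release_tensor_output_meta's (data, user_data) signature. */
static void
release_user_meta_sketch (NvDsUserMeta * user_meta)
{
  if (user_meta->base_meta.release_func)
    user_meta->base_meta.release_func (user_meta, NULL);
  /* ...the NvDsUserMeta object itself then goes back to the meta pool... */
}

If that assumption holds, release_tensor_output_meta never firing would mean the buffer (or its batch meta) is never freed downstream, which is consistent with m_FreeBatchQueue running dry.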

I have basically determined the cause of the problem. When configuring

output-tensor-meta=1
batch-size=1

the problem appears. When I export the model with batch size = 16 and update the configuration accordingly, the problem disappears. You can reproduce this with the deepstream-infer-tensor-meta-test demo. But I do not know the reason; please pay attention to this question. Thank you!
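
For reference, a sketch of the combination that worked in my test (assuming the ONNX file is re-exported with a maximum batch size of 16; the rest of the [property] section is unchanged from the config above):

# model re-exported with max batch size 16
batch-size=16
output-tensor-meta=1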

Yes, this issue has been fixed in the next DS release; we provide a config item to enlarge the buffer pool.
Currently, you can refer to Deepstream-infer-tensor-meta-test 5.0 queue_dataflow gstqueue.c:1243:gst_queue_loop:<queue0> queue is empty for a WAR (workaround).
