DeepStream blocks when running resnet50

Please provide complete information as applicable to your setup.

• Hardware Platform: GPU (T4)
• DeepStream Version: 5.1
• TensorRT Version: 7.2.2-1+cuda11.1
• NVIDIA GPU Driver Version: 450.80.02
• Issue Type (questions, new requirements, bugs): bug
• Issue description: when DeepStream runs resnet50, the process blocks after several seconds; printing the thread stack shows the infer thread blocked in the method queueInputBatch.

Infer config:


[property]
gpu-id=0
gie-unique-id=4
net-scale-factor=1
labelfile-path=/opt/nvidia/deepstream/deepstream-5.1/samples/configs/jingan-app/vehicletypes-labels.txt
onnx-file=/opt/nvidia/deepstream/deepstream-5.1/sources/apps/sample_apps/deepstream-infer-tensor-meta-test/resnet50-v2-7.onnx
#force-implicit-batch-dim=0
#batch-size=1
#infer-dims=3;256;128
model-color-format=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
process-mode=2
output-blob-names=resnetv24_dense0_fwd
classifier-async-mode=0
classifier-threshold=0.51
input-object-min-width=128
input-object-min-height=128
operate-on-gie-id=1
#operate-on-class-ids=0
#scaling-filter=0
#scaling-compute-hw=0
output-tensor-meta=1

network-type=100
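
For context on what output-tensor-meta=1 together with network-type=100 implies: nvinfer attaches its raw output tensors as NVDSINFER_TENSOR_OUTPUT_META user meta and leaves parsing to the application, typically in a pad probe as in deepstream-infer-tensor-meta-test. The following is only a minimal sketch of such a probe (the function name is illustrative and the standard DeepStream 5.1 headers are assumed), not this application's actual code:

#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

/* Sketch: walk the batch meta and pick up the tensor meta attached by the
 * SGIE. process-mode=2 means the meta hangs off each object, so we look at
 * obj_user_meta_list. Attach this probe on the SGIE's src pad. */
static GstPadProbeReturn
sgie_tensor_probe (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
{
  NvDsBatchMeta *batch_meta =
      gst_buffer_get_nvds_batch_meta (GST_BUFFER (info->data));
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList * l_frame = batch_meta->frame_meta_list; l_frame;
      l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList * l_obj = frame_meta->obj_meta_list; l_obj;
        l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
      for (NvDsMetaList * l_user = obj_meta->obj_user_meta_list; l_user;
          l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
          continue;
        NvDsInferTensorMeta *tmeta =
            (NvDsInferTensorMeta *) user_meta->user_meta_data;
        g_print ("tensor meta with %u output layer(s)\n",
            tmeta->num_output_layers);
        /* Parse tmeta->output_layers_info[0 .. tmeta->num_output_layers-1]
         * here, e.g. the resnetv24_dense0_fwd output. */
      }
    }
  }
  return GST_PAD_PROBE_OK;
}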
                      

The blocked thread stack:

Thread 8 (Thread 0x7f1da4ffd700 (LWP 8664)):
#0  0x00007f1e1d402ad3 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f1e1e6828bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007f1dbc77cee8 in void std::condition_variable::wait<nvdsinfer::GuardQueue<std::__cxx11::list<nvdsinfer::NvDsInferBatch*, std::allocator<nvdsinfer::NvDsInferBatch*> > >::pop()::{lambda()#1}>(std::unique_lock<std::mutex>&, nvdsinfer::GuardQueue<std::__cxx11::list<nvdsinfer::NvDsInferBatch*, std::allocator<nvdsinfer::NvDsInferBatch*> > >::pop()::{lambda()#1}) ()
    at /opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_infer.so
#3  0x00007f1dbc7768e6 in nvdsinfer::GuardQueue<std::__cxx11::list<nvdsinfer::NvDsInferBatch*, std::allocator<nvdsinfer::NvDsInferBatch*> > >::pop() ()
    at /opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_infer.so
#4  0x00007f1dbc76a78a in nvdsinfer::NvDsInferContextImpl::queueInputBatch(NvDsInferContextBatchInput&) () at /opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_infer.so
#5  0x00007f1dd01aeac1 in gst_nvinfer_input_queue_loop(void*) () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#6  0x00007f1e1f6d7175 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x00007f1e1d3fc6db in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007f1e1e0e371f in clone () at /lib/x86_64-linux-gnu/libc.so.6

What’s your pipeline? Also, can you export NVDSINFER_LOG_LEVEL=10 before launching the app to enable more logging?

In addition, how did you print the blocked thread stack?

I rebuilt the gst-nvinfer source code and added a print of the thread id like this:

  /* requires <sys/syscall.h> and <unistd.h> */
  pid_t t_id = syscall (__NR_gettid);
  gst_println ("infer(%d) queue length: %d, t_id=%d", nvinfer->unique_id, size, t_id);

With the above code I got the infer thread id. Then I used gdb to print all the thread stacks and finally found the infer thread's stack shown in the question details.

Yes, but I found no error or warning messages.

Cool, but how did you find that the block is caused by the infer thread? I still don’t get it; could you explain in more detail?

I printed the sizes of nvinfer->input_queue and nvinfer->process_queue: process_queue size = 0 and input_queue size > 0, yet the infer thread was blocked, so I guessed that the process is blocked because of the infer thread.

On further analysis, the infer thread is blocked in NvDsInferContextImpl's m_FreeBatchQueue.pop() method: if the queue is empty, pop() blocks until an element becomes available.

m_FreeBatchQueue stores NvDsInferBatch objects, which include the output tensors. Our application configures output-tensor-meta=1, so my guess is that the tensor objects are not released, and therefore the NvDsInferBatch objects are not recycled back to m_FreeBatchQueue.
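
To make the blocking behaviour concrete: frames #0–#3 of the backtrace above are a condition-variable wait inside GuardQueue::pop(). A minimal sketch of such a guarded queue (an illustration of the pattern, not the real nvdsinfer code) looks like this:

#include <condition_variable>
#include <list>
#include <mutex>

/* Illustrative sketch only: pop() blocks on a condition variable while the
 * queue is empty, matching the pthread_cond_wait frame in the backtrace.
 * queueInputBatch() pops a free batch from such a queue, so it can only
 * resume once a finished batch is pushed back (recycled) after its output
 * tensor meta has been released. */
template <typename Container>
class GuardQueue {
public:
  typedef typename Container::value_type T;

  void push (T item) {
    {
      std::lock_guard<std::mutex> lock (m_mutex);
      m_queue.push_back (item);
    }
    m_cond.notify_one ();       /* wakes a blocked pop() */
  }

  T pop () {
    std::unique_lock<std::mutex> lock (m_mutex);
    /* This is where the infer thread sits in the stack trace. */
    m_cond.wait (lock, [this] { return !m_queue.empty (); });
    T item = m_queue.front ();
    m_queue.pop_front ();
    return item;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  Container m_queue;
};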

I added a log to the release_tensor_output_meta method; the method is never called:

static void
release_tensor_output_meta (gpointer data, gpointer user_data)
{
    gst_println("release_tensor_output_meta run");
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  NvDsInferTensorMeta *meta = (NvDsInferTensorMeta *) user_meta->user_meta_data;
  gst_mini_object_unref (GST_MINI_OBJECT (meta->priv_data));
  delete[] meta->out_buf_ptrs_dev;
  delete[] meta->out_buf_ptrs_host;
  delete meta;
}

If I configure output-tensor-meta=0, the blocking problem no longer occurs.

So the underlying cause of the problem may be that the NvDsMeta is not destroyed.

In the source code of the attach_tensor_output_meta() method, I see:

  NvDsUserMeta *user_meta = nvds_acquire_user_meta_from_pool (batch_meta);
  user_meta->user_meta_data = meta;
  user_meta->base_meta.meta_type =
      (NvDsMetaType) NVDSINFER_TENSOR_OUTPUT_META;
  user_meta->base_meta.release_func = release_tensor_output_meta;
  user_meta->base_meta.copy_func = nullptr;
  user_meta->base_meta.batch_meta = batch_meta;

In this code, user_meta->base_meta.release_func is set to release_tensor_output_meta,
but I cannot find the source code that calls release_func. Is this part not open source?

The next question is: what is the user_meta recycling strategy?
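
My working assumption (a hypothetical sketch, since libnvds_meta is not shipped as source in DS 5.1) is that the user meta acquired from the pool travels downstream with the GstBuffer, and when the buffer's batch meta is destroyed the meta library walks the user meta lists, calls each registered release_func, and returns the NvDsUserMeta to its pool. Under that assumption the call would look roughly like this (the helper name is hypothetical, not NVIDIA's code):

#include "nvdsmeta.h"

/* Hypothetical sketch of the recycle path, NOT NVIDIA's implementation:
 * when the batch meta attached to a GstBuffer is destroyed at the end of
 * the pipeline, each NvDsUserMeta is handed back to its pool and its
 * registered release_func is invoked with the user meta as first argument,
 * which matches release_tensor_output_meta's (data, user_data) signature. */
static void
release_user_meta_sketch (NvDsUserMeta * user_meta)
{
  if (user_meta->base_meta.release_func)
    user_meta->base_meta.release_func (user_meta, NULL);
  /* ...the NvDsUserMeta object itself then goes back to the meta pool... */
}

If that assumption holds, release_tensor_output_meta never firing would mean the buffer (or its batch meta) is never freed downstream, which is consistent with m_FreeBatchQueue running dry.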

I have basically determined the cause of the problem. When configuring

output-tensor-meta=1
batch-size=1

the problem appears. When I export the model with batch size = 16 and update the configuration accordingly, the problem disappears. You can reproduce this with the deepstream-infer-tensor-meta-test demo. But I do not know the reason; please pay attention to this question. Thank you!
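
For reference, a sketch of the combination that worked in my test (assuming the ONNX file is re-exported with a maximum batch size of 16; the rest of the [property] section is unchanged from the config above):

# model re-exported with max batch size 16
batch-size=16
output-tensor-meta=1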

Yes, this issue has been fixed in the next DS release; we provide a config item to enlarge the buffer pool.
Currently, you can refer to Deepstream-infer-tensor-meta-test 5.0 queue_dataflow gstqueue.c:1243:gst_queue_loop:<queue0> queue is empty for a WAR (workaround).
