• Hardware Platform: Jetson Orin NX 8GB
• DeepStream Version: 7.0
• JetPack Version: 6.0
• TensorRT Version: 8.6.2.3
• Issue Type: Bug
I have developed a C++ program that executes a GStreamer pipeline that performs object detection from multiple camera sources using the gst-nvinfer plugin and tracks detected objects using the gst-nvtracker plugin.
Occasionally, when system memory usage is very high, the program hangs while starting the pipeline.
After attaching GDB to the running process, I found that it is stuck in the call to gst_element_set_state() that sets the pipeline to the PLAYING state.
Walking up the call stack shows that the actual blocking point is a std::condition_variable::wait() call inside the libnvdsgst_tracker.so plugin.
After rebuilding this plugin with debug symbols, I traced the issue to the following call chain in the gst-nvtracker sources (DeepStream version 7.0):
The program calls NvTrackerProc::deInit() at nvtracker_proc.cpp:313 because initConvBufPool() returns false:
/** Initialize the buffer pool based on config */
if (m_Config.inputTensorMeta == false){
    ret = initConvBufPool();
    if (!ret) {
        LOG_ERROR("gstnvtracker: Failed to initilaize surface transform buffer pool.\n");
        deInit();
        return false;
    }
}
The hang occurs in NvTrackerProc::deInit() at nvtracker_proc.cpp:424, where the caller waits indefinitely on the m_BufQueueCond condition variable:
void NvTrackerProc::deInit()
{
    /** Clear out all pending process requests and return surface buffer
     * and notify all threads waiting for these requests */
    unique_lock<mutex> lkProc(m_ProcQueueLock);
    if (m_Config.numTransforms > 0 && m_Config.inputTensorMeta == false) {
        while (!m_ConvBufMgr.isQueueFull())
        {
            /** printf("m_ConvBufMgr.getFreeQueueSize() %d, m_ConvBufMgr.getActualPoolSize() %d\n",
             * m_ConvBufMgr.getFreeQueueSize(), m_ConvBufMgr.getActualPoolSize()); */
            m_BufQueueCond.wait(lkProc);
        }
    }
    /* OMITTED */
}
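To make the failure mode concrete, here is a minimal standalone sketch of the same wait pattern. It is not the plugin's code: ToyBufPool and drainOrGiveUp are hypothetical names, and the sketch assumes that an uninitialized pool never reports a full free queue, which is what the observed hang suggests. It shows two possible guards under that assumption: skipping the wait when the pool was never initialized, and bounding the wait with std::condition_variable::wait_for instead of an unconditional wait().

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the tracker's conversion buffer pool.
// Assumption: before a successful init, the free queue is never "full".
class ToyBufPool {
public:
    // Models a successful pool init: every buffer starts in the free queue.
    void init(std::size_t poolSize) {
        m_poolSize = poolSize;
        m_free = poolSize;
        m_initialized = true;
    }
    // Hands one buffer out to a (pretend) worker.
    void acquire() {
        assert(m_free > 0);
        --m_free;
    }
    // Returns one buffer to the free queue.
    void release() {
        assert(m_free < m_poolSize);
        ++m_free;
    }
    bool isInitialized() const { return m_initialized; }
    bool isQueueFull() const { return m_initialized && m_free == m_poolSize; }

private:
    std::size_t m_poolSize = 0;
    std::size_t m_free = 0;
    bool m_initialized = false;
};

// Waits until all buffers are back in the pool.
// Returns true if there is nothing (left) to wait for, false on timeout.
bool drainOrGiveUp(ToyBufPool& pool, std::mutex& lock,
                   std::condition_variable& cond,
                   std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(lock);
    // Guard 1: a pool that was never initialized has no buffers in flight,
    // so waiting for them to come back could block forever.
    if (!pool.isInitialized()) {
        return true;
    }
    // Guard 2: bounded wait. wait_for with a predicate returns the
    // predicate's value, so false here means the timeout expired.
    return cond.wait_for(lk, timeout, [&pool] { return pool.isQueueFull(); });
}
```

With an uninitialized ToyBufPool, drainOrGiveUp returns immediately, whereas a loop of the form `while (!pool.isQueueFull()) cond.wait(lk);` would never exit because no thread exists to return buffers or notify the condition variable. Whether either guard is appropriate inside the real plugin depends on details of m_ConvBufMgr that are not visible in the excerpt above.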
You can reproduce the issue by commenting out line nvtracker_proc.cpp:310 and setting ret=false instead, which skips the call to initConvBufPool() and simulates the failure condition.
What factors could cause initConvBufPool() to fail?
What fix do you suggest for this hang?