[DeepStream 7.0] Program stuck when starting pipeline using gst-nvtracker

• Hardware Platform: Jetson Orin NX 8GB
• DeepStream Version: 7.0
• JetPack Version: 6.0
• TensorRT Version: 8.6.2.3
• Issue Type: Bug

I have developed a C++ program that executes a GStreamer pipeline that performs object detection from multiple camera sources using the gst-nvinfer plugin and tracks detected objects using the gst-nvtracker plugin.

From time to time, when memory usage on the system is very high, the program gets stuck while starting the pipeline.

After attaching GDB to the running process, I found that the program is stuck in the call to gst_element_set_state() that sets the pipeline to the PLAYING state.

Following the call stack upward revealed that the actual blocking point is a std::condition_variable::wait() call inside the libnvdsgst_tracker.so plugin.

After rebuilding this plugin with debug symbols, I traced the issue to the following call chain in the gst-nvtracker sources (DeepStream version 7.0):

The program calls NvTrackerProc::deInit() at nvtracker_proc.cpp:313 because initConvBufPool() returns false:

/** Initialize the buffer pool based on config */
if (m_Config.inputTensorMeta == false){
  ret = initConvBufPool();
  if (!ret) {
    LOG_ERROR("gstnvtracker: Failed to initilaize surface transform buffer pool.\n");
    deInit();
    return false;
  }
}

The hang occurs in NvTrackerProc::deInit() at nvtracker_proc.cpp:424, where the caller waits indefinitely on the m_BufQueueCond condition variable:

void NvTrackerProc::deInit()
{
  /** Clear out all pending process requests and return surface buffer
    * and notify all threads waiting for these requests */
  unique_lock<mutex> lkProc(m_ProcQueueLock);
  if (m_Config.numTransforms > 0 && m_Config.inputTensorMeta == false) {
    while (!m_ConvBufMgr.isQueueFull())
    {
      /** printf("m_ConvBufMgr.getFreeQueueSize() %d, m_ConvBufMgr.getActualPoolSize() %d\n",
        * m_ConvBufMgr.getFreeQueueSize(), m_ConvBufMgr.getActualPoolSize()); */
      m_BufQueueCond.wait(lkProc);
    }
  }
  /* OMITTED */
}

You can reproduce the issue by commenting out line nvtracker_proc.cpp:310 and setting ret = false, which skips the call to initConvBufPool() and simulates the failure condition.
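To illustrate why the wait never returns, here is a minimal standalone model of the pattern, not the plugin code. PoolModel, ProcModel and deInitModel() are hypothetical names, and the meaning I give to isQueueFull() (false when the pool was never filled) is my assumption, based only on the observed hang. The sketch uses wait_for() so that it terminates, and shows one possible guard: skip the wait when the pool was never initialized.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the plugin's buffer pool (NOT the real ConvBufManager).
struct PoolModel {
  std::size_t freeCount = 0;  // buffers currently in the free queue
  std::size_t poolSize = 0;   // buffers successfully created by init()
  bool initialized = false;   // set only when init() completes

  // Assumed semantics: "full" means every created buffer is back in the queue.
  bool isQueueFull() const { return initialized && freeCount == poolSize; }
};

class ProcModel {
public:
  // Models deInit() being called right after a failed pool initialization.
  // Returns true if it completed, false if the wait timed out
  // (the real code uses wait() and would block forever instead).
  bool deInitModel(bool guardOnInit, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(m_lock);
    if (guardOnInit && !m_pool.initialized) {
      // No buffer was ever handed out, so there is nothing to wait for.
      return true;
    }
    // Without the guard nobody will ever notify m_cond, because no
    // buffer exists that could be returned to the queue.
    return m_cond.wait_for(lk, timeout,
                           [this] { return m_pool.isQueueFull(); });
  }

private:
  std::mutex m_lock;
  std::condition_variable m_cond;
  PoolModel m_pool;  // never initialized, like after the failure
};
```

With the guard, deInitModel(true, ...) returns immediately; without it, deInitModel(false, ...) only returns because of the timeout. Whether a similar check is the right fix inside NvTrackerProc::deInit() is something only the plugin maintainers can confirm.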

What factors could cause the initConvBufPool() function to fail?

What solution do you suggest for fixing this issue?

The hang is probably triggered by insufficient memory. You can check why ConvBufManager::init() failed.
I guess it might be caused by a NvBufSurfaceCreate() failure:

/** Create the buffers. The proper way is to set number of buffer sets as a fixed number. */
  for (uint32_t setInd = 0; setInd < MAX_BUFFER_POOL_SIZE; setInd++)
  {

    NvBufSurface *pNewBuf = nullptr;
    int ret = NvBufSurfaceCreate(&pNewBuf, batchSize, &bufferParam);
    if (ret < 0)
    {
      LOG_ERROR("gstnvtracker: Got %d creating nvbufsurface\n", ret);
      deInit();
      return false;
    }

The best course of action might be to switch to a device with more memory. The Orin NX 8GB is a memory-limited device; you can also try increasing the swap space.

Thank you for your reply.
As I pointed out in my post, if the call to ConvBufManager::init() fails, the calling thread gets stuck in m_BufQueueCond.wait().
This is a bug. Will you provide a fix for this in the next release?
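Until a fix is available, one possible application-side workaround is to run the blocking state change under a watchdog. The helper below is only a sketch: runWithTimeout() is a hypothetical name, and the approach assumes that it is acceptable to terminate the process when the call does not return, since the blocked worker thread cannot be recovered.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Runs a blocking call on a detached worker thread and waits at most
// `timeout` for it to finish. Returns true if the call completed in time.
// On timeout the worker stays blocked and is leaked, so the caller should
// treat this as fatal (log and exit) rather than continue normally.
// The callable must not throw: an exception on the worker thread would
// terminate the process.
inline bool runWithTimeout(std::function<void()> blockingCall,
                           std::chrono::milliseconds timeout) {
  std::promise<void> done;
  std::future<void> finished = done.get_future();
  std::thread([call = std::move(blockingCall),
               p = std::move(done)]() mutable {
    call();
    p.set_value();  // signals completion to the waiting caller
  }).detach();
  return finished.wait_for(timeout) == std::future_status::ready;
}
```

In the application this could wrap the state change, for example runWithTimeout([pipeline] { gst_element_set_state(pipeline, GST_STATE_PLAYING); }, std::chrono::seconds(10)), where pipeline is the application's GstElement pointer and the 10 second limit is an arbitrary value. If it returns false, the pipeline object must not be touched again from the main thread, because the worker may still be inside the plugin.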

We are discussing this issue internally and will reply to this topic if it is fixed.

Thank you for the support.

Meanwhile, I was able to capture the error messages printed by the plugin when this issue occurred:

gstnvtracker: Got -1 mapping nvbufsurface
gstnvtracker: Failed to initialize ConvBufManager
gstnvtracker: Failed to initilaize surface transform buffer pool.

So I can confirm that the failure point is in the call to NvBufSurfaceMap() at convbufmanager.cpp:102, which returns -1.

If this error condition were handled correctly, would it be safe for the application to try to restart the pipeline?

This is usually due to low memory. If you restart the pipeline it may continue to fail.
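Since a restart under the same memory pressure is likely to fail again, the application could check available memory before retrying. The sketch below reads the MemAvailable field from /proc/meminfo; parseMemAvailableKb() and enoughMemoryToStart() are hypothetical helper names, the threshold is something you would have to tune, and MemAvailable is only a rough proxy, because it is not confirmed here which limit NvBufSurfaceMap() actually hits.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parses the "MemAvailable:" field (in kB) from /proc/meminfo-style text.
// Returns -1 if the field is missing or cannot be parsed.
inline long parseMemAvailableKb(std::istream& in) {
  const std::string key = "MemAvailable:";
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind(key, 0) == 0) {  // line starts with the key
      std::istringstream fields(line.substr(key.size()));
      long kb = -1;
      if (fields >> kb) {
        return kb;
      }
      return -1;
    }
  }
  return -1;
}

// Returns true if the system reports at least `minAvailableKb` of
// available memory. The threshold is an application-specific guess.
inline bool enoughMemoryToStart(long minAvailableKb) {
  std::ifstream meminfo("/proc/meminfo");
  return parseMemAvailableKb(meminfo) >= minAvailableKb;
}
```

A restart loop could call enoughMemoryToStart() and wait or free resources before setting the pipeline to PLAYING again, instead of retrying immediately.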