How to store a raw CUDA frame in a thread queue

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only): not applicable
• TensorRT Version: 10.6
• NVIDIA GPU Driver Version (valid for GPU only): 560.30.35
• Issue Type (questions, new requirements, bugs): Questions

Hi!
I am having trouble copying raw frame data out of CUDA memory.

My encoder does not need to be executed for every bounding box at every moment. Instead, it operates intermittently when certain conditions are met, and its execution takes a considerable amount of time. Therefore, I want it to function asynchronously.

To achieve this, I plan to place a separate nvvideoconvert element after the nvinfer in the DeepStream pipeline. Using a frame callback function, I will extract the bounding boxes and raw frame data pointers, store them in a queue, and then process them in a separate thread after popping them from the queue.
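
Roughly, the wiring looks like this (a minimal sketch; the element and variable names are just from my setup, and frame_probe/consumer are shown below):

// Minimal sketch: probe on the src pad of the extra nvvideoconvert,
// plus a worker thread that drains the same GAsyncQueue.
GAsyncQueue *queue = g_async_queue_new();

GstElement *conv = gst_element_factory_make("nvvideoconvert", "post-infer-conv");
GstPad *src_pad = gst_element_get_static_pad(conv, "src");
gst_pad_add_probe(src_pad, GST_PAD_PROBE_TYPE_BUFFER, frame_probe, NULL, NULL);
gst_object_unref(src_pad);

std::thread worker(consumer, queue); // consumer(GAsyncQueue *) is defined below
worker.detach();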

Producer

static GstPadProbeReturn frame_probe(GstPad *pad, GstPadProbeInfo *info, gpointer user_data) {
  GstBuffer *buf = (GstBuffer *)info->data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf);

  NvDsMetaList *l_frame = NULL;

  GstMapInfo in_map_info;
  if (!gst_buffer_map(buf, &in_map_info, GST_MAP_READ)) { // map to reach the GPU surface
    return GST_PAD_PROBE_OK;
  }

  NvBufSurface *surface = (NvBufSurface *)in_map_info.data;
  NvBufSurfaceMap(surface, -1, -1, NVBUF_MAP_READ);

  for (l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)(l_frame->data);
    NvBufSurfaceParams *params = &surface->surfaceList[frame_meta->batch_id];

    if (frame_meta->source_id > 0)
      continue;

    // Wrap the mapped surface (RGBA after nvvideoconvert) without copying.
    cv::cuda::GpuMat nv12_mat = cv::cuda::GpuMat(
        params->height, params->width, CV_8UC4,
        params->mappedAddr.addr[0], params->pitch);

    // Deep-copy into a new heap GpuMat per frame, so the queue entry
    // outlives the mapped surface (and each push gets its own pointer).
    cv::cuda::GpuMat *copiedMat = new cv::cuda::GpuMat();
    nv12_mat.copyTo(*copiedMat);
    cudaDeviceSynchronize();
    g_async_queue_push(queue, copiedMat);
  }

  NvBufSurfaceUnMap(surface, -1, -1); // unmap everything that was mapped above
  gst_buffer_unmap(buf, &in_map_info);
  return GST_PAD_PROBE_OK;
}

Consumer

void consumer(GAsyncQueue *queue) {
  g_print("Consumer running\n");
  int count = 0;
  while (true) {
    // The timeout is in microseconds, so this waits up to 100 ms per pop.
    cv::cuda::GpuMat *matToProcess =
        (cv::cuda::GpuMat *)g_async_queue_timeout_pop(queue, 100 * 1000);

    if (matToProcess == nullptr || matToProcess->empty()) {
      std::cerr << "Error: Retrieved GpuMat is empty or null." << std::endl;
    } else {
      cv::Mat aa;
      matToProcess->download(aa); // blocking device-to-host copy
      if (aa.empty()) {
        std::cerr << "Error: Downloaded Mat is empty." << std::endl;
      } else {
        std::string filename = "/workspace/tmpjpg/src0.jpg";
        cv::imwrite(filename, aa);
      }
    }
    delete matToProcess; // safe on nullptr; frees the GpuMat pushed by the producer

    g_usleep(100);
  }
}

I was able to push and pop the cv::cuda::GpuMat, convert it, and save images, but the images were incomplete: they were never fully written out. It seems the CUDA buffer is being overwritten even after the deep copy into the GpuMat.

Is there any good solution…?
I am in desperate need of your assistance and would greatly appreciate any help.

Let’s narrow it down first. Could you try saving the image directly in the frame_probe function, to rule out any effect of the asynchrony?
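
For example, something like this inside your frame loop (a quick sketch reusing the params pointer from your code; JPEG needs a 1- or 3-channel image, so convert before writing):

// Sketch: synchronous save inside frame_probe, no queue or thread involved.
cv::cuda::GpuMat frame_gpu(params->height, params->width, CV_8UC4,
                           params->mappedAddr.addr[0], params->pitch);
cv::Mat rgba, bgr;
frame_gpu.download(rgba);                    // blocking device-to-host copy
cv::cvtColor(rgba, bgr, cv::COLOR_RGBA2BGR); // drop alpha for JPEG
cv::imwrite("/workspace/tmpjpg/probe_direct.jpg", bgr);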

Hi!!
That is the first thing I checked: an image saved directly in the probe function is fine.

While waiting for the reply, a few things have changed.

1. After I fixed the timeout passed to g_async_queue_timeout_pop (the timeout argument is in microseconds; see the snippet below), the gray pixels are gone and the images look fine.

2. However, a new problem with g_async_queue has appeared.
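
For reference, GLib’s g_async_queue_timeout_pop takes its timeout in microseconds, so a 100 ms wait looks like this:

// g_async_queue_timeout_pop waits at most 'timeout' microseconds and
// returns NULL if nothing was pushed within that window.
gpointer item = g_async_queue_timeout_pop(queue, 100 * G_TIME_SPAN_MILLISECOND);
if (item == NULL) {
  // timed out: the queue stayed empty for the whole 100 ms
}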

Producer

    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)(l_frame->data);
    NvBufSurfaceParams *params = &surface->surfaceList[frame_meta->batch_id];

    if (frame_meta->source_id > 0)
      continue;

    cv::cuda::GpuMat nv12_mat = cv::cuda::GpuMat(
        params->height, params->width, CV_8UC4,
        params->mappedAddr.addr[0], params->pitch);

    // One heap GpuMat per frame; the consumer takes ownership and deletes it.
    cv::cuda::GpuMat *copiedMat = new cv::cuda::GpuMat();
    nv12_mat.copyTo(*copiedMat);
    cudaDeviceSynchronize();
    g_async_queue_push(queue, copiedMat);
    std::cout << "frame callback copiedMat: " << copiedMat
              << ", empty " << copiedMat->empty() << std::endl;

Consumer

void consumer(GAsyncQueue *queue) {
  g_print("Consumer running\n");
  int count = 0;
  while (true) {
    count++;

    cv::cuda::GpuMat *matToProcess =
        (cv::cuda::GpuMat *)g_async_queue_timeout_pop(queue, 10);

    if (matToProcess == nullptr) {
      std::cerr << "Error: queue is now empty." << std::endl;
    } else if (matToProcess->empty()) {
      std::cerr << "Error: popped GpuMat has no data." << std::endl;
    } else {
      cv::Mat aa;
      matToProcess->download(aa);
      std::cout << "queue pop mat: " << matToProcess << std::endl;
      if (aa.empty()) {
        std::cerr << "Error: Downloaded Mat is empty." << std::endl;
      } else {
        // std::string filename = "/workspace/tmpjpg/src" + std::to_string(count) + ".jpg";
        // cv::imwrite(filename, aa);
      }
    }
    delete matToProcess; // safe on nullptr; frees the GpuMat pushed by the producer

    g_usleep(100);
  }
}

The consumer gets a cv::cuda::GpuMat from the queue:

  1. If the queue is empty, the popped pointer is null.
  2. If the GpuMat exists, its device data can still be empty.
  3. Finally, if neither the pointer nor the data is empty, I can save the image.
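
To keep the ownership of each popped pointer explicit (and avoid leaks or double use), I am also trying a std::unique_ptr hand-off; a minimal sketch:

// Sketch: exactly one owner at a time. The producer releases the pointer
// into the queue; the consumer reacquires it and frees it automatically.
auto copied = std::make_unique<cv::cuda::GpuMat>();
nv12_mat.copyTo(*copied);
g_async_queue_push(queue, copied.release());          // producer side

std::unique_ptr<cv::cuda::GpuMat> mat(                // consumer side
    static_cast<cv::cuda::GpuMat *>(g_async_queue_timeout_pop(queue, 10)));
if (mat && !mat->empty()) {
  // use *mat; memory is released when 'mat' leaves scope
}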

Results

Running...
Consumer running
queue pop mat: 0x7f1abc036890
Error: Downloaded Mat is empty.
frame callback copiedMat: 0x7f1abc01fec0, empty 0
queue pop mat: 0x7f1abc01fec0
queue pop mat: 0x7f1abc0d4340
Error: Downloaded Mat is empty.
frame callback copiedMat: 0x5636749f0c50, empty 0
queue pop mat: 0x5636749f0c50
frame callback copiedMat: 0x7f1abc01bd90, empty 0
queue pop mat: 0x7f1abc0af360
Error: Downloaded Mat is empty.
queue pop mat: 0x7f1abc01bd90
frame callback copiedMat: 0x7f1abc0ad600, empty 0
queue pop mat: 0x7f1abc022c30
Error: Downloaded Mat is empty.
queue pop mat: 0x7f1abc0ad600
frame callback copiedMat: 0x7f1abc106fe0, empty 0
queue pop mat: 0x7f1abc0bf760
Error: Downloaded Mat is empty.
queue pop mat: 0x7f1abc106fe0
queue pop mat: 0x7f1abc0291d0

As you can see, g_async_queue hands the consumer pointers that were never pushed by the producer. For example, on the 3rd line the producer has not pushed anything yet, but the consumer pops a pointer anyway; and on the 7th line the pointer 0x7f1abc0d4340 was never pushed, yet the consumer reads it.

I am not sure that every image from every stream will be saved correctly, but the saved images were fine in the single-stream case.

I cannot understand this behavior of g_async_queue. If the queue is empty, it should return a null pointer, right? Could you please tell me what I am missing…

It could be a log loss issue. I wrote a simple demo to test the g_async_queue_* APIs, and it did not show the problems you mentioned.
main.cpp (869 Bytes)
Makefile (560 Bytes)
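
About the possible log loss: if the producer and consumer both write to std::cout, lines from the two threads can interleave or appear out of order, which would explain a trace like yours. A minimal sketch that serializes the prints:

// Sketch: one mutex shared by the producer and consumer log calls,
// so each line is written atomically and in acquisition order.
#include <iostream>
#include <mutex>
#include <string>

static std::mutex log_mutex;

void log_line(const std::string &msg) {
  std::lock_guard<std::mutex> lock(log_mutex);
  std::cout << msg << std::endl;
}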

If there are other questions about these APIs in the future, it is recommended to check the official GLib documentation.

Thank you for your explanation and example code.

I may not have explained my question clearly because of my limited English. I did not ask because I don’t know how to use the API; I asked because I encountered an unexpected return value. I wanted to confirm whether there is any DeepStream-specific behavior around access to CUDA memory.

As far as I know, g_async_queue_timeout_pop waits for the specified timeout and returns null if no data is available, which is also what the documentation you shared says. What I am curious about is why a non-null address is returned after the timeout when the queue is empty.

I suspect that, since I am using CUDA memory, the data may have been popped from the queue while the underlying memory was freed internally by DeepStream at the end of the object’s lifecycle, leaving no valid data behind. Alternatively, I wonder whether this is a compatibility issue caused by putting non-GObject items such as cv::cuda::GpuMat into the queue.

Thank you for your assistance.

Since you are using cv::cuda::GpuMat, the lifecycle of the CUDA memory is managed by OpenCV. It has nothing to do with the lifecycle of CUDA memory inside DeepStream.
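
You can convince yourself with a small check (a sketch; h, w, surface_ptr, and pitch stand in for your surface parameters):

// Sketch: GpuMat::copyTo allocates fresh device memory owned by the
// destination, so the copy is independent of the DeepStream surface.
cv::cuda::GpuMat wrapper(h, w, CV_8UC4, surface_ptr, pitch); // borrows memory
cv::cuda::GpuMat owned;
wrapper.copyTo(owned);               // device-side deep copy into new memory
assert(owned.data != wrapper.data);  // separate allocations
// 'owned' stays valid after the surface is unmapped; ~GpuMat() frees it.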

We don’t have much experience with this either; you could raise this question on the OpenCV forum.

Thank you for your advice.

It was my mistake: some old code that pushed an empty instance into the queue had not been removed.

Sorry for bothering you.

And I found out that cvtColor copies the pixel values.
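
For anyone who finds this later: the conversion writes into a separate destination, so it doubles as a deep copy. A sketch with the CUDA variant, assuming the RGBA output of nvvideoconvert:

// Sketch: cv::cuda::cvtColor writes into a separate destination GpuMat,
// so the color conversion also copies the pixels off the surface.
cv::cuda::GpuMat bgr;
cv::cuda::cvtColor(nv12_mat, bgr, cv::COLOR_RGBA2BGR); // allocates 'bgr'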

Thank you!
