How to composite images with nvmultistreamtiler?

Hi Deepstream Team

I understand that the source code for the Gstnvmultistreamtiler plugin has not been released yet.

So I wrote a new custom tiler plugin using OpenCV, but it is very slow and heavy.

Can you give us a hint about the tiling method you used (e.g. resize, composite)?

Below is the processing code of my customtiler plugin
[4-batch surface → 1-batch surface]

static GstFlowReturn
process_tiler(GstCustomTiler *customtiler, NvBufSurface *insurface, NvDsBatchMeta *inbatch, NvBufSurface *outsurface, NvDsFrameMeta *outframemeta){
  //g_print("process tile\n");
  cv::Mat outMat;



  outMat = cv::Mat (outsurface->surfaceList[0].planeParams.height[0],outsurface->surfaceList[0].planeParams.width[0],
                  CV_8UC4, outsurface->surfaceList[0].mappedAddr.addr[0],
                  outsurface->surfaceList[0].planeParams.pitch[0]);        

  // g_print("output mat w = %d h = %d p = %d\n", outsurface->surfaceList[0].planeParams.width[0],
  //                                           outsurface->surfaceList[0].planeParams.height[0],
  //                                           outsurface->surfaceList[0].planeParams.pitch[0]);  

  if(customtiler->auto_scale){
    //int col = insurface->batchSize / 2;
    //int row = insurface->batchSize / 2;
    int col = 2;
    int row = 2;
    int tilewidth = customtiler->processing_width / col;
    int tileheight = customtiler->processing_height / row;          

    //g_print("input batch = %d col = %d row = %d tw = %d th = %d\n",insurface->batchSize, col, row, tilewidth, tileheight);
    int batch = 0;
    cv::Rect dest[4];
    cv::Mat inmat[4];
#if 0
    dest[0] = cv::Rect(0, 0, 1014, 720);
    dest[1] = cv::Rect(1014, 0, 256, 240);
    dest[2] = cv::Rect(1014, 240, 256, 240);
    dest[3] = cv::Rect(1014, 480, 256, 240);        
#else
    dest[0] = cv::Rect(0, 0, 640, 360);
    dest[1] = cv::Rect(640, 0, 640, 360);
    dest[2] = cv::Rect(0, 360, 640, 360);
    dest[3] = cv::Rect(640, 360, 640, 360);    
#endif
    ResizeMatGroup *group[4] = {0,};

    for(int r = 0 ; r < row; ++r){
      for(int c = 0 ; c < col; ++c){
        NvDsFrameMeta *frame_meta = nvds_get_nth_frame_meta(inbatch->frame_meta_list, batch);
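        group[batch] = g_new(ResizeMatGroup, 1);
        // g_new does not zero memory: mark the entry empty until a resize thread
        // is actually started, so the join loop below never reads an uninitialized flag.
        group[batch]->isEmpty = TRUE;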
        //g_print("input mat batch%d w = %d h = %d p %d size %u addr %p\n",batch, insurface->surfaceList[batch].planeParams.width[0],

        //  insurface->surfaceList[batch].planeParams.height[0],
        //  insurface->surfaceList[batch].planeParams.pitch[0],
        //  insurface->surfaceList[batch].dataSize,
        //  insurface->surfaceList[batch].mappedAddr.addr[0]);
        if(insurface->surfaceList[batch].mappedAddr.addr[0] == nullptr){
          g_print("customtiler - detect nullpointer in tile process, ignore this frame batch%d\n", batch);
          ++batch;
          continue;
        }                    

        inmat[batch] =  cv::Mat (insurface->surfaceList[batch].planeParams.height[0],insurface->surfaceList[batch].planeParams.width[0],
                  CV_8UC4, insurface->surfaceList[batch].mappedAddr.addr[0],
                  insurface->surfaceList[batch].planeParams.pitch[0]);
        group[batch]->isEmpty = FALSE;        
        group[batch]->src = &inmat[batch];        
        group[batch]->dst = &outMat;        
        group[batch]->rect = dest[frame_meta->source_id];        
        char title[255];
        sprintf(title, "resize thread %d", batch);
        customtiler->processMatThread[batch] = g_thread_new(title, process_resize_mat, group[batch]);        
        //cv::Size s = inmat[batch].size();
        //g_print("mat width = %d height = %d\n", s.width, s.height);
        //g_print("resize\n");                
        //g_print("batch%d pindex = %d sourceid = %d\n", batch, frame_meta->pad_index, frame_meta->source_id);
        process_bbox(frame_meta, insurface->surfaceList[batch].planeParams.width[0], insurface->surfaceList[batch].planeParams.height[0],
              outframemeta, dest[frame_meta->source_id]);
        ++batch;
      }    
    }  



    for(int i = 0 ; i < 4 ; ++i){
      if(!group[i]->isEmpty){
        g_thread_join(customtiler->processMatThread[i]);
        //g_print("thrd%d=O ", i);

      }
      else
      {
        //g_print("thrd%d=X ", i);
      }      
      g_free(group[i]);
    }
    //g_print("\n");
  }

  else{

  }

  return GST_FLOW_OK;

}
static gpointer process_resize_mat(gpointer ptr){
  ResizeMatGroup *group = (_ResizeMatGroup*)ptr;
  cv::Mat resizedmat;
  cv::resize(*(group->src), resizedmat, cv::Size(group->rect.width, group->rect.height));        
  resizedmat.copyTo((*(group->dst))(group->rect));
  return NULL;
}
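
The process_bbox helper called above is not shown in this post. As a rough illustration only, remapping a bounding box from source-frame coordinates into its destination tile could look like the sketch below. The Box struct and remap_box function are hypothetical names for this example; in the real plugin the values would presumably come from the object metadata and the cv::Rect tile rectangles.

```cpp
#include <cassert>

// Hypothetical stand-in type for illustration. It is not a DeepStream type.
struct Box {
  float left, top, width, height;
};

// Map a box given in source-frame pixels into the output canvas.
// src_w / src_h: size of the source frame.
// tile: rectangle of the output canvas that the source frame is scaled into.
inline Box remap_box(const Box &in, int src_w, int src_h, const Box &tile) {
  assert(src_w > 0 && src_h > 0);
  const float sx = tile.width / static_cast<float>(src_w);   // horizontal scale
  const float sy = tile.height / static_cast<float>(src_h);  // vertical scale
  return Box{tile.left + in.left * sx,  // scale, then shift by the tile origin
             tile.top + in.top * sy,
             in.width * sx,
             in.height * sy};
}
```

For example, a 128x72 box at (640, 360) in a 1280x720 source that is drawn into the top-right 640x360 tile ends up as a 64x36 box at (960, 180) in the output.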

OpenCV is running on CPU. Gstnvmultistreamtiler is accelerated by GPU.

  1. I think so too. Is it built with the SDK functions you provide?
    dsexample has an example of resizing each frame in a batch individually, but there seems to be no example of turning an N-frame batch into a single-frame batch.

  2. I'm writing this plugin because no release schedule has been announced for the custom-tile-config feature of nvmultistreamtiler.
    I think many people are looking forward to it. Is there still no fixed schedule? My question is:
    Is the custom-tile-config property supported in DS 5.0?

Why does Gstnvmultistreamtiler not meet your requirement? What on earth do you want to customize?

You need to generate a new batch whose batch size is 1 and put your composed output into the NvBufSurface of that new batch. All data structures are clear. Dsexample is just an in-place transform plugin. You may refer to gst-nvvideotemplate for a transform plugin.

Why does Gstnvmultistreamtiler not meet your requirement? What on earth do you want to customize?
→ All I want is to support various layouts.

example
image

If you look at my Customtiler plugin source, you can see that the screen is composited using four custom Rects.
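
A minimal sketch of how such a layout could be computed instead of hard-coded, assuming one large tile on the left and the remaining sources stacked in a right-hand column. TileRect and main_plus_column are hypothetical names introduced for this example. They are not part of the plugin or of DeepStream.

```cpp
#include <cassert>
#include <vector>

// Hypothetical rectangle type for illustration (same idea as cv::Rect).
struct TileRect {
  int x, y, width, height;
};

// Source 0 fills the left part of the canvas. The other count-1 sources are
// stacked top to bottom in a column of width side_w on the right edge.
inline std::vector<TileRect> main_plus_column(int out_w, int out_h,
                                              int side_w, int count) {
  assert(count >= 1 && out_h > 0 && side_w >= 0 && side_w < out_w);
  std::vector<TileRect> tiles;
  if (count == 1) {
    tiles.push_back({0, 0, out_w, out_h});  // single source: full canvas
    return tiles;
  }
  tiles.push_back({0, 0, out_w - side_w, out_h});  // main tile
  const int side_n = count - 1;
  const int side_h = out_h / side_n;  // integer division, remainder rows unused
  for (int i = 0; i < side_n; ++i) {
    tiles.push_back({out_w - side_w, i * side_h, side_w, side_h});
  }
  return tiles;
}
```

With a 1280x720 output, a 256-pixel column and four sources, this yields a 1024x720 main tile and three 256x240 tiles at x = 1024.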

The DeepStream plugin documentation lists a custom-tile-config property, which appears to be intended for exactly this purpose, so I think it is necessary.

image

And thank you for suggesting gst-nvvideotemplate. I’ll look into it.

Hi.

I implemented the function using EglImage and CUDA GpuMat, but it was also slow.

static GstFlowReturn
egl_process_tiler(GstCustomTiler *customtiler, NvBufSurface *insurface, NvDsBatchMeta *inbatch, NvBufSurface *outsurface, NvDsFrameMeta *outframemeta, gboolean *skipframes){
  //g_print("process tile\n");    
  CUresult status;
  CUeglFrame outEglFrame, inEglFrameTemp;
  CUgraphicsResource outpResource = NULL;
  CUgraphicsResource inpResourceTemp = NULL;
  cudaFree(0);
 
  status = cuGraphicsEGLRegisterImage(&outpResource,
    outsurface->surfaceList[0].mappedAddr.eglImage,
                CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
  status = cuGraphicsResourceGetMappedEglFrame(&outEglFrame, outpResource, 0, 0);
  //status = cuCtxSynchronize();    
  cv::cuda::GpuMat gpuOutMat(customtiler->output_height, customtiler->output_width, CV_8UC4, outEglFrame.frame.pPitch[0]);
  //status = cuCtxSynchronize();  
  if(customtiler->auto_scale){
    //int col = insurface->batchSize / 2;
    //int row = insurface->batchSize / 2;
    int col = 2;
    int row = 2;
    int tilewidth = customtiler->output_width / col;
    int tileheight = customtiler->output_height / row;          
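    // Variable-length arrays of class types are not standard C++, so std::vector (from <vector>) is used instead.
    std::vector<cv::cuda::GpuMat> gpuInmat(insurface->batchSize);
    std::vector<cv::cuda::GpuMat> gpuResizedMat(insurface->batchSize);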
    //g_print("input batch = %d col = %d row = %d tw = %d th = %d\n",insurface->batchSize, col, row, tilewidth, tileheight);
    int batch = 0;
    cv::Rect dest[4];
#if 1
    dest[0] = cv::Rect(0, 0, 1014, 720);
    dest[1] = cv::Rect(1014, 0, 256, 240);
    dest[2] = cv::Rect(1014, 240, 256, 240);
    dest[3] = cv::Rect(1014, 480, 256, 240);        
#else
    dest[0] = cv::Rect(0, 0, 640, 360);
    dest[1] = cv::Rect(640, 0, 640, 360);
    dest[2] = cv::Rect(0, 360, 640, 360);
    dest[3] = cv::Rect(640, 360, 640, 360);    
#endif  
   
    for(int r = 0 ; r < row; ++r){
      for(int c = 0 ; c < col; ++c){        
        if(skipframes[batch] == 1){
          g_print("customtiler - Skip Frames. Ignore this frame batch%d\n", batch);
          ++batch;
          continue;
        }
        NvDsFrameMeta *frame_meta = nvds_get_nth_frame_meta(inbatch->frame_meta_list, batch);
        // g_print("input mat batch%d w = %d h = %d p %d size %u addr %p\n",batch, insurface->surfaceList[batch].planeParams.width[0],
        //   insurface->surfaceList[batch].planeParams.height[0],
        //   insurface->surfaceList[batch].planeParams.pitch[0],
        //   insurface->surfaceList[batch].dataSize,
        //   insurface->surfaceList[batch].mappedAddr.addr[0]);    
       
        status = cuGraphicsEGLRegisterImage(&inpResourceTemp,
        insurface->surfaceList[batch].mappedAddr.eglImage,
                    CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
        if(status != 0)
        {
          g_print("customtiler - EglRegister Failed Ignore this frame batch%d\n", batch);
          ++batch;
          continue;
        }
        status = cuGraphicsResourceGetMappedEglFrame(&inEglFrameTemp, inpResourceTemp, 0, 0);
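        if(status != 0)
        {
          g_print("customtiler - EglMap Failed Ignore this frame batch%d\n", batch);
          // Release the resource registered above so it is not leaked on this error path.
          cuGraphicsUnregisterResource(inpResourceTemp);
          ++batch;
          continue;
        }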
        //status = cuCtxSynchronize();
        //g_print("test2\n");
        gpuInmat[batch] = cv::cuda::GpuMat (customtiler->processing_height, customtiler->processing_width,
                  CV_8UC4, inEglFrameTemp.frame.pPitch[0]);                                      
        gpuResizedMat[batch] = cv::cuda::GpuMat(dest[frame_meta->source_id].height, dest[frame_meta->source_id].width, CV_8UC4);
        cv::cuda::resize(gpuInmat[batch], gpuResizedMat[batch], cv::Size(dest[frame_meta->source_id].width, dest[frame_meta->source_id].height));
        gpuResizedMat[batch].copyTo(gpuOutMat(dest[frame_meta->source_id]));
        //status = cuCtxSynchronize();              
        status = cuGraphicsUnregisterResource(inpResourceTemp);
        process_bbox(frame_meta, customtiler->processing_width, customtiler->processing_height,
              outframemeta, dest[frame_meta->source_id]);
        ++batch;
      }    
    }      
    status = cuGraphicsUnregisterResource(outpResource);
  }
  else{
  }
  return GST_FLOW_OK;
}
  1. The above code works without problems, but it is very slow compared to nvmultistreamtiler.
    Can you please give me some advice on how to speed up the compositing?

  2. While looking at the API documentation, I found the 'NvBufSurfTransformComposite' function.
    I think nvmultistreamtiler uses this function.

If that is correct, I would like a hint on what to do to reach a processing speed similar to that of nvmultistreamtiler.

I really need a custom tile function that lets me position each tile.

Thanks.

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

• Hardware Platform (Jetson / GPU) Jetson Nano Module
• DeepStream Version 5.0
• JetPack Version (valid for Jetson only) 4.4
• TensorRT Version 7
• Issue Type( questions, new requirements, bugs) questions
• How to reproduce the issue ? not a bug
• Requirement details OpenCV 4.4.0 With CUDA 10.2 , customtile plugin(based on dsexample)

Due to our driver and memory space problems, it is impossible to update to JetPack 4.6 now.

Yes. This API can help. NVIDIA DeepStream SDK API Reference: NvBufSurfTransform Types and Functions | NVIDIA Docs
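
As a rough sketch of the preparation step for such a composite call: one source rectangle and one destination rectangle are built per input frame, and each destination is checked to fit inside the single output frame. CompRect is a local stand-in that only mirrors the top/left/width/height fields of the SDK's NvBufSurfTransformRect. CompositePlan, fits and make_plan are hypothetical names for this example. The actual call to NvBufSurfTransformComposite is left out because it needs the SDK headers, and the exact layout of its parameter struct should be checked in nvbufsurftransform.h for your release.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-in with the same four fields as NvBufSurfTransformRect.
struct CompRect {
  uint32_t top, left, width, height;
};

// Hypothetical container for the two rectangle arrays a composite needs.
struct CompositePlan {
  std::vector<CompRect> src;  // region read from each input frame
  std::vector<CompRect> dst;  // region written in the single output frame
};

// True when r is non-empty and lies entirely inside a w x h surface.
inline bool fits(const CompRect &r, uint32_t w, uint32_t h) {
  return r.width > 0 && r.height > 0 &&
         r.left + r.width <= w && r.top + r.height <= h;
}

// Build one src/dst pair per input frame. Each input frame is used in full
// and scaled into its entry of `layout` on the output canvas.
inline CompositePlan make_plan(uint32_t in_w, uint32_t in_h,
                               uint32_t out_w, uint32_t out_h,
                               const std::vector<CompRect> &layout) {
  CompositePlan plan;
  for (const CompRect &d : layout) {
    assert(fits(d, out_w, out_h));  // reject tiles that leave the canvas
    plan.src.push_back({0, 0, in_w, in_h});
    plan.dst.push_back(d);
  }
  return plan;
}
```

With the SDK available, plan.src.data() and plan.dst.data() would presumably be handed to the composite parameters together with the input frame count, so that the whole 4-to-1 composition is issued as one hardware-accelerated call instead of four separate resize and copy steps.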
