App crashes when using NvBufSurfTransformComposite to transform and composite buffers

Hi Team,

I added a probe function to the src pad of nvstreammux. It crops the buffer coming out of nvstreammux and composites the cropped region back onto the original nvstreammux buffer, so that the nvinfer plugin runs inference on the composited buffer. The same function works well in deepstream-test3-app. In deepstream-app, however, it transforms and composites around 959 frames and then crashes with the errors below. The errors vary from run to run. The only clue I have is that the message "Could not allocate internal buffer for buffer conversion" is printed when the NvBufSurfaceCreate() call fails.

Any help would be appreciated. Thanks.

Error 1:

Cuda failure: status=700
Error: Could not allocate internal buffer for buffer conversion
Go to error
GPUassert: an illegal memory access was encountered src/modules/cuDCF/cudaCropScaleInTexture2D.cu 865
0:00:38.617924248 21758 0x55cc1d0e48f0 ERROR nvinfer gstnvinfer.cpp:987:get_converted_buffer:<primary_gie_classifier> cudaMemset2DAsync failed with error cudaErrorIllegalAddress while converting buffer
cuGraphicsMapResources failed with error(700) gst_eglglessink_cuda_buffer_copy
0:00:38.617941290 21758 0x55cc1d0e48f0 WARN nvinfer gstnvinfer.cpp:1246:gst_nvinfer_process_full_frame:<primary_gie_classifier> error: Buffer conversion failed
ERROR from primary_gie_classifier: Buffer conversion failed
Debug info: gstnvinfer.cpp(1246): gst_nvinfer_process_full_frame (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie_classifier
Segmentation fault (core dumped)

Error 2:

Cuda failure: status=700
Error: Could not allocate internal buffer for buffer conversion
Go to error
GPUassert: an illegal memory access was encountered src/modules/cuDCF/cudaCropScaleInTexture2D.cu 352
(…/inc/nvcudautils) Error ResourceError: e: an illegal memory access was encountered (cudaErrorIllegalAddress) (propagating from /vpi/ext/nvcudautils/src/AllocMem.cpp, function freeMem(), line 283)
(…/inc/nvcudautils) Error ResourceError: (propagating from /vpi/ext/nvcudautils/inc/nvcudautils/detail/…/AllocMem.h, function operator()(), line 45)
(…/inc/nvcudautils) Error ResourceError: e: an illegal memory access was encountered (cudaErrorIllegalAddress) (propagating from /vpi/ext/nvcudautils/src/AllocMem.cpp, function freeMem(), line 283)
(…/inc/nvcudautils) Error ResourceError: (propagating from /vpi/ext/nvcudautils/inc/nvcudautils/detail/…/AllocMem.h, function operator()(), line 60)
(…/inc/nvcudautils) Error ResourceError: e: an illegal memory access was encountered (cudaErrorIllegalAddress) (propagating from /vpi/ext/nvcudautils/src/AllocMem.cpp, function freeMem(), line 283)
(…/inc/nvcudautils) Error ResourceError: (propagating from /vpi/ext/nvcudautils/inc/nvcudautils/detail/…/AllocMem.h, function operator()(), line 60)
…/…/src/hb-object-private.hh:154: Type* hb_object_reference(Type*) [with Type = hb_unicode_funcs_t]: Assertion `hb_object_is_valid (obj)’ failed.
Aborted (core dumped)

Error 3:

Cuda failure: status=700
Error: Could not allocate internal buffer for buffer conversion
Go to error
cuMemcpy2D failed with error(700) gst_eglglessink_cuda_buffer_copy
0:00:35.050961321 26693 0x55aa630f7cf0 ERROR nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:queueInputBatch(): Failed to make stream wait on event(cudaErrorIllegalAddress)
0:00:35.050977356 26693 0x55aa630f7cf0 WARN nvinfer gstnvinfer.cpp:1098:gst_nvinfer_input_queue_loop:<primary_gie_classifier> error: Failed to queue input batch for inferencing
ERROR from primary_gie_classifier: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1098): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie_classifier
Quitting
Segmentation fault (core dumped)

My core implementation of this probe function:

gint processing_width = appCtx->config.streammux_config.pipeline_width;
gint processing_height = appCtx->config.streammux_config.pipeline_height;

memset (&in_map_info, 0, sizeof (in_map_info));
if (!gst_buffer_map (buf, &in_map_info, GST_MAP_READ)) {
  g_print ("Error: Failed to map gst buffer\n");
  goto error;
}

surface = (NvBufSurface *) in_map_info.data;

batch_meta = gst_buffer_get_nvds_batch_meta (buf);
if (batch_meta == NULL) {
  g_print("NvDsBatchMeta not found for input buffer.\n");
  goto error;
}

for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
  l_frame = l_frame->next)
{
  frame_meta = (NvDsFrameMeta *) (l_frame->data);
  NvOSD_RectParams rect_params;

  // Scale the entire frame to processing resolution
  rect_params.left = 0;
  rect_params.top = 0;
  rect_params.width = (guint) surface->surfaceList[idx].width;
  rect_params.height = (guint) surface->surfaceList[idx].height;

  // Scale and convert the frame
  NvBufSurfTransform_Error err;
  NvBufSurfTransformConfigParams transform_config_params;
  NvBufSurfTransformParams transform_params;
  NvBufSurfTransformRect src_rect;
  NvBufSurfTransformRect dst_rect;
  NvBufSurface ip_surf;
  ip_surf = *surface;
  NvBufSurface *dst_surface = NULL;

  ip_surf.numFilled = ip_surf.batchSize = 1;
  ip_surf.surfaceList = &(surface->surfaceList[idx]);

  gint src_left = GST_ROUND_UP_2(rect_params.left);
  gint src_top = GST_ROUND_UP_2(rect_params.top);
  gint src_width = GST_ROUND_DOWN_2(rect_params.width);
  gint src_height = GST_ROUND_DOWN_2(rect_params.height);

  // Maintain aspect ratio
  double hdest = processing_width * src_height / (double) src_width;
  double wdest = processing_height * src_width / (double) src_height;
  guint dest_width, dest_height;

  if (hdest <= processing_height) {
    dest_width = processing_width;
    dest_height = hdest;
  } else {
    dest_width = wdest;
    dest_height = processing_height;
  }

  cuda_err = cudaSetDevice (surface->gpuId);
  cudaStream_t cuda_stream;
  cuda_err = cudaStreamCreate (&cuda_stream);

  // Configure transform session parameters for the transformation
  transform_config_params.compute_mode = NvBufSurfTransformCompute_Default; 
  transform_config_params.gpu_id = surface->gpuId;
  transform_config_params.cuda_stream = cuda_stream;

  // Set the transform session parameters for the conversions executed in this thread.
  err = NvBufSurfTransformSetSessionParams (&transform_config_params);
  if (err != NvBufSurfTransformError_Success) {
    g_print("NvBufSurfTransformSetSessionParams failed with error %d\n", err);
    goto error;
  }

  // Calculate scaling ratio while maintaining aspect ratio
  ratio = MIN (1.0 * dest_width / src_width, 1.0 * dest_height / src_height);

  if ((rect_params.width == 0) || (rect_params.height == 0)) {
    g_print("%s:crop_rect_params dimensions are zero\n",__func__);
    goto error;
  }

  // Set the transform ROIs for source and destination
  src_rect.top = 1150;
  src_rect.left = 1650;
  src_rect.width = 450;
  src_rect.height = 154;
  dst_rect.top = 0;
  dst_rect.left = 0;
  dst_rect.width = 989;
  dst_rect.height = 637;

  // Set the transform parameters
  transform_params.src_rect = &src_rect;
  transform_params.dst_rect = &dst_rect;
  transform_params.transform_flag =
    NVBUFSURF_TRANSFORM_FILTER | NVBUFSURF_TRANSFORM_CROP_SRC |
      NVBUFSURF_TRANSFORM_CROP_DST;
  transform_params.transform_filter = NvBufSurfTransformInter_Default;
  transform_params.transform_flip = NvBufSurfTransform_None;

  NvBufSurfaceCreateParams create_params;
  int batch_size= surface->batchSize; 
  create_params.gpuId  = surface->gpuId;
  create_params.width  = processing_width;
  create_params.height = processing_height;
  create_params.size = 0;
  create_params.isContiguous = true;
  create_params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
  create_params.layout = NVBUF_LAYOUT_PITCH;     
  create_params.memType = NVBUF_MEM_CUDA_UNIFIED;
  NvBufSurfaceCreate(&dst_surface, batch_size, &create_params);
  if (NvBufSurfaceCreate (&dst_surface, batch_size, &create_params) != 0) {
    g_print ("Error: Could not allocate internal buffer for buffer conversion\n");
    goto error;
  }

  //Memset the memory
  NvBufSurfaceMemSet (dst_surface, 0, 0, 0); 

  g_print("Scaling and converting input buffer for frame %d\n", frame_meta->frame_num);

  // Transformation scaling+format conversion
  err = NvBufSurfTransform (&ip_surf, dst_surface, &transform_params);
  if (err != NvBufSurfTransformError_Success) {
    g_print("NvBufSurfTransform failed with error %d while converting buffer\n", err);
    goto error;
  }

  // Set the composite parameters
  NvBufSurfTransformCompositeParams composite_params;
  NvBufSurfTransformRect dst_com_rect;
  dst_com_rect.top = 0;
  dst_com_rect.left = 0;
  dst_com_rect.width = 989;
  dst_com_rect.height = 637;
  composite_params.src_comp_rect = &dst_rect;
  composite_params.dst_comp_rect = &dst_com_rect;
  composite_params.composite_flag = NVBUFSURF_TRANSFORM_COMPOSITE;
  composite_params.input_buf_count = 2;

  // Composite input surface and cropped surface into one buffer
  err = NvBufSurfTransformComposite (dst_surface, &ip_surf, &composite_params);
  if (err != NvBufSurfTransformError_Success) {
    g_print("NvBufSurfTransformComposite failed with error %d while compositing buffers\n", err);
    goto error;
  }
  
  // Map the buffer so that it can be accessed by CPU
  if (NvBufSurfaceMap (dst_surface, 0, 0, NVBUF_MAP_READ) != 0){
    goto error;
  }

  // Cache the mapped data for CPU access
  NvBufSurfaceSyncForCpu (dst_surface, 0, 0);

  // Free resources
  if (NvBufSurfaceUnMap (dst_surface, 0, 0)){
    goto error;
  }
  if (NvBufSurfaceUnMap(&ip_surf, 0, 0)) {
    goto error;
  }
  NvBufSurfaceDestroy (dst_surface);
  cudaStreamDestroy (cuda_stream);
  gst_buffer_unmap (buf, &in_map_info);
}
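
For readers following the listing, the aspect-ratio step can be isolated into a small helper. This is only a sketch: fit_keep_aspect and FitSize are hypothetical names that do not exist in DeepStream, and the logic mirrors the hdest/wdest calculation above under the assumption that the goal is to fit the source frame inside the processing resolution.

```c
#include <assert.h>

/* Hypothetical helper, not part of DeepStream: fit a src_w x src_h frame
 * inside a max_w x max_h box while keeping the aspect ratio. */
typedef struct {
    unsigned width;
    unsigned height;
} FitSize;

static FitSize fit_keep_aspect(unsigned src_w, unsigned src_h,
                               unsigned max_w, unsigned max_h)
{
    FitSize out = {0, 0};

    /* Same precondition as the zero-dimension check in the listing. */
    assert(src_w > 0 && src_h > 0);

    /* Height the frame would have if it were scaled to the full box width. */
    double h_at_full_width = (double) max_w * src_h / src_w;

    if (h_at_full_width <= max_h) {
        /* The frame fits when scaled to the full width. */
        out.width = max_w;
        out.height = (unsigned) h_at_full_width;
    } else {
        /* Too tall: scale to the full box height instead. */
        out.width = (unsigned) ((double) max_h * src_w / src_h);
        out.height = max_h;
    }
    return out;
}
```

For example, a 640x480 source fitted into 1280x720 yields 960x720. Note that the listing computes dest_width and dest_height but then uses hard-coded rectangles, so a helper like this only matters if the rectangles are derived from it.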

Please help. Thanks.

Any feedback and help please? Thanks!

Checking internally, thanks for the patience.

Thanks, looking forward to your feedback then.

Hi,
Fix needed:
In your code the buffer is created twice and the first allocation is never freed, which causes a memory leak:
NvBufSurfaceCreate (&dst_surface, batch_size, &create_params);
if (NvBufSurfaceCreate (&dst_surface, batch_size, &create_params) != 0) {
  g_print ("Error: Could not allocate internal buffer for buffer conversion\n");
  goto error;
}
Also, gst_buffer_unmap should be called outside the frame loop. With more than one source, the buffer is currently unmapped once per loop iteration. That works for a single source, but with multiple sources the code will run into problems.
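
The two fixes described above can be illustrated without any DeepStream dependency. The sketch below uses plain counters as stand-ins: fake_surface_create, fake_surface_destroy, fake_buffer_map and fake_buffer_unmap are hypothetical placeholders for NvBufSurfaceCreate, NvBufSurfaceDestroy, gst_buffer_map and gst_buffer_unmap, and they only count calls. It is meant to show why one surface leaks per frame in the original pattern, not to reproduce the real API behavior.

```c
#include <assert.h>

/* Counters standing in for real resources. Nothing here calls DeepStream. */
static int live_surfaces = 0;  /* surfaces created and not yet destroyed */
static int map_depth = 0;      /* maps that have not been unmapped yet */
static int next_handle = 1;

/* Hypothetical stand-ins for NvBufSurfaceCreate / NvBufSurfaceDestroy. */
static int fake_surface_create(int *handle)
{
    *handle = next_handle++;
    live_surfaces++;
    return 0;
}

static void fake_surface_destroy(int handle)
{
    (void) handle;
    live_surfaces--;
}

/* Hypothetical stand-ins for gst_buffer_map / gst_buffer_unmap. */
static void fake_buffer_map(void) { map_depth++; }

static void fake_buffer_unmap(void)
{
    assert(map_depth > 0);  /* unmapping more often than mapping is a bug */
    map_depth--;
}

/* Original pattern: two creates per frame, one destroy.
 * Returns the number of surfaces leaked by this call. */
static int probe_leaky(int num_frames)
{
    int before = live_surfaces;
    for (int i = 0; i < num_frames; i++) {
        int dst = 0;
        fake_surface_create(&dst);          /* handle is overwritten below */
        if (fake_surface_create(&dst) != 0)
            break;
        fake_surface_destroy(dst);          /* frees only the second one */
    }
    return live_surfaces - before;
}

/* Corrected pattern: map once, one create and one destroy per frame,
 * unmap once after the loop. Returns the number of surfaces leaked. */
static int probe_fixed(int num_frames)
{
    int before = live_surfaces;
    fake_buffer_map();
    for (int i = 0; i < num_frames; i++) {
        int dst = 0;
        if (fake_surface_create(&dst) != 0)
            break;
        /* transform and composite would go here */
        fake_surface_destroy(dst);
    }
    fake_buffer_unmap();
    return live_surfaces - before;
}
```

In this model probe_leaky(959) leaves 959 surfaces allocated while probe_fixed(959) leaves none, which is consistent with the crash appearing only after several hundred frames.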

Enhancements: you can create the buffer and the CUDA stream just once and reuse them, instead of creating and destroying them in every buffer probe call.
You can also achieve this in single composite call rather than having transform and composite, but that would need a memcpy to the original buffer.
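
A minimal sketch of the "create once and reuse" idea follows, again with stand-ins rather than real DeepStream calls. ProbeCtx, ctx_ensure, ctx_release, on_buffer and run_buffers are hypothetical names, and the integer fields stand in for an NvBufSurface pointer and a cudaStream_t. In a GStreamer application, a context like this could be passed as the user_data argument of gst_pad_add_probe and released at pipeline teardown. Where exactly that belongs in deepstream-app is not stated in this thread, so treat the placement as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-probe context that owns the reusable resources. */
typedef struct {
    int initialized;
    int scratch_surface;  /* stand-in for NvBufSurface * */
    int cuda_stream;      /* stand-in for cudaStream_t */
} ProbeCtx;

static int create_calls = 0;   /* how often resources were allocated */
static int destroy_calls = 0;  /* how often resources were released */

/* Allocate the resources on first use only. */
static int ctx_ensure(ProbeCtx *ctx)
{
    if (ctx->initialized)
        return 0;
    ctx->scratch_surface = 1;  /* real code would call NvBufSurfaceCreate */
    ctx->cuda_stream = 1;      /* real code would call cudaStreamCreate */
    ctx->initialized = 1;
    create_calls++;
    return 0;
}

/* Release the resources once, for example at pipeline teardown. */
static void ctx_release(ProbeCtx *ctx)
{
    if (!ctx->initialized)
        return;
    ctx->scratch_surface = 0;  /* real code would call NvBufSurfaceDestroy */
    ctx->cuda_stream = 0;      /* real code would call cudaStreamDestroy */
    ctx->initialized = 0;
    destroy_calls++;
}

/* Body of the probe: reuses the context instead of allocating per buffer. */
static int on_buffer(ProbeCtx *ctx)
{
    assert(ctx != NULL);
    if (ctx_ensure(ctx) != 0)
        return -1;
    /* per buffer: clear the scratch surface, transform, composite */
    return 0;
}

/* Simulates n buffers passing through the probe; returns how many succeeded. */
static int run_buffers(ProbeCtx *ctx, int n)
{
    int ok = 0;
    for (int i = 0; i < n; i++) {
        if (on_buffer(ctx) == 0)
            ok++;
    }
    return ok;
}
```

With this layout, 1000 buffers trigger a single allocation, and a second ctx_release call is a harmless no-op.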

Hi amycao,

Thanks a lot! The error is gone after I deleted one of the NvBufSurfaceCreate calls.

As for the multi-source case, I moved gst_buffer_unmap (buf, &in_map_info) outside the frame loop as you suggested, but the app still crashed with a segmentation fault. Is there anything else I did wrong?

As for the enhancements: if I should not create and destroy the buffer in each probe call, where can I create it and do the composite on the nvstreammux buffer? Can you tell me the name of the function where I should put my transform code?

Also, by "you can also achieve this in single composite call rather than having transform and composite", do you mean that I can do both the cropping and the compositing using only NvBufSurfTransformComposite? Since there is no further guide for this function, could you advise how to use it to achieve both?

Any help would be appreciated. Thanks again.

For the segfault, could you share code that we can compile and run locally in an NVIDIA environment? We do not see any errors apart from the ones pointed out above.

Hi amycao,

I just added -t to the command and there are no more errors now.

Could you please elaborate on the enhancement part you mentioned? My earlier questions are repeated below.

As for the enhancements: if I should not create and destroy the buffer in each probe call, where can I create it and do the composite on the nvstreammux buffer? Can you tell me the name of the function where I should put my transform code?

Also, by "you can also achieve this in single composite call rather than having transform and composite", do you mean that I can do both the cropping and the compositing using only NvBufSurfTransformComposite? Since there is no further guide for this function, could you advise how to use it to achieve both?

Thanks!