Segfault when swapping pointer to surface data

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Both
• DeepStream Version 5.01
• JetPack Version (valid for Jetson only) L4T 32.4.4
• TensorRT Version 7.0.0 (GPU) / 7.1.3 (Jetson)
• NVIDIA GPU Driver Version (valid for GPU only) 450.102.04
• Issue Type( questions, new requirements, bugs) question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing) N/A
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I’m trying to swap pointers from the surface buffer so that I can do whatever I need to with the image data in a different/dedicated thread. Here’s my DeepStream code that syncs the data for CPU access and passes it to the next function:

    cudaError_t cuda_err;

    NvBufSurfTransformRect src_rect, dst_rect;
    NvBufSurface *surface = (NvBufSurface *) in_map_info.data;

    int batch_size = surface->batchSize;

    src_rect.top = 0;
    src_rect.left = 0;
    src_rect.width = (guint) surface->surfaceList[0].width;
    src_rect.height = (guint) surface->surfaceList[0].height;

    dst_rect.top = 0;
    dst_rect.left = 0;
    dst_rect.width = (guint) surface->surfaceList[0].width;
    dst_rect.height = (guint) surface->surfaceList[0].height;

    NvBufSurfTransformParams nvbufsurface_params;
    nvbufsurface_params.src_rect = &src_rect;
    nvbufsurface_params.dst_rect = &dst_rect;
    nvbufsurface_params.transform_flag = NVBUFSURF_TRANSFORM_CROP_SRC | NVBUFSURF_TRANSFORM_CROP_DST;
    nvbufsurface_params.transform_filter = NvBufSurfTransformInter_Default;

    NvBufSurfaceCreateParams nvbufsurface_create_params;
    memset(&nvbufsurface_create_params, 0, sizeof(nvbufsurface_create_params)); /* zero unused fields */

    nvbufsurface_create_params.gpuId = surface->gpuId;
    nvbufsurface_create_params.width = (gint) surface->surfaceList[0].width;
    nvbufsurface_create_params.height = (gint) surface->surfaceList[0].height;
    nvbufsurface_create_params.size = 0;
    nvbufsurface_create_params.layout = NVBUF_LAYOUT_PITCH;

#ifdef PLATFORM_TEGRA
    nvbufsurface_create_params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
    nvbufsurface_create_params.memType = NVBUF_MEM_SURFACE_ARRAY;
#else
    nvbufsurface_create_params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
    nvbufsurface_create_params.memType = NVBUF_MEM_CUDA_UNIFIED;
#endif

    cuda_err = cudaSetDevice(surface->gpuId);

    NvBufSurface *dst_surface = NULL;
    cudaStream_t cuda_stream;

    cuda_err = cudaStreamCreate(&cuda_stream);

    int create_result = NvBufSurfaceCreate(&dst_surface, batch_size, &nvbufsurface_create_params);

    NvBufSurfTransformConfigParams transform_config_params;
    NvBufSurfTransform_Error err;

    transform_config_params.compute_mode = NvBufSurfTransformCompute_Default;
    transform_config_params.gpu_id = surface->gpuId;
    transform_config_params.cuda_stream = cuda_stream;
    err = NvBufSurfTransformSetSessionParams(&transform_config_params);

    NvBufSurfaceMemSet(dst_surface, 0, 0, 0);
    err = NvBufSurfTransform(surface, dst_surface, &nvbufsurface_params);
    if (err != NvBufSurfTransformError_Success) {
        g_print("DS NvBufSurfTransform failed with error %d while converting buffer\n", err);
    }

    NvBufSurfaceMap(dst_surface, 0, 0, NVBUF_MAP_READ);
    NvBufSurfaceSyncForCpu(dst_surface, 0, 0);

    NvDsConfig *config = &appCtx->config;
    NvDsSourceConfig *sources = appCtx->config.multi_source_config;

    custom_parser (frame_meta->source_id, dst_surface->surfaceList[0].dataSize,
                     (std::byte *)dst_surface->surfaceList[0].mappedAddr.addr[0]);

    NvBufSurfaceUnMap(dst_surface, 0, 0);
    NvBufSurfaceDestroy(dst_surface);
    cudaStreamDestroy(cuda_stream);
    gst_buffer_unmap(buf, &in_map_info);

I’ve removed most of the code from the following snippets, as it’s not relevant to the issue.

This is a pointer swap function that’s similar to std::swap. I implemented it manually so that I could try to debug the issue.

    void frameSwap(std::byte *&_a, std::byte *&_b) {
        std::byte *_tmp = _a;  // hold the first pointer
        _a = _b;
        _b = _tmp;
    }

Here’s the function that gets called from the DeepStream side of things and swaps the pointers.

    std::byte *imageData[NUM_THREADS] = {};
    // initialization code for imageData is elsewhere on startup

    void custom_parser (guint source_id, gint data_size, std::byte *new_frame) {
        // there is locking code here to manage thread safe access
        frameSwap(imageData[source_id], new_frame);
        // notify relevant dedicated thread of new frame data
    }

The issue I’m having is that this works 100% perfectly on dGPU, but on Jetson I get a segfault when the dedicated thread accesses imageData[source_id]. My guess is that it’s an issue with the memory type allocation on the Jetson: the pointer isn’t getting swapped correctly and the buffer is still getting freed by DeepStream. But I’m unsure how I should be going about this.

To forestall any suggestions about copying the data to another pointer entirely (such as with memcpy): this isn’t an option, as the copy operation is blocking and takes too much time for what I’m trying to do. That’s why I’m going after a pointer swap, which takes only a fraction of a millisecond as opposed to over 10 milliseconds on the Jetson.

Any help is appreciated.

Hi @cbstryker ,
There is similar code in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-dsexample/gstdsexample.cpp

On Jetson, I added the change below in the file and can write the data into test.png correctly.
This indicates that dsexample->inter_buf->surfaceList[0].mappedAddr.addr[0] is CPU-accessible on Jetson.

Could you double check your code based on this dsexample code?

  /* Map the buffer so that it can be accessed by CPU */
  if (NvBufSurfaceMap (dsexample->inter_buf, 0, 0, NVBUF_MAP_READ) != 0){
    goto error;
  }

  /* Cache the mapped data for CPU access */
  NvBufSurfaceSyncForCpu (dsexample->inter_buf, 0, 0);

  /* Use openCV to remove padding and convert RGBA to BGR. Can be skipped if
   * algorithm can handle padded RGBA data. */
  in_mat =
      cv::Mat (dsexample->processing_height, dsexample->processing_width,
      CV_8UC4, dsexample->inter_buf->surfaceList[0].mappedAddr.addr[0],
      dsexample->inter_buf->surfaceList[0].pitch);

#if (CV_MAJOR_VERSION >= 4)
  cv::cvtColor (in_mat, *dsexample->cvmat, cv::COLOR_RGBA2BGR);
#else
  cv::cvtColor (in_mat, *dsexample->cvmat, CV_RGBA2BGR);
#endif

+      static int cnt = 0;
+      if (cnt++ == 20)
+            cv::imwrite ("test.png", in_mat);

  if (NvBufSurfaceUnMap (dsexample->inter_buf, 0, 0)){
    goto error;
  }

Hi mchi, thanks for the response. The issue I’m having isn’t with getting the frame data; I’m already doing that successfully. The entire process works just fine on dGPU, and on Jetson, if I parse the image data in a blocking process (without separate threads), again there’s no issue (it’s just slower).

I actually managed to get around the issue by swapping the pointers directly within the function instead of in an additional “frameSwap” function. I’m not sure why passing the pointers is an issue on Jetson but not on dGPU, but it seems to be.

This is the updated function:

    void custom_parser (guint source_id, gint data_size, std::byte *new_frame) {
        // there is locking code here to manage thread safe access

        std::byte *tmp = imageData[source_id];
        imageData[source_id] = new_frame;
        new_frame = tmp;

        // notify relevant dedicated thread of new frame data
    }

This works perfectly for me. Swapping the pointers takes only about 300-400 nanoseconds versus roughly 10 milliseconds for a memcpy.

For reference, a single millisecond is 1,000,000 nanoseconds. So this process is very efficient.