Segfault using NvdsPreprocess custom TensorPreparation

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Laptop RTX 4070
• DeepStream Version: 7.1
• TensorRT Version: 10.3 (using the NGC DeepStream container)
• NVIDIA GPU Driver Version (valid for GPU only): 570
• Issue Type (questions / new requirements / bugs): Question
• How to reproduce the issue?
I am preparing tensors for secondary inference, using the masks from a previous instance-segmentation stage to remove the background of the detected objects. This is done with two self-written CUDA kernels: one scales the mask to the correct dimensions so it can be applied to unit->converted_frame_ptr, and the other applies it during pixel conversion, using an edited version of the conversion kernels provided in nvdspreprocess_conversion.cu that takes the mask value into account for each pixel. The data is passed the exact same way as in the example library and also converted the exact same way.
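For context, this is roughly what the mask-apply conversion does (a simplified sketch only; the kernel name, the 0.5 threshold and the omitted normalization factor are illustrative, my real kernel follows nvdspreprocess_conversion.cu more closely):

    // Simplified sketch of the mask-apply conversion: RGBA pitch-linear input ->
    // planar RGB half output, zeroing pixels where the scaled mask is below a
    // threshold. Names, the 0.5 threshold and the missing normalization factor
    // are illustrative only.
    #include <cuda_fp16.h>

    __global__ void ApplyMaskAndConvertSketch(
        half *out, const unsigned char *in, const float *mask,
        unsigned int width, unsigned int height, unsigned int pitch)
    {
        unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
        unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height)
            return;

        // background pixels (mask below threshold) are written as 0
        float m = mask[y * width + x] > 0.5f ? 1.0f : 0.0f;
        const unsigned char *px = in + y * pitch + x * 4; // 4 bytes per RGBA pixel

        out[0 * width * height + y * width + x] = __float2half(px[0] * m);
        out[1 * width * height + y * width + x] = __float2half(px[1] * m);
        out[2 * width * height + y * width + x] = __float2half(px[2] * m);
    }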

Here is my question:
Is CustomAsyncTransformation, as in the example library, always needed? As far as I understand, it creates a tensor from the GstBuffer data structure and needs to be implemented in every custom preprocess library.

Second: how should nvdspreprocess be configured for my pipeline
src -> nvstreammux -> pgie -> nvdspreprocess -> sgie
so that everything matches: expected batch size, format, using the tensor meta as input, etc.? The examples did not give me a good understanding; some say that when nvdspreprocess is present and input-tensor-meta=1 is set, the sgie has to be configured for primary inference. How do the batch sizes need to be specified? My suspicion is that my sgie expects a full batch and exits with a segfault when that batch size is not provided as input.
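To make the question concrete, below is a stripped-down sketch of the wiring I am attempting (not my full attached configs; the key names are taken from the shipped config_preprocess samples as far as I understand them, and the values are placeholders for my 640x640 model, so please correct me if any of this is wrong):

    # nvdspreprocess configuration (abridged sketch, values are placeholders)
    [property]
    enable=1
    unique-id=5
    gpu-id=0
    # 0 = operate on the objects detected by the pgie instead of on full frames
    process-on-frame=0
    # unique-id of the downstream nvinfer that should consume the prepared tensors
    target-unique-ids=3
    network-input-order=0
    network-input-shape=8;3;640;640
    processing-width=640
    processing-height=640
    network-color-format=0
    # tensor-data-type must match the model input (FP16 in my case)
    maintain-aspect-ratio=1
    symmetric-padding=1
    tensor-name=input
    custom-lib-path=./libcustom_cutout_preprocess.so
    custom-tensor-preparation-function=CustomTensorPreparation

    [group-0]
    src-ids=0
    custom-input-transformation-function=CustomAsyncTransformation

    # On the downstream nvinfer (gie-unique-id=3) I set the gst property
    # input-tensor-meta=1 so that it consumes the attached tensors instead of
    # doing its own preprocessing.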

Backtrace of the segfault:

Thread 23 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff9adde640 (LWP 908439)]
0x00007ffff7ce93fe in free () from /usr/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0  0x00007ffff7ce93fe in free () at /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff716d87f in release_obj_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#2  0x00007ffff716d65b in nvds_clear_meta_list () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#3  0x00007ffff716d6f5 in release_frame_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#4  0x00007ffff716cffd in nvds_destroy_meta_pool () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#5  0x00007ffff716bda5 in nvds_destroy_batch_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#6  0x00007ffff6f38139 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#7  0x00007ffff51e1acf in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#8  0x00007ffff51e114c in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#9  0x00007ffff6f7986d in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#10 0x00007ffff6f7ce09 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#11 0x00007ffff6f7d22e in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#12 0x00007fffeb3cf8d3 in  () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#13 0x00007ffff773bac1 in g_thread_proxy (data=0x555559540550) at ../glib/gthread.c:831
        thread = 0x555559540550
        __func__ = "g_thread_proxy"
#14 0x00007ffff7cd8ac3 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6
#15 0x00007ffff7d6a850 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6

This is my tensor preparation function:

    NvDsPreProcessStatus CustomTensorPreparation(CustomCtx *ctx, NvDsPreProcessBatch *batch, NvDsPreProcessCustomBuf *&buf, CustomTensorParams &tensorParam, NvDsPreProcessAcquirer *acquirer)
    {
        NvDsPreProcessStatus status = NVDSPREPROCESS_TENSOR_NOT_READY;
        std::cout << "=== CustomTensorPreparation STARTED ===" << std::endl;
        std::cout << "Batch size: " << (batch ? batch->units.size() : 0) << std::endl;
        std::cout << "TensorParam network color format: " << tensorParam.params.network_color_format << std::endl;

        // 2. Acquire a buffer from the tensor pool
        buf = acquirer->acquire();
        if (!buf || !buf->memory_ptr)
        {
            std::cerr<< "ERROR: Failed to acquire buffer from tensor pool or null memory pointer"<<std::endl;
            return NVDSPREPROCESS_RESOURCE_ERROR;
        }

        std::cout << "Buffer acquired successfully" << std::endl;

        // 3. Cutout objects with error handling
        std::cout << "Calling cutout_objects..." << std::endl;
        status = ctx->cutout_impl->cutout_objects(batch, buf->memory_ptr, tensorParam);
        std::cout << "cutout_objects returned status: " << status << std::endl;

        if (status != NVDSPREPROCESS_SUCCESS)
        {
            std::cerr<< "ERROR: CustomTensorPreparation: cutout_objects failed error code "<<status<<std::endl;
            acquirer->release(buf);
            return status;
        }

        // 4. Sync CUDA stream with error handling
        std::cout << "Syncing CUDA stream..." << std::endl;
        status = ctx->cutout_impl->syncStream();
        std::cout << "syncStream returned status: " << status << std::endl;

        if (status != NVDSPREPROCESS_SUCCESS)
        {
            std::cerr<< "ERROR: CustomTensorPreparation: syncStream failed error code "<<status<<std::endl;
            acquirer->release(buf);
            return status;
        }

        tensorParam.params.network_input_shape[0] = (int)batch->units.size();

        std::cout << "=== CustomTensorPreparation COMPLETED SUCCESSFULLY ===" << std::endl;
        return status;
    }

The cutout_objects function:

NvDsPreProcessStatus CustomObjectCutoutImpl::cutout_objects(
    NvDsPreProcessBatch *batch, void *&devBuf, CustomTensorParams &tensorParam)
{
    cudaError_t err = cudaSuccess;

    if (!batch || !devBuf)
    {
        std::cerr << "Invalid input parameters" << std::endl;
        return NVDSPREPROCESS_CUSTOM_LIB_FAILED;
    }

    unsigned int batch_size = batch->units.size();
    if (batch_size > m_BatchSize)
    {

        std::cerr << "Batch size exceeds allocated resources" << std::endl;
        return NVDSPREPROCESS_CUSTOM_LIB_FAILED;
    }

    for (unsigned int i = 0; i < batch_size; i++)
    {
        NvDsPreProcessUnit *unit = &batch->units[i];
        if (!unit || !unit->roi_meta.object_meta)
        {
            std::cerr << "Invalid unit or object metadata" << std::endl;
            continue;
        }

        // original rect_roi
        NvDsRoiMeta * roi_meta = &unit->roi_meta;
        NvOSD_RectParams * roi = &roi_meta->roi;
        // segmentation mask from the pgie, to be scaled up to the sgie network input
        NvOSD_MaskParams *mask_params = &unit->roi_meta.object_meta->mask_params;
        // verify mask metadata before calling TransformMask
        if (!mask_params || !mask_params->data)
        {
            std::cerr<< "ERROR: Invalid mask_params or mask data"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        if (mask_params->width != m_MaskWidth || mask_params->height != m_MaskHeight)
        {
            std::cerr<< "ERROR: Invalid mask dimensions"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        std::size_t mask_size = m_MaskWidth * m_MaskHeight * sizeof(float);
        float *mask = (m_Mask ? m_Mask->ptr<float>() : nullptr);
        if (!mask)
        {
            std::cerr<< "ERROR: Invalid mask pointer"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        float* d_mask_data = mask + i * m_MaskWidth * m_MaskHeight;
        std::cout << "Copying mask data to device" << std::endl;
        err = cudaMemcpyAsync(d_mask_data, mask_params->data, mask_size,
                              cudaMemcpyHostToDevice, *m_PreProcessStream);
        if (err != cudaSuccess) {
            std::cerr<< "ERROR: CustomObjectCutoutImpl: cutout_objects: Failed to copy mask data to device: "<<cudaGetErrorString(err)<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }

        // check the base pointer before applying the per-unit offset, otherwise a
        // null base plus offset would slip past the check below
        float *scaled_mask_base = (m_ScaledMask ? m_ScaledMask->ptr<float>() : nullptr);
        if (!scaled_mask_base) {
            std::cerr<< "ERROR: Invalid scaled mask pointer"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        float *d_scaled_mask_data = scaled_mask_base + i * m_NetworkSize.width * m_NetworkSize.height;
        // shape of mask input
        unsigned int in_mask_width = mask_params->width; // 160
        unsigned int in_mask_height = mask_params->height; // 160
        // shape of desired output
        unsigned int out_width = m_NetworkSize.width; // 640
        unsigned int out_height = m_NetworkSize.height; // 640
        // calculate output roi to scale in roi to
        float roi_crop_scale_x = roi_meta->scale_ratio_x;
        float roi_crop_scale_y = roi_meta->scale_ratio_y;
        unsigned int roi_out_width = (unsigned int)((float)roi->width * (float)roi_crop_scale_x);
        unsigned int roi_out_height = (unsigned int)((float)roi->height * (float)roi_crop_scale_y);
        if (roi_out_width > out_width)
        {
            roi_out_width = out_width;
        }
        if (roi_out_height > out_height)
        {
            roi_out_height = out_height;
        }
        unsigned int roi_out_left = roi_meta->offset_left;
        unsigned int roi_out_top = roi_meta->offset_top;
        // scale ratio needed for scaling from in shape to out shape
        float scale_ratio_x = float(in_mask_width) / float(roi_out_width);
        float scale_ratio_y = float(in_mask_height) / float(roi_out_height);

        if (!m_ScaledMask || !m_ScaledMask->ptr() || !m_PreProcessStream || !m_PreProcessStream->ptr()) {
            std::cerr<< "ERROR: Invalid CUDA resources"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        std::cout << "Transforming mask" << std::endl;
        err = TransformMask(
            d_scaled_mask_data,
            d_mask_data,
            in_mask_width,
            in_mask_height,
            out_width,
            out_height,
            roi_out_left,
            roi_out_top,
            roi_out_width,
            roi_out_height,
            scale_ratio_x,
            scale_ratio_y,
            m_PreProcessStream->ptr());
        if (err != cudaSuccess)
        {
            std::cerr<< "TransformMask failed with err "<<(int)err<<" : "<<cudaGetErrorName(err)<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        void *outPtr = (void*)((uint8_t*)devBuf + i * m_NetworkSize.channels
          * m_NetworkSize.width * m_NetworkSize.height
          * bytesPerElement(tensorParam.params.data_type));
        err = ApplyMaskAndConvert_C4ToL3Half((half *)outPtr, (unsigned char *)unit->converted_frame_ptr, d_scaled_mask_data, out_width, out_height, batch->pitch, *m_PreProcessStream);
        if (err != cudaSuccess)
        {
            std::cerr<< "ApplyMaskToConvertedBuffer failed with err "<<(int)err<<" : "<<cudaGetErrorName(err)<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        err = cudaStreamSynchronize(*m_PreProcessStream);
        if (err != cudaSuccess)
        {
            std::cerr<< "cudaStreamSynchronize failed with err "<<(int)err<<" : "<<cudaGetErrorName(err)<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        // check for any pending CUDA errors from the kernel launches
        if (cudaGetLastError() != cudaSuccess) {
            std::cerr << "CUDA error during processing" << std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
    }

    return NVDSPREPROCESS_SUCCESS;
}

I already tested this function by copying the results from devBuf back to the host and saving them as PPM files: the objects are perfectly cut out and the masks are applied correctly. This also holds over multiple iterations. I cannot see the results of the secondary inference, but the function executes correctly for some time until the segfault happens.
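(For reference, my verification dump is roughly the sketch below; the function name, the fixed plane layout and the naive half-to-byte conversion are simplified placeholders, not my exact debug code.)

    // Rough sketch of the verification dump: copy one prepared tensor back from
    // devBuf to the host and write it as a binary PPM. Assumes planar CHW half
    // data already scaled to 0..255.
    #include <cuda_runtime.h>
    #include <cuda_fp16.h>
    #include <fstream>
    #include <vector>

    static void dumpTensorAsPPM(const half *d_tensor, int width, int height,
                                const char *path)
    {
        std::vector<half> host(3 * (size_t)width * height);
        cudaMemcpy(host.data(), d_tensor, host.size() * sizeof(half),
                   cudaMemcpyDeviceToHost);

        std::ofstream out(path, std::ios::binary);
        out << "P6\n" << width << " " << height << "\n255\n";
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                for (int c = 0; c < 3; c++) {  // planar CHW -> interleaved RGB
                    float v = __half2float(host[(size_t)c * width * height + y * width + x]);
                    v = v < 0.f ? 0.f : (v > 255.f ? 255.f : v);
                    out.put((char)(unsigned char)v);
                }
            }
        }
    }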

Also, for some reason unit->converted_frame_ptr is in RGBA, although this format is not specified anywhere in my configs.
config_preprocess_secondary.txt (2.8 KB)
config_infer_primary_yoloV8_seg.txt (2.0 KB)
config_infer_secondary_yoloV8_seg.txt (2.1 KB)

I just tested my setup with the config files above and the provided nvdspreprocess custom library, and the segfault still happens, so the configuration must be wrong. Pipeline: urlsrc → nvstreammux → pgie → nvdspreprocess → pgie → nvvidconv → nvosd → sink

I have added debugging flags to libgstnvinfer and I print the unique-id of the nvinfer instance. The pipeline crashes inside the nvinfer output loop at the call
GstFlowReturn flow_ret = gst_pad_push (GST_BASE_TRANSFORM_SRC_PAD (nvinfer), batch->inbuf);
This is the backtrace:

DBUG nvinfer 3: received batch_meta with num_frames_in_batch:1
msg from unique id nvinfer: 3
msg from unique id nvinfer: 3

Thread 23 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff6adde640 (LWP 1229644)]
0x00007ffff7ce93fe in free () from /usr/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt f
#0  0x00007ffff7ce93fe in free () at /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff716d87f in release_obj_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#2  0x00007ffff716d65b in nvds_clear_meta_list () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#3  0x00007ffff716d6f5 in release_frame_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#4  0x00007ffff716cffd in nvds_destroy_meta_pool () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#5  0x00007ffff716bda5 in nvds_destroy_batch_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#6  0x00007ffff6f38139 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#7  0x00007ffff51e1acf in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#8  0x00007ffff51e114c in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#9  0x00007ffff6f7986d in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#10 0x00007ffff6f7ce09 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#11 0x00007ffff6f7d22e in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#12 0x00007fffeb39ac99 in gst_nvinfer_output_loop(gpointer) (data=0x55555644ad10) at gstnvinfer.cpp:2418
        flow_ret = GST_FLOW_OK
        batch_output = 0x0
        nvdsinfer_ctx = std::shared_ptr<INvDsInferContext> (use count 2, weak count 0) = {
          get() = 0x55555707b460
        }
        batch = std::unique_ptr<GstNvInferBatch> = {
          get() = 0x7ffedc0e43c0
        }
        tensor_deleter = {<No data fields>}
        tensor_out_object = std::unique_ptr<GstNvInferTensorOutputObject> = {
          get() = 0x0
        }
        nvinfer = 0x55555644ad10 [GstNvInfer]
        impl = 0x55555644b8e0
        init_params = 0x5555565b9170
        status = NVDSINFER_SUCCESS
        eventAttrib = {
          version = 3,
          size = 48,
          category = 0,
          colorType = 1,
          color = 4284907198,
          payloadType = 0,
          reserved0 = 0,
          payload = {
            ullValue = 0,
            llValue = 0,
            dValue = 0,
            uiValue = 0,
            iValue = 0,
            fValue = 0
          },
          messageType = 1,
          message = {
            ascii = 0x7fff5c02e5d0 "dequeueOutputAndAttachMeta batch_num=107",
            unicode = 0x7fff5c02e5d0 L"\x75716564\x4f657565\x75707475\x646e4174\x61747441\x654d6863\x62206174\x68637461\x6d756e5f\x3730313d",
            registered = 0x7fff5c02e5d0
          }
        }
        nvtx_str = "dequeueOutputAndAttachMeta batch_num=107"
        cudaReturn = cudaSuccess
        uniqueid = 3
        __FUNCTION__ = "gst_nvinfer_output_loop"
        locker = {
          m = @0x55555644af70,
          locked = false
        }
#13 0x00007ffff773bac1 in g_thread_proxy (data=0x555559559150) at ../glib/gthread.c:831
        thread = 0x555559559150
        __func__ = "g_thread_proxy"
#14 0x00007ffff7cd8ac3 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6
#15 0x00007ffff7d6a850 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6
(gdb) 

I believe something is still wrong with my configuration, but after multiple days of trying different configs I can't get it right. By the way, I am using two YOLOv8 segmentation models after each other.

So is your pgie a segmentation model? We cannot support this scenario. You can refer to our Guide.

Preprocess with SGIE mode is used to process the detected objects within the given ROI/Frame on which we want to perform secondary inferencing.

Yes, my pipeline is as follows:
src → nvstreammux → nvinfer (primary mode: instance segmentation) → nvdspreprocess (custom library; it receives the cropped objects scaled and padded to the input size, and my custom lib only applies the mask additionally) → nvinfer (primary mode?: does detection on the cutouts, which should be saved inside the NvDsBatchMeta attached by nvdspreprocess) → nvvidconv → osd → displaysink

As I said, I tested the custom lib: it works and writes the intended data to the devBuf received via the library interface. This already works. Only nvinfer is crashing with a segfault inside its output loop. As you can see from the backtrace I posted, the second nvinfer with gie-unique-id=3 seems to have returned success (status = NVDSINFER_SUCCESS).

So to answer your question: I know that nvdspreprocess does not support instance segmentation for preprocessing, but I adapted it to handle this, so it should work. I also tried running it with the standard custom lib from NVIDIA, which does not apply masks but just creates the cropped ROIs scaled and padded to the input size; this still crashes.

We don’t have a similar demo at the moment, so you need to debug more yourself. Since the segfault is related to libnvds_meta, you can check whether you are handling the metadata correctly in your nvdspreprocess.

The only metadata handling I do is writing the batch tensors into the buffer acquired through the buffer acquirer provided by the custom library interface.

As I stated before, I also tried using the normal nvdspreprocess custom lib, which also produces a segfault in nvinfer.

For debugging I also tried the following pipeline:

src → nvdspreprocess → pgie (instance seg with custom bbox parser lib) → nvvidconv → osd → sink

This also crashes with a segfault.

When I remove nvdspreprocess, it works as intended; the OSD even paints the instance masks, so I don't know what is messed up in my configs. Can you please have a look at them? I am not providing all this information just for fun, I really don't know how to solve this.

Let’s use this simpler pipeline to analyze your problem first. Could you attach a simple sample for it that runs normally? We can try it on our side and analyze the issue.

I am basically using all the files from here; it is well documented and describes how to use it. I tried changing it so that nvdspreprocess is used; it then runs, but the tensor meta data is empty when I add debug prints to nvinfer. https://github.com/marcoslucianops/DeepStream-Yolo-Seg/blob/master/deepstream_app_config.txt

I know the repo works properly. I mean, could you post your changes, like below, so that we can run them on our side?

src → nvdspreprocess → pgie (instance seg with custom bbox parser lib) → nvvidconv → osd → sink

As I already suspected, the configuration files were not correct.
I used the nvdspreprocess example as a base config and adapted it to my model inputs. It works now; I think the problem was related to mismatching hardware selection in the configs.
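For anyone hitting the same problem later: I cannot say for certain which exact key was at fault, but the settings that select buffer memory type and compute hardware in the nvdspreprocess config are, as far as I understand the samples, these (0 should mean the platform default, which resolves to the GPU path on dGPU):

    # nvdspreprocess [property] group: scaling buffer pool memory and compute selection
    scaling-pool-memory-type=0
    scaling-pool-compute-hw=0

As far as I can tell, the nvbuf-memory-type property on nvstreammux / nvvideoconvert should also be chosen consistently with this on dGPU.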
