Segfault using NvdsPreprocess custom TensorPreparation

johannesrhvw · March 27, 2025, 2:29pm

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Laptop RTX 4070
• DeepStream Version: 7.1
• TensorRT Version: 10.3 using ngc deepstream container
• NVIDIA GPU Driver Version (valid for GPU only): 570
• Issue Type( questions, new requirements, bugs): Question
• How to reproduce the issue ?
I am preparing tensors for secondary inference using masks from previous instance segmentation to remove background for detected objects. This is done using two self written Cuda Kernels, one Scaling the mask to the correct dimensions so it can be applied to unit->converted_frame_buf which is then applied using a edited version of the Conversion Kernels provided in nvdspreprocess_conversion.cu taking the masks value in account for each pixel. The data is passt the exact same way as done in example library and also converted the exact same way.

Here is my question:
Is the CustomAsyncTransformation as in the example library always needed? As far as i understand this creates a Tensor from GstBuffer Datastructure and needs to be implemented in every custom preprocess library

Second How should nvdspreprocess be configured for my pipeline
src->nvstreammux->pgie->nvdspreprocess->sgie
So that everything matches: expected batch size, format, using meta-data-tensor as input etc. Using the examples did not give me a good understanding, some say if nvdspreprocess is present sgie has to be configured for primary inference when input-from-meta-data=1 is set. How do batchsizes need to be specified, my suspicion is that my sgie is expecting a full batch, and exits with segfault when said batchsize is not given as input.

backtrace of segfault:

Thread 23 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff9adde640 (LWP 908439)]
0x00007ffff7ce93fe in free () from /usr/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0  0x00007ffff7ce93fe in free () at /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff716d87f in release_obj_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#2  0x00007ffff716d65b in nvds_clear_meta_list () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#3  0x00007ffff716d6f5 in release_frame_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#4  0x00007ffff716cffd in nvds_destroy_meta_pool () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#5  0x00007ffff716bda5 in nvds_destroy_batch_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#6  0x00007ffff6f38139 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#7  0x00007ffff51e1acf in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#8  0x00007ffff51e114c in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#9  0x00007ffff6f7986d in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#10 0x00007ffff6f7ce09 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#11 0x00007ffff6f7d22e in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#12 0x00007fffeb3cf8d3 in  () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#13 0x00007ffff773bac1 in g_thread_proxy (data=0x555559540550) at ../glib/gthread.c:831
        thread = 0x555559540550
        __func__ = "g_thread_proxy"
#14 0x00007ffff7cd8ac3 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6
#15 0x00007ffff7d6a850 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6

This is my tensorpreparation function:

    NvDsPreProcessStatus CustomTensorPreparation(CustomCtx *ctx, NvDsPreProcessBatch *batch, NvDsPreProcessCustomBuf *&buf, CustomTensorParams &tensorParam, NvDsPreProcessAcquirer *acquirer)
    {
        NvDsPreProcessStatus status = NVDSPREPROCESS_TENSOR_NOT_READY;
        std::cout << "=== CustomTensorPreparation STARTED ===" << std::endl;
        std::cout << "Batch size: " << (batch ? batch->units.size() : 0) << std::endl;
        std::cout << "TensorParam network color format: " << tensorParam.params.network_color_format << std::endl;

        // 2. Acquire a buffer from the tensor pool
        buf = acquirer->acquire();
        if (!buf || !buf->memory_ptr)
        {
            std::cerr<< "ERROR: Failed to acquire buffer from tensor pool or null memory pointer"<<std::endl;
            return NVDSPREPROCESS_RESOURCE_ERROR;
        }

        std::cout << "Buffer acquired successfully" << std::endl;

        // 3. Cutout objects with error handling
        std::cout << "Calling cutout_objects..." << std::endl;
        status = ctx->cutout_impl->cutout_objects(batch, buf->memory_ptr, tensorParam);
        std::cout << "cutout_objects returned status: " << status << std::endl;

        if (status != NVDSPREPROCESS_SUCCESS)
        {
            std::cerr<< "ERROR: CustomTensorPreparation: cutout_objects failed error code "<<status<<std::endl;
            acquirer->release(buf);
            return status;
        }

        // 4. Sync CUDA stream with error handling
        std::cout << "Syncing CUDA stream..." << std::endl;
        status = ctx->cutout_impl->syncStream();
        std::cout << "syncStream returned status: " << status << std::endl;

        if (status != NVDSPREPROCESS_SUCCESS)
        {
            std::cerr<< "ERROR: CustomTensorPreparation: syncStream failed error code "<<status<<std::endl;
            acquirer->release(buf);
            return status;
        }

        tensorParam.params.network_input_shape[0] = (int)batch->units.size();

        std::cout << "=== CustomTensorPreparation COMPLETED SUCCESSFULLY ===" << std::endl;
        return status;
    }

The cutout_objects function:

NvDsPreProcessStatus CustomObjectCutoutImpl::cutout_objects(
    NvDsPreProcessBatch *batch, void *&devBuf, CustomTensorParams &tensorParam)
{
    cudaError_t err = cudaSuccess;

    if (!batch || !devBuf)
    {
        std::cerr << "Invalid input parameters" << std::endl;
        return NVDSPREPROCESS_CUSTOM_LIB_FAILED;
    }

    unsigned int batch_size = batch->units.size();
    if (batch_size > m_BatchSize)
    {

        std::cerr << "Batch size exceeds allocated resources" << std::endl;
        return NVDSPREPROCESS_CUSTOM_LIB_FAILED;
    }

    for (unsigned int i = 0; i < batch_size; i++)
    {
        NvDsPreProcessUnit *unit = &batch->units[i];
        if (!unit || !unit->roi_meta.object_meta)
        {
            std::cerr << "Invalid unit or object metadata" << std::endl;
            continue;
        }

        // original rect_roi
        NvDsRoiMeta * roi_meta = &unit->roi_meta;
        NvOSD_RectParams * roi = &roi_meta->roi;
        // 320x320 mask for nvinfer input 1280x1280
        NvOSD_MaskParams *mask_params = &unit->roi_meta.object_meta->mask_params;
        // Before calling TransformMask, add verification:
        if (!mask_params || !mask_params->data)
        {
            std::cerr<< "ERROR: Invalid mask_params or mask data"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        if (mask_params->width != m_MaskWidth || mask_params->height != m_MaskHeight)
        {
            std::cerr<< "ERROR: Invalid mask dimensions"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        std::size_t mask_size = m_MaskWidth * m_MaskHeight * sizeof(float);
        float *mask = ((float *)m_Mask.get() ? m_Mask->ptr<float>() : nullptr);
        if (!mask)
        {
            std::cerr<< "ERROR: Invalid mask pointer"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        float* d_mask_data = mask + i * m_MaskWidth * m_MaskHeight;
        std::cout << "Copying mask data to device" << std::endl;
        err = cudaMemcpyAsync(d_mask_data, mask_params->data, mask_size,
                              cudaMemcpyHostToDevice, *m_PreProcessStream);
        if (err != cudaSuccess) {
            std::cerr<< "ERROR: CustomObjectCutoutImpl: cutout_objects: Failed to copy mask data to device: "<<cudaGetErrorString(err)<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }

        float *d_scaled_mask_data = ((float *)m_ScaledMask.get() ? m_ScaledMask->ptr<float>() : nullptr) + i * m_NetworkSize.width * m_NetworkSize.height;
        if (!d_scaled_mask_data) {
            std::cerr<< "ERROR: Invalid scaled mask pointer"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        // shape of mask input
        unsigned int in_mask_width = mask_params->width; // 160
        unsigned int in_mask_height = mask_params->height; // 160
        // shape of desired output
        unsigned int out_width = m_NetworkSize.width; // 640
        unsigned int out_height = m_NetworkSize.height; // 640
        // calculate output roi to scale in roi to
        float roi_crop_scale_x = roi_meta->scale_ratio_x;
        float roi_crop_scale_y = roi_meta->scale_ratio_y;
        unsigned int roi_out_width = (unsigned int)((float)roi->width * (float)roi_crop_scale_x);
        unsigned int roi_out_height = (unsigned int)((float)roi->height * (float)roi_crop_scale_y);
        if (roi_out_width > out_width)
        {
            roi_out_width = out_width;
        }
        if (roi_out_height > out_height)
        {
            roi_out_height = out_height;
        }
        unsigned int roi_out_left = roi_meta->offset_left;
        unsigned int roi_out_top = roi_meta->offset_top;
        // scale ratio needed for scaling from in shape to out shape
        float scale_ratio_x = float(in_mask_width) / float(roi_out_width);
        float scale_ratio_y = float(in_mask_height) / float(roi_out_height);

        if (!m_ScaledMask || !m_ScaledMask->ptr() || !m_PreProcessStream || !m_PreProcessStream->ptr()) {
            std::cerr<< "ERROR: Invalid CUDA resources"<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        std::cout << "Transforming mask" << std::endl;
        err = TransformMask(
            d_scaled_mask_data,
            d_mask_data,
            in_mask_width,
            in_mask_height,
            out_width,
            out_height,
            roi_out_left,
            roi_out_top,
            roi_out_width,
            roi_out_height,
            scale_ratio_x,
            scale_ratio_y,
            m_PreProcessStream->ptr());
        if (err != cudaSuccess)
        {
            std::cerr<< "TransformMask failed with err "<<(int)err<<" : "<<cudaGetErrorName(err)<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        void *outPtr = (void*)((uint8_t*)devBuf + i * m_NetworkSize.channels
          * m_NetworkSize.width * m_NetworkSize.height
          * bytesPerElement(tensorParam.params.data_type));
        err = ApplyMaskAndConvert_C4ToL3Half((half *)outPtr, (unsigned char *)unit->converted_frame_ptr, d_scaled_mask_data, out_width, out_height, batch->pitch, *m_PreProcessStream);
        if (err != cudaSuccess)
        {
            std::cerr<< "ApplyMaskToConvertedBuffer failed with err "<<(int)err<<" : "<<cudaGetErrorName(err)<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        err = cudaStreamSynchronize(*m_PreProcessStream);
        if (err != cudaSuccess)
        {
            std::cerr<< "cudaStreamSynchronize failed with err "<<(int)err<<" : "<<cudaGetErrorName(err)<<std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
        // Add error checking after each CUDA operation
        if (cudaGetLastError() != cudaSuccess) {
            std::cerr << "CUDA error during processing" << std::endl;
            return NVDSPREPROCESS_CUDA_ERROR;
        }
    }

    return NVDSPREPROCESS_SUCCESS;
}

I already tested this function using memcopy and saving results in devBuf to ppm files, the images are perfectly cutout, masks correctly applied. Also for multiple iterations this is at least executed correctly i cant see the results from secondary inference but the function is executed correctly for some time until segfault happens

Also for some reason the converted_frame_buf is in RGBA but nowhere this format is specified in my configs
config_preprocess_secondary.txt (2.8 KB)
config_infer_primary_yoloV8_seg.txt (2.0 KB)
config_infer_secondary_yoloV8_seg.txt (2.1 KB)

johannesrhvw · March 28, 2025, 8:30am

I Just tested my setup with the config files above with the provided nvdspreprocess custom library and the segfault still happens, so it seems the configuration must be wrong. Pipeline urlsrc → nvstreammux → pgie → nvdspreprocess → pgie ->nvvidconv → nvosd → sink

johannesrhvw · March 28, 2025, 9:35pm

I have added debugging flags to libgstnvinfer and i print the uniqueid of the nvinfer instance. The pipeline crashes inside nvinfer outputloop at call
GstFlowReturn flow_ret = gst_pad_push (GST_BASE_TRANSFORM_SRC_PAD (nvinfer), batch->inbuf);
This is the backtrace:

DBUG nvinfer 3: received batch_meta with num_frames_in_batch:1
msg from unique id nvinfer: 3
msg from unique id nvinfer: 3

Thread 23 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff6adde640 (LWP 1229644)]
0x00007ffff7ce93fe in free () from /usr/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt f
#0  0x00007ffff7ce93fe in free () at /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff716d87f in release_obj_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#2  0x00007ffff716d65b in nvds_clear_meta_list () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#3  0x00007ffff716d6f5 in release_frame_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#4  0x00007ffff716cffd in nvds_destroy_meta_pool () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#5  0x00007ffff716bda5 in nvds_destroy_batch_meta () at /opt/nvidia/deepstream/deepstream/lib/libnvds_meta.so
#6  0x00007ffff6f38139 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#7  0x00007ffff51e1acf in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#8  0x00007ffff51e114c in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#9  0x00007ffff6f7986d in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#10 0x00007ffff6f7ce09 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#11 0x00007ffff6f7d22e in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#12 0x00007fffeb39ac99 in gst_nvinfer_output_loop(gpointer) (data=0x55555644ad10) at gstnvinfer.cpp:2418
        flow_ret = GST_FLOW_OK
        batch_output = 0x0
        nvdsinfer_ctx = std::shared_ptr<INvDsInferContext> (use count 2, weak count 0) = {
          get() = 0x55555707b460
        }
        batch = std::unique_ptr<GstNvInferBatch> = {
          get() = 0x7ffedc0e43c0
        }
        tensor_deleter = {<No data fields>}
        tensor_out_object = std::unique_ptr<GstNvInferTensorOutputObject> = {
          get() = 0x0
        }
        nvinfer = 0x55555644ad10 [GstNvInfer]
        impl = 0x55555644b8e0
        init_params = 0x5555565b9170
        status = NVDSINFER_SUCCESS
        eventAttrib = {
          version = 3,
          size = 48,
          category = 0,
          colorType = 1,
          color = 4284907198,
          payloadType = 0,
          reserved0 = 0,
          payload = {
            ullValue = 0,
            llValue = 0,
            dValue = 0,
            uiValue = 0,
            iValue = 0,
            fValue = 0
          },
          messageType = 1,
          message = {
            ascii = 0x7fff5c02e5d0 "dequeueOutputAndAttachMeta batch_num=107",
            unicode = 0x7fff5c02e5d0 L"\x75716564\x4f657565\x75707475\x646e4174\x61747441\x654d6863\x62206174\x68637461\x6d756e5f\x3730313d",
            registered = 0x7fff5c02e5d0
          }
        }
        nvtx_str = "dequeueOutputAndAttachMeta batch_num=107"
        cudaReturn = cudaSuccess
        uniqueid = 3
        __FUNCTION__ = "gst_nvinfer_output_loop"
        locker = {
          m = @0x55555644af70,
          locked = false
        }
#13 0x00007ffff773bac1 in g_thread_proxy (data=0x555559559150) at ../glib/gthread.c:831
        thread = 0x555559559150
        __func__ = "g_thread_proxy"
#14 0x00007ffff7cd8ac3 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6
#15 0x00007ffff7d6a850 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6
(gdb)

I believe something is still wrong with my configuration but after multiple days of trying different configs i cant get it right. By the way, i am using two YoloV8 Segmentation models After each other.

yuweiw · March 30, 2025, 8:56am

So is your pgie a segmentation model? We cannot support this scenairo. you can refer to our Guide.

Preprocess with SGIE mode is used to process the detected objects within the given ROI/Frame on which we want to perform secondary inferencing.

johannesrhvw · March 30, 2025, 10:05am

yes, my pipeline is as following:
src → nvstreammux → nvinfer(primary mode: instance segmentation) → nvdspreprocess(custom library, receiving cropped objects scaled and padded to input size, my custom lib is only applying the mask additionally) → nvinfer(primary mode?: does detection on the cutouts, that should be saved inside the NvDsBatchMeta attached by nvdspreprocess) → nvvidconv → osd → displaysink

As is said the custom lib i tested, it is working and writing the intended data to the devBuf received via library interface. This already works. Only nvinfer is crashing on segfault inside output loop. as you can see from the backtrace i posted, the second nvinfer with the gie-unique-id=3 exited with success it seems.

So to answer your question, i know that nvdspreprocess does not support instance segmentation for preprocessing, but i adapted it to it so it should work. Also i tried running it with the standard custom_lib from nvidia, which does not apply masks but just creates the cropped rois scaled and padded to input size without mask application, this still crashes.

yuweiw · March 31, 2025, 8:03am

We don’t have a similar demo at the moment, so you need to debug more yourself. Since the segfault is related to the libnvds_meta, you can check to see if you are operating the metadata correctly in your nvdspreprocess.

johannesrhvw · March 31, 2025, 8:40am

The operating of metadata id do is writing the batch tensors into the buffer created using the buffer_acquirer provided by the cutom library inteface.

As i stated before, i also tried using the noral nvdspreprocess custom lib which also creates a segfault in nvinfer.

For debugging i also tried the following pipeline:

src → nvdspreprocess → pgie(instance seg with custom bbox parser lib) → nvidconv → osd → sink

This also is crashing with a segfault.

When removing nvdspreprocess it is working ass intended, osd is even painting instance masks, so i dont know what is messed up in my configs. Can you please have a look at them, i am not providing all this information just for fun, i really dont know how to solve this.

yuweiw · March 31, 2025, 10:56am

Let’s use this simpler pipeline to analyze your problem first. Could you attach a simple sample for this that can run normally? We can try that on our side and analyze this.

johannesrhvw · March 31, 2025, 12:15pm

I am basically using all the files from here, it is also well documented and described how to use. I tried changing it so i am using nvdspreprocess, then it runs but the tensor_meta_data is empty when adding debug prints to nvinfer. https://github.com/marcoslucianops/DeepStream-Yolo-Seg/blob/master/deepstream_app_config.txt

yuweiw · April 1, 2025, 2:37am

I know the repo works properly. I mean, could you post your changes like below, so that we can run that on our side.

src → nvdspreprocess → pgie(instance seg with custom bbox parser lib) → nvidconv → osd → sink

johannesrhvw · April 14, 2025, 6:01am

As i already suspected, the configuration files were not correct.
I used the nvdspreprocess example as a base config and adapted it to my model inputs. It works now, i think the problem was related to missmatching hardware selection in the configs.

system · April 28, 2025, 6:01am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Adding NvInfer to my pipeline is causing segfault DeepStream SDK	5	1261	October 12, 2021
Semantic segmetation nvdspreprocessing+invinfer DeepStream SDK	5	925	September 21, 2022
Segmentation fault when using nvdspreprocess DeepStream SDK deepstream	25	192	January 3, 2025
Program terminated with signal SIGSEGV when put nvinfer between nvstreammux and nvstreamdemux DeepStream SDK	1	1208	September 12, 2018
Pgie not running with nvdspreprocess on deepstream 6.0 application DeepStream SDK deepstream	25	1279	August 4, 2023
Cascaded Instance Segmentation for defect analysis DeepStream SDK tensorrt , ubuntu , gstreamer , deepstream	17	219	October 30, 2024
Extract preprocessed tensors from nvpreprocess DeepStream SDK	9	1810	August 1, 2022
Preprocessing of frames - gst-dsexample DeepStream SDK	6	1606	October 12, 2021
Segmentation Error DeepStream SDK	6	2773	October 29, 2018
How to custom preprocess in SGIE base on Deepstream 5.0? DeepStream SDK	15	3481	October 12, 2021

Segfault using NvdsPreprocess custom TensorPreparation

Related topics