NvBufSurfTransform produces wrong result in GPU compute mode

Hello everyone! I've been facing an issue for a while and couldn't find a solution for it; I think it is a bug inside the NVIDIA libraries. The overall structure of the application is that I read frames from a file via a GStreamer pipeline, process them, and push them to another GStreamer pipeline to encode them into a file. The pipelines look like this:

Input pipeline: filesrc -> qtdemux -> h264parse -> nvv4l2decoder -> nvvidconv (video/x-raw(memory:NVMM), format=RGBA) -> appsink
Output pipeline: appsrc (video/x-raw(memory:NVMM), format=RGBA) -> nvvidconv -> (video/x-raw(memory:NVMM), format=NV12) -> nvv4l2h264enc -> h264parse -> qtmux -> filesink

Execution block is like this:

void executeThread()
{
    NvBufSurfTransformConfigParams config_params;
    config_params.compute_mode = NvBufSurfTransformCompute_GPU;
    config_params.gpu_id = 0;
    config_params.cuda_stream = m_cuda_stream;

    if (NvBufSurfTransformSetSessionParams(&config_params) != NvBufSurfTransformError_Success)
    {
        LOG_ERROR("NvBufSurfTransform set session failed");
        exit(-1);
    }

    while (true)
    {
        std::unique_lock<std::mutex> lock(m_execute_mutex);
        m_execute_cond.wait(lock);

        NvBufSurface *in_surface = NULL;

        NvBufSurfTransformParams transform_params;
        memset(&transform_params, 0, sizeof(NvBufSurfTransformParams));

        GstElement *input_pipeline = getPipeline(getIntputPipelineString(m_options).c_str());
        GstElement *output_pipeline = getPipeline(getOutputPipelineString(m_options).c_str());

        GstElement *appsink = gst_bin_get_by_name(GST_BIN(input_pipeline), "appsink0");
        GstElement *appsrc = gst_bin_get_by_name(GST_BIN(output_pipeline), "appsrc0");

        if (gst_element_set_state(input_pipeline, GST_STATE_PLAYING) == GST_STATE_CHANGE_FAILURE)
        {
            LOG_ERROR("Input Pipeline is unable to be set playing!");
            break;
        }
        if (gst_element_set_state(output_pipeline, GST_STATE_PLAYING) == GST_STATE_CHANGE_FAILURE)
        {
            LOG_ERROR("Output Pipeline is unable to be set playing!");
            break;
        }

        GstFlowReturn ret;
        GstBufferPool *pool = getBufferPool(m_options);

        GstBuffer *output_buffer = NULL;
        GstMemory *mem = NULL;

        int frame_id = 0;
        bool error = false;

        while (!error)
        {
            GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(appsink));
            if (!sample)
                break;
            GstBuffer *input_buffer = gst_sample_get_buffer(sample);

            GstMapInfo inmap;
            if (!gst_buffer_map(input_buffer, &inmap, GST_MAP_READ))
                break;
            in_surface = (NvBufSurface *)inmap.data;

            if (gst_buffer_pool_acquire_buffer(pool, &output_buffer, NULL) != GST_FLOW_OK)
            {
                LOG_ERROR("Buffer cannot be acquired from pool.");
                break;
            }
            if (!gst_buffer_peek_memory(output_buffer, 0))
            {
                LOG_ERROR("Memory block of the buffer is not available.");
                break;
            }

            GstMapInfo outmap;
            if (!gst_buffer_map(output_buffer, &outmap, GST_MAP_WRITE))
                break;
            NvBufSurface *surf = (NvBufSurface *)outmap.data;

            if (NvBufSurfTransform(in_surface, surf, &transform_params) != NvBufSurfTransformError_Success)
            {
                LOG_ERROR("NvBufSurfTransform Failed");
                error = true;
                break;
            }
            GST_BUFFER_PTS(output_buffer) = GST_BUFFER_PTS(input_buffer);

            gst_buffer_unmap(output_buffer, &outmap);
            gst_buffer_unmap(input_buffer, &inmap);
            gst_sample_unref(sample); /* may drop the last ref to input_buffer, so copy the PTS first */

            g_signal_emit_by_name(appsrc, "push-buffer", output_buffer, &ret);
            gst_buffer_unref(output_buffer);
            frame_id++;
        }

        /* Clean up */
        gst_object_unref(appsink);
        gst_object_unref(appsrc);

        gst_buffer_pool_set_active(pool, FALSE);
        gst_object_unref(pool);

        cleanUpPipeline(input_pipeline);
        cleanUpPipeline(output_pipeline);

        {
            std::unique_lock<std::mutex> lock(m_is_running_mutex);
            m_is_running = false;
        }
        LOG_INFO("Execution finished.");
    }
}

If I remove the part that sets NvBufSurfTransformConfigParams to NvBufSurfTransformCompute_GPU, it works properly, since it then uses the VIC to copy in_surface to surf (the output buffer). However, I need to do CUDA operations (please refer to gst-nvdewarper; I will warp frames using the NVWarp API).

The issue doesn't always happen; sometimes it produces correct results.

Orin NX 16 GB - Auvidea JNX42 carrier board - JetPack 5.1.1 - L4T 35.3.1

I’m attaching example videos:
video0.mp4: frames copied by VIC
video1.mp4: frames copied by GPU and error type 1
video2.mp4: frames copied by GPU and error type 2
videos.zip (5.2 MB)

If there is no fix for this, is there a way to copy the contents of a memory block allocated by cudaMalloc to an NvBufSurface with mem_type NVBUF_MEM_SURFACE_ARRAY?

Hi,
Please use the VIC engine for the conversion. It is the hardware converter in the Jetson chip, and we generally use it on Jetson platforms. For copying data from a GPU buffer to an NvBufSurface, there are NvBufSurface and CUDA APIs to map the plane(s) to the GPU. You can map them to the GPU and copy the data through cudaMemcpy().
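For NVBUF_MEM_SURFACE_ARRAY buffers, one way to copy on the GPU without a CPU hop is the CUDA/EGL interop path (the approach gst-nvdewarper itself relies on): map the surface to an EGLImage, register it with CUDA, and copy device-to-device. A rough sketch, untested here, assuming a pitch-linear RGBA surface, an already-initialized CUDA context, and with all error handling omitted (`copyToSurfaceArray`, `src_dptr`, and `src_pitch` are illustrative names):

```cpp
#include <cuda.h>
#include <cuda_runtime.h>
#include <cudaEGL.h>
#include "nvbufsurface.h"

// Sketch: copy a cudaMalloc'd RGBA buffer (src_dptr, src_pitch) into
// plane 0 of an NVBUF_MEM_SURFACE_ARRAY NvBufSurface on the GPU.
void copyToSurfaceArray(NvBufSurface *surf, void *src_dptr, size_t src_pitch)
{
    // Fills surf->surfaceList[0].mappedAddr.eglImage
    NvBufSurfaceMapEglImage(surf, 0);

    CUgraphicsResource res = NULL;
    cuGraphicsEGLRegisterImage(&res, surf->surfaceList[0].mappedAddr.eglImage,
                               CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);

    CUeglFrame egl_frame;
    cuGraphicsResourceGetMappedEglFrame(&egl_frame, res, 0, 0);

    // Assumes CU_EGL_FRAME_TYPE_PITCH; a block-linear surface instead
    // exposes a CUDA array and needs an array-based copy.
    cudaMemcpy2D(egl_frame.frame.pPitch[0], egl_frame.pitch,
                 src_dptr, src_pitch,
                 surf->surfaceList[0].width * 4, /* row width in BYTES (RGBA) */
                 surf->surfaceList[0].height,
                 cudaMemcpyDeviceToDevice);
    cudaDeviceSynchronize();

    cuGraphicsUnregisterResource(res);
    NvBufSurfaceUnMapEglImage(surf, 0);
}
```

Since both source and destination stay in device memory, this avoids the mapped-CPU-pointer round trip.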

Hi @DaneLLL, I implemented the same approach as gst-nvdewarper, but apparently it is not recommended. I tried a couple of things but couldn't find a solution. Is there an example of copying data from a block allocated with cudaMalloc to an NvBufSurface?

I tried this:
Memory allocation:

if (cudaMalloc(&dewarp_params.surface, dewarp_params.dst_pitch * dewarp_params.dst_height) != cudaSuccess)
{
    LOG_ERROR("Cuda malloc failed!");
    exit(-1);
}

Copying it to NvBufSurface:

GstMapInfo outmap;
if (!gst_buffer_map(output_buffer, &outmap, GST_MAP_WRITE))
    break;
NvBufSurface *surf = (NvBufSurface *)outmap.data;
surf->numFilled = 1;

int width = surf->surfaceList[0].width;
int height = surf->surfaceList[0].height;
int pitch = surf->surfaceList[0].pitch;

NvBufSurfaceMap(surf, 0, 0, NVBUF_MAP_WRITE);
cudaError_t err = cudaMemcpy2D(
    surf->surfaceList[0].mappedAddr.addr[0],
    pitch,
    dewarp_params.surface,
    dewarp_params.dst_pitch,
    width,
    height,
    cudaMemcpyDefault);
if (err != cudaSuccess)
{
    LOG_ERROR("ERROR cudaMemcpy2D: {0}", err);
}
cudaDeviceSynchronize();
NvBufSurfaceSyncForDevice(surf, 0, 0);

I’m getting this result:

The pitch, width, and height values are equal in both the dewarp_params struct and the NvBufSurface. What am I doing wrong?

PS: MAXN and jetson_clocks are both enabled.

EDIT 1: The issue was eventually fixed by multiplying width by 4 (bytes per pixel, I guess) before the cudaMemcpy2D call. Unfortunately, cudaMemcpy2D is about 2x slower than NvBufSurfTransform on average. Is that expected behavior, or am I doing something wrong?

Hi,
Please try JetPack 5.1.3 and check whether the issue is still present. If it still exists, we would need your help to share the test steps so that we can set it up and try.

I solved the issue by changing NvBufSurfTransform to cudaMemcpy2D. Currently I'm not able to try JetPack 5.1.3, so I cannot confirm whether the problem in NvBufSurfTransform remains.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.