Dense Optical Flow error on Xavier NX with large video

Hello,

I’m using JetPack 5.1.2 and VPI 2.3 on a Xavier NX. I was able to run dense optical flow on the sample video (pedestrians.mp4) and on a custom 1280x720 px video, with the NVENC backend, a 1-level pyramid, grid size 4, and high quality.

But when I try to run it on a much larger video (2028x1520 px), I get the error: “Work item execution failed with VPI_ERROR_INTERNAL: (NVMEDIA_STATUS_ERROR)”.

Is this a known issue? Should I limit the image size when computing the optical flow?
How can I solve it?

Thank you in advance!

EDIT:
It seems the image width must be a multiple of 16 for dense optical flow. I tried rescaling the image by a factor of 0.5, resulting in 1014x760 px, but that did not work either. However, it did work with a 2032x1522 px image, whose width is a multiple of 16.
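For anyone hitting the same thing, here is a minimal workaround sketch (my own idea, not an official VPI API) that pads the frame on the right with cv::copyMakeBorder so the width becomes a multiple of 16 before wrapping it into a VPIImage; the padded columns can be cropped from the motion-vector output afterwards:

// Hypothetical helper: pad the frame width up to the next multiple of 16.
// The replicated right border is an assumption of mine; crop the
// corresponding columns from the motion-vector output if needed.
static cv::Mat PadWidthToMultipleOf16(const cv::Mat &frame)
{
    int padRight = (16 - frame.cols % 16) % 16;
    if (padRight == 0)
        return frame;

    cv::Mat padded;
    cv::copyMakeBorder(frame, padded, 0, 0, 0, padRight, cv::BORDER_REPLICATE);
    return padded;
}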

Hi,

Could you share source code that reproduces the issue with us?
We want to check it further internally.

Thanks.

Hello,

Here is some code that looks a lot like what I’m using to compute the dense optical flow:

#include <opencv2/core/version.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/videoio.hpp>
#include <vpi/OpenCVInterop.hpp>

#include <vpi/Array.h>
#include <vpi/Image.h>
#include <vpi/ImageFormat.h>
#include <vpi/Status.h>
#include <vpi/Stream.h>
#include <vpi/algo/ConvertImageFormat.h>
#include <vpi/algo/OpticalFlowDense.h>

#include <iostream>
#include <sstream>

#define CHECK_STATUS(STMT)                                    \
    do                                                        \
    {                                                         \
        VPIStatus status = (STMT);                            \
        if (status != VPI_SUCCESS)                            \
        {                                                     \
            char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH];       \
            vpiGetLastStatusMessage(buffer, sizeof(buffer));  \
            std::ostringstream ss;                            \
            ss << "line " << __LINE__ << ": ";                \
            ss << vpiStatusGetName(status) << ": " << buffer; \
            throw std::runtime_error(ss.str());               \
        }                                                     \
    } while (0)

static void ProcessMotionVector(VPIImage mvImg, cv::Mat &outputImage)
{
    // Lock the input image to access it from CPU
    VPIImageData mvData;
    CHECK_STATUS(vpiImageLockData(mvImg, VPI_LOCK_READ, VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR, &mvData));

    // Create a cv::Mat that points to the input image data
    cv::Mat mvImage;
    CHECK_STATUS(vpiImageDataExportOpenCVMat(mvData, &mvImage));

    // Convert S10.5 format to float
    cv::Mat flow(mvImage.size(), CV_32FC2);
    mvImage.convertTo(flow, CV_32F, 1.0f / (1 << 5));

    // Image not needed anymore, we can unlock it.
    CHECK_STATUS(vpiImageUnlock(mvImg));

    // Create an image where the motion vector angle is
    // mapped to a color hue, and intensity is proportional
    // to vector's magnitude.
    cv::Mat magnitude, angle;
    {
        cv::Mat flowChannels[2];
        split(flow, flowChannels);
        cv::cartToPolar(flowChannels[0], flowChannels[1], magnitude, angle, true);
    }

    float clip = 5;
    cv::threshold(magnitude, magnitude, clip, clip, cv::THRESH_TRUNC);

    // build hsv image
    cv::Mat _hsv[3], hsv, bgr;
    _hsv[0] = angle;
    _hsv[1] = cv::Mat::ones(angle.size(), CV_32F);
    _hsv[2] = magnitude / clip; // intensity must vary from 0 to 1
    merge(_hsv, 3, hsv);

    cv::cvtColor(hsv, bgr, cv::COLOR_HSV2BGR);
    bgr.convertTo(outputImage, CV_8U, 255.0);
}

int main(int argc, char *argv[])
{
    // OpenCV image that will be wrapped by a VPIImage.
    // Define it here so that it's destroyed *after* wrapper is destroyed
    cv::Mat cvPrevFrame, cvCurFrame;

    // VPI objects that will be used
    VPIStream stream = NULL;
    VPIImage imgPrevFramePL = NULL;
    VPIImage imgPrevFrameTmp = NULL;
    VPIImage imgPrevFrameBL = NULL;
    VPIImage imgCurFramePL = NULL;
    VPIImage imgCurFrameTmp = NULL;
    VPIImage imgCurFrameBL = NULL;
    VPIImage imgMotionVecBL = NULL;

    VPIPayload payload = NULL;

    int retval = 0;

    try
    {
        if (argc != 3)
        {
            throw std::runtime_error(std::string("Usage: ") + argv[0] + "<input_video> <low|medium|high>");
        }

        // Parse input parameters
        std::string strInputVideo = argv[1];
        std::string strQuality = argv[2];

        VPIOpticalFlowQuality quality;
        if (strQuality == "low")
        {
            quality = VPI_OPTICAL_FLOW_QUALITY_LOW;
        }
        else if (strQuality == "medium")
        {
            quality = VPI_OPTICAL_FLOW_QUALITY_MEDIUM;
        }
        else if (strQuality == "high")
        {
            quality = VPI_OPTICAL_FLOW_QUALITY_HIGH;
        }
        else
        {
            throw std::runtime_error("Unknown quality provided");
        }

        VPIBackend backend = VPI_BACKEND_NVENC;
        int gridSize = 4;
        int numLevels = 1;

        // Load the input video
        cv::VideoCapture invid;
        if (!invid.open(strInputVideo))
        {
            throw std::runtime_error("Can't open '" + strInputVideo + "'");
        }

        // Create the stream where processing will happen. We'll use user-provided backend
        // for Optical Flow, and CUDA/VIC for image format conversions.
        CHECK_STATUS(vpiStreamCreate(backend | VPI_BACKEND_CUDA | VPI_BACKEND_VIC, &stream));

        // Fetch the first frame
        if (!invid.read(cvPrevFrame))
        {
            throw std::runtime_error("Cannot read frame from input video");
        }

        // Create the previous and current frame wrapper using the first frame. This wrapper will
        // be set to point to every new frame in the main loop.
        CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvPrevFrame, 0, &imgPrevFramePL));
        CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvPrevFrame, 0, &imgCurFramePL));

        // Define the image formats we'll use throughout this sample.
        VPIImageFormat imgFmt = VPI_IMAGE_FORMAT_NV12_ER;
        VPIImageFormat imgFmtBL = VPI_IMAGE_FORMAT_NV12_ER_BL;

        int32_t width = cvPrevFrame.cols;
        int32_t height = cvPrevFrame.rows;

        // Create Dense Optical Flow payload to be executed on the given backend
        std::vector<int32_t> pyrGridSize(numLevels, gridSize); // all levels will have the same grid size
        CHECK_STATUS(vpiCreateOpticalFlowDense(backend, width, height, imgFmtBL, &pyrGridSize[0], pyrGridSize.size(),
                                               quality, &payload));

        // The Dense Optical Flow on NVENC or OFA backends expects its input to be in block-linear format.
        // Since the Convert Image Format algorithm doesn't currently support direct BGR
        // pitch-linear (from OpenCV) to NV12 block-linear conversion, it must be done in two
        // passes, first from BGR/PL to NV12/PL using CUDA, then from NV12/PL to NV12/BL using VIC.
        // The temporary image buffers below store the intermediate NV12/PL representation.
        CHECK_STATUS(vpiImageCreate(width, height, imgFmt, 0, &imgPrevFrameTmp));
        CHECK_STATUS(vpiImageCreate(width, height, imgFmt, 0, &imgCurFrameTmp));

        // Now create the final block-linear buffer that'll be used as input to the algorithm.
        CHECK_STATUS(vpiImageCreate(width, height, imgFmtBL, 0, &imgPrevFrameBL));
        CHECK_STATUS(vpiImageCreate(width, height, imgFmtBL, 0, &imgCurFrameBL));

        // Motion vector image width and height: input size divided by gridSize, rounded up
        int32_t mvWidth = (width + gridSize - 1) / gridSize;
        int32_t mvHeight = (height + gridSize - 1) / gridSize;

        // The output video will be heatmap of motion vector image
        int fourcc = cv::VideoWriter::fourcc('M', 'P', 'E', 'G');
        double fps = invid.get(cv::CAP_PROP_FPS);

        cv::VideoWriter outVideo("denseoptflow_mv.mp4", fourcc, fps, cv::Size(mvWidth, mvHeight));
        if (!outVideo.isOpened())
        {
            throw std::runtime_error("Can't create output video");
        }

        // Create the output motion vector buffer
        CHECK_STATUS(vpiImageCreate(mvWidth, mvHeight, VPI_IMAGE_FORMAT_2S16_BL, 0, &imgMotionVecBL));

        // First convert the first frame to NV12_BL. It'll be used as previous frame when the algorithm is called.
        CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, imgPrevFramePL, imgPrevFrameTmp, nullptr));
        CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_VIC, imgPrevFrameTmp, imgPrevFrameBL, nullptr));

        // Create an output image that will hold the rendered motion vector image.
        cv::Mat mvOutputImage;

        // Fetch a new frame until video ends
        int idxFrame = 1;
        while (invid.read(cvCurFrame))
        {
            printf("Processing frame %d\n", idxFrame++);
            // Wrap frame into a VPIImage, reusing the existing imgCurFramePL.
            CHECK_STATUS(vpiImageSetWrappedOpenCVMat(imgCurFramePL, cvCurFrame));

            // Convert current frame to NV12_BL format
            CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, imgCurFramePL, imgCurFrameTmp, nullptr));
            CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_VIC, imgCurFrameTmp, imgCurFrameBL, nullptr));

            // Estimate the motion from the previous frame to the current one
            CHECK_STATUS(vpiSubmitOpticalFlowDense(stream, backend, payload, imgPrevFrameBL, imgCurFrameBL, imgMotionVecBL));

            // Wait for processing to finish.
            CHECK_STATUS(vpiStreamSync(stream));

            // Render the resulting motion vector in the output image
            ProcessMotionVector(imgMotionVecBL, mvOutputImage);

            // Save to output video
            outVideo << mvOutputImage;

            // Swap the previous and current frames
            std::swap(cvPrevFrame, cvCurFrame);
            std::swap(imgPrevFramePL, imgCurFramePL);
            std::swap(imgPrevFrameBL, imgCurFrameBL);
        }
    }
    catch (std::exception &e)
    {
        std::cerr << e.what() << std::endl;
        retval = 1;
    }

    // Destroy all resources used
    vpiStreamDestroy(stream);
    vpiPayloadDestroy(payload);

    vpiImageDestroy(imgPrevFramePL);
    vpiImageDestroy(imgPrevFrameTmp);
    vpiImageDestroy(imgPrevFrameBL);
    vpiImageDestroy(imgCurFramePL);
    vpiImageDestroy(imgCurFrameTmp);
    vpiImageDestroy(imgCurFrameBL);
    vpiImageDestroy(imgMotionVecBL);

    return retval;
}

I can’t provide the source video. In my real program I use TIFF files as input, not mp4.
But any 2028x1520 px video should reproduce the error.
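If it helps, a repro sketch along these lines (codec, fps, and frame count are arbitrary picks of mine, not from my real program) should produce a video of the failing size:

// Hypothetical repro generator: writes a short synthetic 2028x1520 video.
#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>

int main()
{
    cv::VideoWriter out("repro_2028x1520.mp4", cv::VideoWriter::fourcc('M', 'P', 'E', 'G'), 30.0,
                        cv::Size(2028, 1520));
    if (!out.isOpened())
        return 1;

    cv::Mat frame(1520, 2028, CV_8UC3);
    for (int i = 0; i < 60; ++i)
    {
        // Random noise frames are enough to hit the size constraint
        cv::randu(frame, cv::Scalar::all(0), cv::Scalar::all(255));
        out << frame;
    }
    return 0;
}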

Thanks!

Hi,

Thanks.
We will give it a try and provide more info to you.

Hi,

The constraint comes from NVENC: the input width needs to be a multiple of 16.
We tested the sample with VPI 3, which uses the OFA backend, and did not hit the error there.
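As a sketch, on VPI 2 you could also make the constraint explicit in the sample before creating the payload (the exact message text is just an example):

// Fail early with a clear message instead of NVMEDIA_STATUS_ERROR at runtime.
// Note: std::to_string needs <string>.
if (backend == VPI_BACKEND_NVENC && width % 16 != 0)
{
    throw std::runtime_error("NVENC dense optical flow needs the input width to be a multiple of 16, got " +
                             std::to_string(width));
}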

Thanks.
