Performance of VPI ConvertImageFormat

Hi,

My L4T info:
R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref, EABI: aarch64, DATE: Tue Aug 1 19:57:35 UTC 2023
VPI: NV_VPI_VERSION_STRING “2.3.9”

Recently, I experimented with using VPI for distortion correction, and it works. However, I found that most of the time is spent on color space conversion. After running some tests with VPI, I have some questions about color conversion with VPI.

I read a PNG file (1920x1080) and tested converting the BGRA cv::Mat to VPI_IMAGE_FORMAT_NV12_ER.

I ran the conversion 10 times each with CUDA and VIC, and it takes longer than I expected.
Could you help me understand why it is so slow?

Run with VPI_BACKEND_VIC:
[Screenshot 2024-07-01 16.37.56]

Run with VPI_BACKEND_CUDA:
[Screenshot 2024-07-01 16.38.40]


VPIStream stream = nullptr;
VPIImage vimg = nullptr;   // wraps the input cv::Mat (BGRA8)
VPIImage tmpIn = nullptr;  // conversion output (NV12_ER)

// One-time setup
CHECK_STATUS(vpiStreamCreate(VPI_BACKEND_CUDA, &stream));
CHECK_STATUS(vpiImageCreate(width, height, VPI_IMAGE_FORMAT_NV12_ER, 0, &tmpIn));

// Run this method in a loop, 10 times
void testConvertImageFormat(cv::Mat &cvImage) {
    if (vimg == nullptr)
    {
        // First call: create a VPIImage that wraps the cv::Mat.
        CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvImage, VPI_IMAGE_FORMAT_BGRA8, 0, &vimg));
    }
    else
    {
        // Later calls: re-point the existing wrapper at the new cv::Mat.
        CHECK_STATUS(vpiImageSetWrappedOpenCVMat(vimg, cvImage));
    }

    // Submit the conversion and block until it has finished.
    CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, vimg, tmpIn, NULL));
    CHECK_STATUS(vpiStreamSync(stream));
}
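
When timing a loop like the one above, note that the first call does extra one-time work (it creates the image wrapper), so it can distort a 10-run average. Below is a minimal timing sketch, not part of the original post, that discards a few warm-up calls and reports per-call times so a median can be taken. The helper names `timeCallsMs` and `medianMs` are hypothetical, and the code uses only the C++ standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Median of the samples (mean of the two middle values for an even count).
double medianMs(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : 0.5 * (samples[mid - 1] + samples[mid]);
}

// Calls `op` `warmup` times without timing, then `iters` times with timing.
// Returns the wall-clock duration of each timed call in milliseconds.
// `op` must block until its work is done (e.g. end with a stream sync),
// otherwise only the submission cost is measured.
std::vector<double> timeCallsMs(const std::function<void()> &op,
                                int warmup, int iters) {
    for (int i = 0; i < warmup; ++i) {
        op();  // untimed: absorbs one-time setup costs
    }
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(iters));
    for (int i = 0; i < iters; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        op();
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return ms;
}
```

For example, `medianMs(timeCallsMs([&] { testConvertImageFormat(cvImage); }, 3, 10))` would time the function from the post; since that function ends with vpiStreamSync, each sample covers the full conversion. The warm-up count of 3 is an arbitrary choice.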

Hi,

Have you maximized the device performance before benchmarking?

Please note that the VIC clocks aren’t covered by nvpmodel and jetson_clocks.
So you will need to use the script in the link below to set a higher VIC clock.

https://docs.nvidia.com/vpi/algo_performance.html#maxout_clocks

Thanks.

I have tried maxing out the clocks and tested the image format conversion again
(1080p, BGRA to VPI_IMAGE_FORMAT_NV12_ER):

VPI_BACKEND_VIC takes about 2-3 ms per conversion;
VPI_BACKEND_CUDA takes about 1-2 ms per conversion.

This is still not as fast as I expected, since OpenCV usually finishes a BGR to YUV conversion within 1 ms on the same device.

1. Is there still room for improvement? Should I choose CUDA for color format conversion for better performance?
2. Can the max mode be maintained continuously?

Hi,

Please check our performance table below:
https://docs.nvidia.com/vpi/algo_imageconv.html#algo_imageconv_perf

RGBA8 to NV12_ER is around 0.1 ms with CUDA and 0.88 ms with VIC.
Could you try creating the VPI images with only the backend you need (via the flags) and benchmark again?

https://docs.nvidia.com/vpi/group__VPI__Image.html#gab2ecbae4459652c3e2ec8572860d1852
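
As a rough illustration of this suggestion, the sketch below passes VPI_BACKEND_CUDA instead of 0 as the flags argument when creating both images, so they are set up for the CUDA backend only. The helper name `createCudaOnlyImages` is hypothetical, and whether this closes the gap to the documented numbers is not confirmed in this thread. The `__has_include` guard is only there so the file still compiles on a machine without the VPI and OpenCV headers.

```cpp
#if __has_include(<vpi/Image.h>) && __has_include(<opencv2/core.hpp>)
#include <opencv2/core.hpp>       // must come before OpenCVInterop.hpp
#include <vpi/OpenCVInterop.hpp>
#include <vpi/Image.h>
#include <vpi/Status.h>

// Hypothetical helper: creates the input wrapper and the output image with
// CUDA as their only enabled backend. Returns true if both were created.
bool createCudaOnlyImages(cv::Mat &bgra, int width, int height,
                          VPIImage *in, VPIImage *out) {
    // Wrap the BGRA cv::Mat; flags = VPI_BACKEND_CUDA instead of 0.
    if (vpiImageCreateWrapperOpenCVMat(bgra, VPI_IMAGE_FORMAT_BGRA8,
                                       VPI_BACKEND_CUDA, in) != VPI_SUCCESS) {
        return false;
    }
    // Allocate the NV12_ER output with the same backend restriction.
    if (vpiImageCreate(width, height, VPI_IMAGE_FORMAT_NV12_ER,
                       VPI_BACKEND_CUDA, out) != VPI_SUCCESS) {
        vpiImageDestroy(*in);  // do not leak the wrapper on failure
        *in = nullptr;
        return false;
    }
    return true;
}
#endif
```

For a VIC benchmark, the same pattern would use VPI_BACKEND_VIC for the images, the stream, and the submit call.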

You can keep the device in the max mode continuously.

Thanks.
