Time taken in cloning cvOut from vpiImageDataExportOpenCVMat(input_data, cvOut) is too high

// cfm is the input image (cv::Mat).
// lut is a cv::Mat from which we generate the VPIWarpMap.
// out_cols x out_rows is the output image size.
vpiStreamCreate(VPI_BACKEND_CUDA, &stream_yash);
vpiImageCreateOpenCVMatWrapper(cfm, 0, &image_yash);
int32_t w, h;
vpiImageGetSize(image_yash, &w, &h);
VPIImageFormat format;
vpiImageCreate(out_cols, out_rows, type, 0, &output_yash);
VPIWarpMap lut_vpi = cv_to_vpi_map(lut);
vpiCreateRemap(VPI_BACKEND_CUDA, &lut_vpi, &warp_yash);
vpiSubmitRemap(stream_yash, VPI_BACKEND_CUDA, warp_yash, image_yash, output_yash, VPI_INTERP_NEAREST, VPI_BORDER_ZERO, 0);
VPIImageData outData;
vpiImageLock(output_yash, VPI_LOCK_READ, &outData);
vpiImageDataExportOpenCVMat(outData, &cvOut);
cv::Mat out = cvOut.clone(); // issue: this clone takes a long time
return out;

The output image itself is absolutely fine. The problem is that accessing it takes a lot of time.

It takes around 150 ms to clone cvOut, which is too high. The output image size is 5290 x 3638. Ideally it should take around 10 ms.
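To put those numbers in perspective, here is a rough back-of-the-envelope sketch of the copy throughput they imply. The bytes-per-pixel value is an assumption (the image format is not stated above), rows are assumed to be tightly packed, and the helper names imageBytes and throughputGBps are made up for illustration.

```cpp
#include <cstdint>

// Size in bytes of a tightly packed image (assumption: no row padding).
constexpr std::uint64_t imageBytes(std::uint64_t width,
                                   std::uint64_t height,
                                   std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Effective throughput in GB/s (1 GB = 1e9 bytes) of a copy of `bytes`
// bytes that took `milliseconds` of wall-clock time.
constexpr double throughputGBps(std::uint64_t bytes, double milliseconds) {
    return static_cast<double>(bytes) / (milliseconds * 1e-3) / 1e9;
}
```

With one byte per pixel, imageBytes(5290, 3638, 1) is 19,245,020 bytes, so 150 ms corresponds to roughly 0.13 GB/s and 10 ms to roughly 1.9 GB/s. For a 3-byte-per-pixel image both figures triple.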

We want to use VPI because its remap time is lower. But accessing the output of the VPI remap takes a long time.

Even without clone(), any operation performed on cvOut takes too long.

What’s the purpose of using this if accessing the output takes so long?

Is this an issue with VPI, or are we doing it the wrong way?


Do you observe the issue with a standard resolution such as 3840x2160? 5290 x 3638 is a very large resolution; standard 4K is 3840x2160. We would like to know whether that resolution works fine.

Please also share your release version ($ head -1 /etc/nv_tegra_release).

The time taken for 3544 x 2436 is around 40 ms, which is itself high.
I am trying it on 2 versions:

  1. R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020

  2. R32 (release), REVISION: 5.1, GCID: 27362550, BOARD: t186ref, EABI: aarch64, DATE: Wed May 19 18:16:00 UTC 2021

It takes the same time on both versions.

I know that 5290 x 3638 is a very large resolution, but it shouldn’t take that much time, right?

Please execute sudo nvpmodel -m 0 and sudo jetson_clocks, and try again. The data is in cvOut after calling

vpiImageDataExportOpenCVMat(outData, &cvOut);

cvOut.clone() then copies the data into cv::Mat out on the CPU. NVPModel mode 2 enables 6 CPU cores at a fixed 1.4 GHz, which should give the maximum CPU throughput on Xavier NX. Please try that mode.
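One way to check whether raw CPU copy speed or the memory behind cvOut is the limiting factor is to time a plain memcpy of the same number of bytes between two ordinary heap buffers on the same board. The sketch below does only that. It does not touch VPI, and the function name timeHeapCopyMs is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Times one memcpy of `bytes` bytes between two ordinary heap buffers.
// Returns the wall-clock duration of the memcpy alone, in milliseconds.
double timeHeapCopyMs(std::size_t bytes) {
    if (bytes == 0) return 0.0; // nothing to copy

    // Both buffers are filled on construction, so their pages are already
    // mapped before the timed region starts.
    std::vector<std::uint8_t> src(bytes, 0x5A);
    std::vector<std::uint8_t> dst(bytes, 0x00);

    const auto t0 = std::chrono::steady_clock::now();
    std::memcpy(dst.data(), src.data(), bytes);
    const auto t1 = std::chrono::steady_clock::now();

    // Sanity check that the copy happened (compiled out if NDEBUG is defined).
    assert(dst == src);

    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Assuming one byte per pixel, timeHeapCopyMs(5290u * 3638u) gives a baseline for the large image. If that baseline is far below the clone() time measured on cvOut, the slowdown is more likely tied to how the CPU reads the locked VPI buffer than to CPU clock settings.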

Thanks for replying.
I tried the above commands.
The time is still almost the same:
(5290 x 3638) 135 ms
(3544 x 2436) 35 ms
Is this the best that we can get? If yes, then it’s too high!

Is there any other way to reduce the time?

We were already using 6 CPU cores before.

Just so that you know,
we are cloning because:

  1. The output we get when applying operations on the image without cloning it is wrong.
  2. If we don’t clone, the time of successive calls also increases.

Our understanding:
Whenever we try to access the data in cvOut, it takes longer than expected.
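To test that understanding, it may help to time each step on its own instead of only the clone. Below is a minimal sketch of a generic timing helper. The name timeMs is hypothetical and nothing in it is VPI-specific.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `step` once and returns its wall-clock duration in milliseconds.
template <typename Step>
double timeMs(Step&& step) {
    const auto t0 = std::chrono::steady_clock::now();
    std::forward<Step>(step)();
    const auto t1 = std::chrono::steady_clock::now();

    // steady_clock is monotonic, so the end time cannot precede the start.
    assert(t1 >= t0);

    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Wrapping each call separately, for instance timeMs([&] { vpiStreamSync(stream_yash); }) with an explicit sync added before the lock, then the vpiImageLock call, then cvOut.clone(), would show whether the time goes into waiting for the remap to finish, into the lock, or into the CPU copy.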