vxSetRemapPoint operation very slow for large remaps

Hi,

I want to fill a large remap (typically 4K image size) with remap points. To do this you need to use the vxSetRemapPoint function, which takes around 1.5 seconds on the Jetson TX2. Is there a faster method for filling the points (e.g. functions like vxCopyRemapPatch and vxMapRemapPatch)?

The calculation of the warp x- and y-maps (using GPU-accelerated OpenCV functions) takes around 170 ms on the Jetson TX2.

This is a code snippet for filling the remap (input_width_ = 3840, input_height_ = 2160):

void CreateRemapTable()
{
    vx_remap_ = vxCreateRemap(vx_context_, input_width_, input_height_, output_width_, output_height_);
    // Initialize the remap table.
    for (int dst_y = 0; dst_y < output_height_; dst_y++) {
        for (int dst_x = 0; dst_x < output_width_; dst_x++) {
            int src_x = dst_x + crop_left;
            int src_y = dst_y + crop_top;
            vxSetRemapPoint(vx_remap_, dst_x, dst_y,
                            (vx_float32)x_warp_map_->at<float>(src_y, src_x),
                            (vx_float32)y_warp_map_->at<float>(src_y, src_x));
        }
    }
}
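
If vxMapRemapPatch is available in your OpenVX version, I imagine the fill would look something like the sketch below. It is self-contained for illustration: the Coord2Df struct stands in for the spec's vx_coordinates2df_t, and base/stride_y stand in for the pointer and row stride the map call would return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the OpenVX 1.2 vx_coordinates2df_t: one (x, y)
// source coordinate per destination pixel.
struct Coord2Df {
    float x;
    float y;
};

// Fill a whole row-strided remap patch in one pass. In real code,
// `base` and `stride_y` (bytes per row) would be the pointer and
// stride returned by vxMapRemapPatch(); here they are a plain
// buffer so the sketch is self-contained.
void FillRemapPatch(void* base, std::size_t stride_y,
                    int out_w, int out_h,
                    const std::vector<float>& x_map,
                    const std::vector<float>& y_map,
                    int map_w, int crop_left, int crop_top) {
    for (int dst_y = 0; dst_y < out_h; ++dst_y) {
        Coord2Df* row = reinterpret_cast<Coord2Df*>(
            static_cast<std::uint8_t*>(base) + dst_y * stride_y);
        const int src_y = dst_y + crop_top;
        for (int dst_x = 0; dst_x < out_w; ++dst_x) {
            const int src_x = dst_x + crop_left;
            row[dst_x].x = x_map[src_y * map_w + src_x];
            row[dst_x].y = y_map[src_y * map_w + src_x];
        }
    }
}
```

That would turn ~8 million API calls into one sequential memory write, which should remove the per-point call overhead.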

Hi,

Could you share more details about your use case?

If you just want to shift the image by a constant offset, you don’t need to refill the map table each time.
vxSetRemapPoint is a point-level read/write command and is expected to be slow.

It’s recommended to check the homogeneous transformation implementation in VisionWorks:
{VisionWorks folder}/3rdparty/eigen/Eigen/src/Eigen2Support/Geometry/Transform.h

Thanks.

Hi,

The use case is a virtual camera PTZ system. First I need to cylindrically warp the 4K camera images onto a cylinder, and after that warp part of the cylindrically warped image back onto an output image. The remap matrix for this second warp changes frequently, since the vPTZ camera changes its angles (and thus the rotation matrix) multiple times per second.
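
For context, the pinhole-to-cylinder geometry I’m using is roughly the following (a sketch of the standard inverse cylindrical model; the focal-length and principal-point conventions here are assumptions, not my exact code):

```cpp
#include <cassert>
#include <cmath>

// Inverse cylindrical mapping: for a cylinder-image pixel (xc, yc),
// return the source pixel (x, y) in the original pinhole image.
// f is the focal length in pixels, (cx, cy) the principal point.
void CylinderToImage(float xc, float yc, float f, float cx, float cy,
                     float* x, float* y) {
    const float theta = (xc - cx) / f;   // azimuth angle on the cylinder
    const float h     = (yc - cy) / f;   // normalized height on the cylinder
    // Project the cylinder point (sin theta, h, cos theta) back
    // onto the image plane at depth f.
    *x = f * std::tan(theta) + cx;
    *y = f * h / std::cos(theta) + cy;
}
```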

Regards,
Boris.

Hi,

An alternative is to use a homogeneous transform.

Is it possible to approximate your warping matrix with a 3x3 transform matrix?
Or do you need pixel-level warping accuracy for your use case?
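
For reference, a planar homogeneous warp maps every output pixel through a single matrix (3x3 in the planar case); a sketch, with the row-major layout as an assumption:

```cpp
// Apply a 3x3 homography H (row-major) to pixel (u, v),
// writing the transformed point to (*x, *y).
void ApplyHomography(const float H[9], float u, float v,
                     float* x, float* y) {
    const float w = H[6] * u + H[7] * v + H[8];  // projective divisor
    *x = (H[0] * u + H[1] * v + H[2]) / w;
    *y = (H[3] * u + H[4] * v + H[5]) / w;
}
```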

Thanks.

Hi,

No, unfortunately I cannot use a homogeneous transform, since I need to stitch the two input camera images accurately. They’re placed at an angle of approximately 90 degrees to each other, so a cylindrical (or spherical) warp is needed.

Hi,

But I think you can create these tables at the beginning,
assuming there is a limited set of angles to map.

Thanks.

I am creating the tables for the two fixed cameras at the beginning, as you suggest, but the actual output of the system is a “virtual” camera that can pan, tilt and zoom. While the camera is moving (e.g. panning), the matrix for the destination warp has to change every frame (e.g. 30 times per second).
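
Concretely, per frame each output pixel of the virtual camera has to go through something like the following (a geometry sketch, not my actual code; the frame and intrinsics conventions are assumptions):

```cpp
#include <cassert>
#include <cmath>

// Per-frame virtual-PTZ remap: map an output pixel (u, v) of the
// virtual camera (focal fv, principal point cvx/cvy) through a
// rotation R (row-major 3x3, virtual-camera frame to panorama
// frame) onto the cylindrical panorama (focal fp, center cpx/cpy).
void VirtualToPanorama(float u, float v,
                       float fv, float cvx, float cvy,
                       const float R[9],
                       float fp, float cpx, float cpy,
                       float* xp, float* yp) {
    // Back-project the output pixel to a ray in the virtual camera.
    const float rx = (u - cvx) / fv, ry = (v - cvy) / fv, rz = 1.0f;
    // Rotate the ray into the panorama frame.
    const float wx = R[0] * rx + R[1] * ry + R[2] * rz;
    const float wy = R[3] * rx + R[4] * ry + R[5] * rz;
    const float wz = R[6] * rx + R[7] * ry + R[8] * rz;
    // Convert the ray to cylindrical panorama coordinates.
    const float theta = std::atan2(wx, wz);
    const float h     = wy / std::sqrt(wx * wx + wz * wz);
    *xp = fp * theta + cpx;
    *yp = fp * h + cpy;
}
```

Only R changes per frame, but every output pixel still has to be recomputed, which is why filling the table point by point is the bottleneck.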

Hi,

Sorry for the late reply.

This API is not suitable for your use case.
Maybe you can try to implement a CUDA kernel instead.

Or try our NPP library: https://docs.nvidia.com/cuda/npp/group__image__remap.html
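
For illustration, the per-pixel work such a kernel (or a remap call) performs is just a gather with interpolation. A CPU sketch with bilinear sampling (the interpolation mode and single-channel float format are assumptions):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sample a single-channel float image of size w x h at the
// non-integer source position (sx, sy) with bilinear interpolation,
// clamping to the image border. A CUDA kernel would run this per
// output pixel, one thread each, after reading (sx, sy) from the map.
float RemapBilinear(const float* src, int w, int h, float sx, float sy) {
    const int x0 = (int)std::floor(sx);
    const int y0 = (int)std::floor(sy);
    const float ax = sx - x0, ay = sy - y0;  // interpolation weights
    auto at = [&](int x, int y) {
        x = std::min(std::max(x, 0), w - 1);  // clamp to border
        y = std::min(std::max(y, 0), h - 1);
        return src[y * w + x];
    };
    return (1 - ay) * ((1 - ax) * at(x0, y0)     + ax * at(x0 + 1, y0))
         +      ay  * ((1 - ax) * at(x0, y0 + 1) + ax * at(x0 + 1, y0 + 1));
}
```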

Thanks.