Hi bhargavK, I think Shane was suggesting that you isolate the new undistort() function from detectNet so that you can measure the performance impact of the camera pre-processing on its own. While you are working on adding this, you may want to get the gst-camera test program working first. It is structured like detectnet-camera but without detectNet, so you can confirm that the de-warping works as expected and gauge its performance impact.
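One way to gauge the cost of the pre-processing step in isolation is to time it with CUDA events. The sketch below is a hypothetical helper (`timeGpuMs` is not part of jetson-inference or the CUDA runtime). It assumes the work being timed is launched on the default stream, and it skips error checking for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: time a GPU operation with CUDA events and return the
// average milliseconds per call. `launch` is any callable that enqueues work
// on the default stream (e.g. a kernel launch). Error checks are omitted.
template <typename Launch>
float timeGpuMs(Launch launch, int iters = 100)
{
	cudaEvent_t start, stop;
	cudaEventCreate(&start);
	cudaEventCreate(&stop);

	launch();                      // warm-up run, excluded from the measurement

	cudaEventRecord(start);        // recorded in the default stream
	for (int i = 0; i < iters; i++)
		launch();
	cudaEventRecord(stop);
	cudaEventSynchronize(stop);    // block the host until the timed work is done

	float totalMs = 0.0f;
	cudaEventElapsedTime(&totalMs, start, stop);

	cudaEventDestroy(start);
	cudaEventDestroy(stop);
	return totalMs / iters;
}
```

For example, `timeGpuMs([&]{ cudaWarpIntrinsic(focal, center, dist, d_in, d_out, size); })` would report the average per-frame cost of the undistortion kernel posted further down, where the arguments stand in for whatever buffers and calibration values your camera loop already has. Because the events are recorded in the GPU stream, the result reflects when the GPU actually finished the work rather than when the asynchronous launch returned on the host.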

If you find that OpenCV's undistort() function isn't fast enough, VisionWorks and the NVIDIA NPP library both have optimized warping functions.

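Their affine and perspective warp functions can be useful if you can express the distortion as the inverse of an affine (or perspective) matrix. Radial lens distortion is nonlinear, though, so in general it can't be written that way; NPP's remap functions, which take a per-pixel coordinate lookup table, are the more general option.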
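Because lens distortion is nonlinear, a general alternative to an affine warp is a precomputed per-pixel lookup table: for each pixel of the undistorted output, store where to sample in the distorted input. The sketch below is host-side CUDA C++ under that assumption; `buildUndistortMaps` is a hypothetical name, and it uses the same k1, k2, p1, p2 model as the kernel in this post. Tables like these are the kind of input that generic remap functions consume.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>   // float2 (usable from host code)

// Hypothetical helper: for every pixel (u, v) of the undistorted output image,
// compute the (x, y) position to sample in the distorted input image, using
// radial (k1, k2) and tangential (p1, p2) distortion.
void buildUndistortMaps(int width, int height, float2 f, float2 c,
                        float k1, float k2, float p1, float p2,
                        std::vector<float>& mapX, std::vector<float>& mapY)
{
	mapX.resize(std::size_t(width) * height);
	mapY.resize(std::size_t(width) * height);

	for (int v = 0; v < height; v++) {
		for (int u = 0; u < width; u++) {
			// normalized coordinates of the ideal (undistorted) pixel
			const float x  = (u - c.x) / f.x;
			const float y  = (v - c.y) / f.y;
			const float r2 = x * x + y * y;
			const float radial = 1.0f + (k1 + k2 * r2) * r2;

			// radial + tangential distortion in normalized coordinates
			const float xd = x * radial + 2.0f * p1 * x * y + p2 * (r2 + 2.0f * x * x);
			const float yd = y * radial + p1 * (r2 + 2.0f * y * y) + 2.0f * p2 * x * y;

			// back to pixel coordinates in the distorted input image
			const std::size_t i = std::size_t(v) * width + u;
			mapX[i] = f.x * xd + c.x;
			mapY[i] = f.y * yd + c.y;
		}
	}
}
```

The tables only depend on the calibration and image size, so they can be built once at startup and reused for every frame.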

see: http://docs.nvidia.com/cuda/pdf/NPP_Library_Image_Geometry.pdf

Here’s also a CUDA kernel that does intrinsic warping (undistortion) using the same distortion model as the MATLAB Camera Calibration Toolbox:

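```
// iDivUp() and the CUDA() error-checking macro come from jetson-inference's cudaUtility.h
#include "cudaUtility.h"

// Undistort an RGBA image using the k1, k2, p1, p2 lens model of the MATLAB
// Camera Calibration Toolbox. Each thread handles one pixel of the undistorted
// output: it pushes that pixel through the distortion model to find where it
// lies in the distorted input, then copies the nearest input pixel.
// input and output must be different buffers.
__global__ void gpuIntrinsicWarp( uchar4* input, uchar4* output, int alignedWidth, int width, int height,
                                  float2 focalLength, float2 principalPoint, float k1, float k2, float p1, float p2)
{
	const int2 uv_out = make_int2(blockDim.x * blockIdx.x + threadIdx.x,
	                              blockDim.y * blockIdx.y + threadIdx.y);

	if( uv_out.x >= width || uv_out.y >= height )
		return;

	const float u = uv_out.x;
	const float v = uv_out.y;

	// normalized (ideal) image coordinates of the output pixel
	const float _fx = 1.0f / focalLength.x;
	const float _fy = 1.0f / focalLength.y;

	const float y = (v - principalPoint.y)*_fy;
	const float y2 = y*y;
	const float _2p1y = 2.0f*p1*y;     // float literals avoid slow double-precision math
	const float _3p1y2 = 3.0f*p1*y2;
	const float p2y2 = p2*y2;

	const float x = (u - principalPoint.x)*_fx;
	const float x2 = x*x;
	const float r2 = x2 + y2;

	// radial factor: 1 + k1*r^2 + k2*r^4
	const float d = 1.0f + (k1 + k2*r2)*r2;

	// apply radial + tangential distortion, then convert back to pixel coordinates
	const float _u = focalLength.x*(x*(d + _2p1y) + p2y2 + (3.0f*p2)*x2) + principalPoint.x;
	const float _v = focalLength.y*(y*(d + (2.0f*p2)*x) + _3p1y2 + p1*x2) + principalPoint.y;

	// nearest-neighbor sample (round instead of truncating toward zero)
	const int2 uv_in = make_int2( __float2int_rn(_u), __float2int_rn(_v) );

	// pixels that map outside the input become black instead of keeping stale data
	uchar4 px = make_uchar4(0, 0, 0, 255);

	if( uv_in.x >= 0 && uv_in.y >= 0 && uv_in.x < width && uv_in.y < height )
		px = input[uv_in.y * alignedWidth + uv_in.x];

	output[uv_out.y * alignedWidth + uv_out.x] = px;
}

// Host wrapper. size.x = width, size.y = height, size.z = row pitch in bytes.
// distortion packs (k1, k2, p1, p2) into its x, y, z, w components.
cudaError_t cudaWarpIntrinsic( const float2& focalLength, const float2& principalPoint, const float4& distortion,
                               uchar4* input, uchar4* output, const dim3& size )
{
	if( !input || !output )
		return cudaErrorInvalidDevicePointer;

	if( size.x == 0 || size.y == 0 || size.z == 0 )
		return cudaErrorInvalidValue;

	if( size.z < size.x * sizeof(uchar4) )
		return cudaErrorInvalidPitchValue;

	const int alignedWidth = size.z / sizeof(uchar4);

	// launch kernel: one thread per output pixel, 16x16 thread blocks
	const dim3 blockDim(16, 16);
	const dim3 gridDim(iDivUp(size.x,blockDim.x), iDivUp(size.y,blockDim.y));

	gpuIntrinsicWarp<<<gridDim, blockDim>>>(input, output, alignedWidth, size.x, size.y,
	                                        focalLength, principalPoint,
	                                        distortion.x, distortion.y, distortion.z, distortion.w);

	return CUDA(cudaGetLastError());
}
```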

These parameters overlap with OpenCV's calibration parameters (OpenCV's distortion vector also starts with k1, k2, p1, p2), so you can likely still use OpenCV for the calibration step itself.

This kernel uses the first four of the MATLAB distortion coefficients (k1, k2, p1, p2), passed in as distortion.x, .y, .z and .w; the fifth coefficient (the sixth-order radial term) is ignored.

See here for a description of the parameters: http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/parameters.html
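If you calibrate with OpenCV, the results have to be repacked into the arguments of cudaWarpIntrinsic(). The sketch below assumes an OpenCV-style row-major 3x3 camera matrix `[fx 0 cx; 0 fy cy; 0 0 1]` and a distortion vector ordered k1, k2, p1, p2[, k3, ...]. `IntrinsicParams` and `fromOpenCV` are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>   // float2, float4, make_float2(), make_float4()

// Hypothetical container for the kernel's calibration arguments.
struct IntrinsicParams
{
	float2 focalLength;     // (fx, fy) in pixels
	float2 principalPoint;  // (cx, cy) in pixels
	float4 distortion;      // (k1, k2, p1, p2) in .x, .y, .z, .w
};

// Repack a row-major 3x3 camera matrix and a distortion vector ordered
// k1, k2, p1, p2[, k3, ...] into the kernel's arguments. Coefficients beyond
// p2 (such as k3) are dropped because the kernel only models k1, k2, p1, p2;
// missing coefficients default to 0.
IntrinsicParams fromOpenCV(const double K[9], const double* dist, int numDist)
{
	IntrinsicParams p;
	p.focalLength    = make_float2(float(K[0]), float(K[4]));
	p.principalPoint = make_float2(float(K[2]), float(K[5]));

	float d[4] = {0.0f, 0.0f, 0.0f, 0.0f};
	for (int i = 0; i < numDist && i < 4; i++)
		d[i] = float(dist[i]);

	p.distortion = make_float4(d[0], d[1], d[2], d[3]);
	return p;
}
```

With OpenCV, cv::calibrateCamera() typically returns the camera matrix and distortion coefficients as continuous CV_64F cv::Mat objects, so a call along the lines of `fromOpenCV(cameraMatrix.ptr<double>(), distCoeffs.ptr<double>(), (int)distCoeffs.total())` should work under that assumption. Keep in mind that dropping k3 is only a good approximation if its calibrated value is small for your lens.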