NPP WarpAffine NPP_WRONG_INTERSECTION_QUAD_WARNING

I am trying to use NPP's nppiWarpAffine to perform an image transformation, but for some coefficients I keep getting NPP_WRONG_INTERSECTION_QUAD_WARNING. The same coefficients work fine in OpenCV, and when I calculate the quad it clearly intersects the dst ROI. The problem occurs in every CUDA release I tried (6.5, 7.0 and 7.5). Below is some simple code that demonstrates the problem. Any help will be appreciated.

#include <cuda_runtime.h>
#include <npp.h>
#include <stdio.h>
#include <stdlib.h>  // for exit()

int main()
{    
    const int width = 200, height = 170;
    const int out_width = 64, out_height = 64;

    unsigned char  *dSrc, *dDst;

    cudaMalloc<unsigned char>(&dSrc,3*width*height*sizeof(Npp32f));
    cudaMalloc<unsigned char>(&dDst,3*out_width*out_height*sizeof(Npp32f));

    NppiSize srcSize = {width, height};
    NppiSize dstSize = {out_width, out_height};
    NppiRect srcRoi = {0,0,width, height};
    NppiRect dstRoi = {0,0,out_width, out_height};

    double coeffs[2][3];

    coeffs[0][0]=0.967700; 
    coeffs[0][1]=0.523475; 
    coeffs[0][2]=-90.066200; 
    coeffs[1][0]=0.444953; 
    coeffs[1][1]=-1.138470; 
    coeffs[1][2]=30.769278;

    int outImgSz = out_width*out_height;
    int srcImgSz = width*height;

    const Npp32f * pSrc[3];
    Npp32f * pDst[3];
    pSrc[0] = (Npp32f*) (dSrc);
    pSrc[1] = (Npp32f*) (pSrc[0] + srcImgSz);
    pSrc[2] = (Npp32f*) (pSrc[0] + 2*srcImgSz);
    pDst[0] = (Npp32f*) dDst;
    pDst[1] = (Npp32f*) (pDst[0] + outImgSz);
    pDst[2] = (Npp32f*) (pDst[0] + 2*outImgSz);

    int rval = nppiWarpAffine_32f_P3R (pSrc, srcSize, width*sizeof(Npp32f), srcRoi, pDst, out_width*sizeof(Npp32f), dstRoi, coeffs, NPPI_INTER_CUBIC);
    if(NPP_NO_ERROR != rval)
    {
        fprintf(stderr, "NPP error %d\n", rval);
        exit(1);
    }
    cudaFree(dSrc);
    cudaFree(dDst);

    return 0;
}
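The quad check mentioned in the question can be reproduced on the host without NPP. The following is a minimal sketch, not NPP's own algorithm: it assumes the coefficients are a forward map (x' = c00*x + c01*y + c02, y' = c10*x + c11*y + c12) and that ROI corners are taken at inclusive pixel coordinates (0 to width-1, 0 to height-1). The names `Point`, `applyAffine`, `affineQuad` and `dstPointHitsSource` are hypothetical helpers introduced here for illustration.

```cpp
#include <array>
#include <cassert>

struct Point { double x, y; };

// Forward affine map: x' = c00*x + c01*y + c02, y' = c10*x + c11*y + c12.
Point applyAffine(const double c[2][3], Point p) {
    return { c[0][0] * p.x + c[0][1] * p.y + c[0][2],
             c[1][0] * p.x + c[1][1] * p.y + c[1][2] };
}

// Transform the four corners of a source ROI (assumed inclusive pixel
// coordinates) in the order top-left, top-right, bottom-right, bottom-left.
std::array<Point, 4> affineQuad(const double c[2][3], int x, int y, int w, int h) {
    assert(w > 0 && h > 0);
    const double x0 = x, y0 = y, x1 = x + w - 1, y1 = y + h - 1;
    return {{ applyAffine(c, {x0, y0}), applyAffine(c, {x1, y0}),
              applyAffine(c, {x1, y1}), applyAffine(c, {x0, y1}) }};
}

// Map a destination point back through the inverse transform and report
// whether it lands inside a w x h source image anchored at (0, 0).
// Returns false for a singular (non-invertible) transform.
bool dstPointHitsSource(const double c[2][3], Point d, int w, int h) {
    const double det = c[0][0] * c[1][1] - c[0][1] * c[1][0];
    if (det == 0.0) return false;
    const double dx = d.x - c[0][2], dy = d.y - c[1][2];
    const double sx = ( c[1][1] * dx - c[0][1] * dy) / det;
    const double sy = (-c[1][0] * dx + c[0][0] * dy) / det;
    return sx >= 0.0 && sx <= w - 1 && sy >= 0.0 && sy <= h - 1;
}
```

Under these assumptions, the coefficients above and the 200x170 source give a quad of approximately (-90.07, 30.77), (102.51, 119.31), (190.97, -73.09), (-1.60, -161.63). The destination point (50, 50) maps back to roughly (127.0, 32.8) in the source, so the quad does overlap the 64x64 dst ROI.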

Thanks for reporting this. It appears to be an issue in the npp library, where some transforms that should be acceptable are erroneously flagged with this warning.

I don’t have a suggestion for a workaround at this time, but I would expect a fix of some sort to arrive in a future CUDA toolkit.

Did the operation work even though you got the warning?
It fails in my case. I think the system I am running on uses CUDA 8.0;
I wonder whether this has been fixed in CUDA 9.0.
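On the warning-versus-error question: NPP status codes are 0 for success, positive for warnings and negative for errors, whereas the sample above exits on any non-zero value. Below is a small sketch of separating the two cases. `StatusKind` and `classifyNppStatus` are hypothetical names, and the status is taken as a plain `int` so the snippet compiles without the NPP headers. Whether the output is actually usable after this particular warning is not established here and would still need to be checked by inspecting the destination buffer.

```cpp
enum class StatusKind { Success, Warning, Error };

// Classify an NPP-style status code by sign:
// 0 is success, positive values are warnings, negative values are errors.
StatusKind classifyNppStatus(int status) {
    if (status == 0) return StatusKind::Success;
    return status > 0 ? StatusKind::Warning : StatusKind::Error;
}
```

In the sample, `rval` could be passed to `classifyNppStatus` so that only `StatusKind::Error` leads to `exit(1)`, while a warning is logged and the result is copied back for inspection.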

Fails in this case:
source quad =
{0.00000000000000000, 0.00000000000000000}
{255.50000000000000, 0.00000000000000000}
{255.50000000000000, 511.00000000000000}
{0.00000000000000000, 511.00000000000000}
destination quad =
{-0.77898188805845603, 14.346174839886430}
{248.50000000000000, -16.000000000000000}
{248.50000000000000, 495.00000000000000}
{-0.77898188805845603, 464.65382516011357}

Succeeds in this case:
source quad =
{255.50000000000000, 0.00000000000000000}
{511.00000000000000, 0.00000000000000000}
{511.00000000000000, 511.00000000000000}
{255.50000000000000, 511.00000000000000}
destination quad =
{248.50000000000000, -16.000000000000000}
{497.77898188805841, 14.346174839886430}
{497.77898188805841, 464.65382516011357}
{248.50000000000000, 495.00000000000000}