I am writing an application in which I have to rotate an image by a set angle and then display it. This is the logic of the code I am using:

  1. Allocate memory for the data (image) on the device.

  2. cudaMemcpy the data to the device.

  3. Call the nppiRotate function with the parameters below.

     //allocate memory for input image in device
     cudaMalloc((void**)&outImage, rows * cols * sizeof(unsigned char));
     cudaMalloc((void**)&dev_im1, rows * cols * sizeof(unsigned char));
     cudaMalloc((void**)&dev_im_rotate, rows * cols * sizeof(unsigned char));
     cudaMalloc((void**)&dev_im_rotate_host, rows * cols * sizeof(unsigned char));
     int angle = 30;
     //copy images from host to device
     cudaError_t errr = cudaMemcpy(dev_im1, inImage, rows * cols * sizeof(unsigned char), cudaMemcpyHostToDevice);
     cout << "Error is" << cudaGetErrorString(errr) << errr <<  endl;
     NppiSize size = {800,600};
     NppiRect rect_shift = {0,0,850,650};
     NppiRect rect = {0,0,800,600};
     NppStatus status = nppiRotate_8u_C1R((Npp8u*) dev_im1, size,
                                          sizeof(Npp8u) * cols, rect,
                                          (Npp8u*) dev_im_rotate, sizeof(Npp8u) * cols, rect_shift,
                                          180*angle/M_PI, 0, 0, NPPI_INTER_LINEAR);
     cout << "error is" << status << endl;

The output I am getting is "error is-14", but I am unsure what causes it. Also, I want to use bilinear interpolation for the pixels, but I didn't find a bilinear option in the library. Will it make a difference if I use linear interpolation?

My input image is 800 by 600 pixels and grayscale. The error is an NPP_MEMSET_ERR.


Bilinear interpolation is linear interpolation applied along both the x-axis and the y-axis.
You can find all the supported interpolation algorithms here:

For the MEMSET issue, could you first try to access the dev_im1 buffer with an NPP API call and see whether that reports an error?
It should look similar to this:

    // Allocate the device memory.
    cudaMalloc((void **)(&pSrc), sizeof(Npp32f) * nLength);
    nppsSet_32f(1.0f, pSrc, nLength);
    cudaMalloc((void **)(&pSum), sizeof(Npp32f) * 1);
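
Alongside that device-side test, it may be worth checking on the host that each ROI passed to the rotate call fits inside the buffer it refers to, since NPP has no way of knowing how much memory was actually allocated. The sketch below uses a hypothetical `Rect` struct as a stand-in for `NppiRect` so that it compiles without the NPP headers, and `roi_fits` is a made-up helper, not part of NPP. Whether this explains the -14 status is not certain; it is only a cheap thing to rule out.

```cpp
// Hypothetical stand-in for NppiRect (x, y, width, height), so that this
// sketch compiles without the NPP headers.
struct Rect {
    int x;
    int y;
    int width;
    int height;
};

// True if every pixel of `roi` lies inside an image of the given size.
bool roi_fits(const Rect& roi, int image_width, int image_height) {
    return roi.x >= 0 && roi.y >= 0 &&
           roi.width > 0 && roi.height > 0 &&
           roi.x + roi.width <= image_width &&
           roi.y + roi.height <= image_height;
}
```

With the numbers from the question, and assuming cols is 800 and rows is 600, `roi_fits(Rect{0, 0, 800, 600}, 800, 600)` holds for the source ROI, but `roi_fits(Rect{0, 0, 850, 650}, 800, 600)` does not: the destination ROI is larger than the rows * cols bytes allocated for dev_im_rotate.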


