NPP_TEXTURE_BIND_ERROR with Canny edge detector

Sorry if I sound naive because this is my very first time on this forum.

I am getting an NPP_TEXTURE_BIND_ERROR from this Canny edge detector call:

nppiCanny_32f8u_C1R(gx_device,pStepBytes_gx,gy_device,pStepBytes_gy,pDstEdges, nDstEdgeStep,SizeROI,low_threshold,high_threshold,pBuffer);

where:

   gx_device: x-derivative, stored on the device (computed by me on the host and transferred to the device).

   gy_device: y-derivative, stored on the device (computed by me on the host and transferred to the device).

   pDstEdges: pointer to an image stored on the device, where each pixel is of type Npp8u.

   pBuffer: allocated on the device, with the size reported by nppiCannyGetSize(SizeROI,&hpBufferSize).

   pStepBytes_gx, pStepBytes_gy, nDstEdgeStep: these seem to have correct values, as they are at least 4 times the size of a row in bytes.

Please help! I have no idea why I am getting this error.

Also, if required I will be willing to buy support for this. I could not find any link/place on the web site to do so.




Have you fixed your issue? I get the same error for all nppi filtering functions (canny, nppiFilter, nppiFilterColumn, nppiFilterRow).

Hi, could you please tell us which version of NPP you are using? Support for the Canny filter has been removed from NPP since release 4.0 due to bugs. If you use an older copy of NPP, you’ll also have to use the matching older CUDA toolkit, as the two need to match.


OK, in the latest NPP release (included in CUDA 4.0) I still get the same error in my sample with nppiFilter link… just as in 3.2, and I hope that one day NPP will have Canny filtering again ;-)

I cannot promise that Canny will be enabled again any time soon.

As for the Filter failures, our filter tests work fine. Are you sure that your pSrc pointers are valid device pointers? Those inputs get bound to textures inside the primitives, so an invalid pointer there is the likely cause of that error.

Yes, I am; here is a snippet:

cudaMalloc<Npp32s>((Npp32s **)&dKernel, KERNEL * KERNEL * sizeof(Npp32s));

        cudaMallocPitch<Npp8u>((Npp8u **)&dImage, &pImage, LENGTH, LENGTH);

        cudaMallocPitch<Npp8u>((Npp8u **)&dOutput, &pOutput, LENGTH, LENGTH);

cudaMemcpy2D( dKernel, KERNEL * sizeof(Npp32s), hKernel, KERNEL * sizeof(Npp32s), KERNEL, KERNEL, cudaMemcpyHostToDevice );

        cudaMemcpy2D( dImage, pImage, hImage, LENGTH * sizeof(Npp8u), LENGTH, LENGTH, cudaMemcpyHostToDevice );

NppiPoint k;

        k.x = 


        k.y = 


NppStatus p =

                nppiFilter_8u_C1R(dImage, pImage, dOutput, pOutput, sizeImage, dKernel, sizeKernel, k, 0);

“full” code link

You define

#define LENGTH 10

#define KERNEL 5

sizeImage.height = LENGTH;

sizeImage.width =  LENGTH;

Your code doesn’t work because your input image is not large enough. A 5x5 kernel needs an extra 4 lines and 4 columns of input data in order to work. The nppiFilter primitive expects that data above the anchor point and to its left, assuming your image addressing increases left to right and top to bottom.
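To make the size requirement concrete, here is a rough host-side sketch. It is not NPP code: `PaddedLayout` and `paddedLayoutFor` are hypothetical names, and the way the extra rows and columns split around the ROI (`borderLeft`, `borderTop`) is left to the caller as an assumption, since it depends on the anchor you pass to the filter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: describes a source buffer big enough for a filter ROI.
struct PaddedLayout {
    int srcWidth;           // pixels per row the source buffer must hold
    int srcHeight;          // rows the source buffer must hold
    std::size_t roiOffset;  // byte offset from the buffer start to the first ROI pixel
};

// A mask of maskWidth x maskHeight needs (maskWidth - 1) extra columns and
// (maskHeight - 1) extra rows of valid input around a roiWidth x roiHeight ROI.
// borderLeft / borderTop say how many of those lie to the left of and above the
// ROI; that split is an assumption of this sketch, not something NPP reports.
PaddedLayout paddedLayoutFor(int roiWidth, int roiHeight,
                             int maskWidth, int maskHeight,
                             int borderLeft, int borderTop,
                             std::size_t pitchBytes, std::size_t bytesPerPixel) {
    assert(borderLeft >= 0 && borderLeft <= maskWidth - 1);
    assert(borderTop >= 0 && borderTop <= maskHeight - 1);

    PaddedLayout out;
    out.srcWidth = roiWidth + maskWidth - 1;
    out.srcHeight = roiHeight + maskHeight - 1;

    // The pitch must cover one full padded row.
    assert(pitchBytes >= static_cast<std::size_t>(out.srcWidth) * bytesPerPixel);

    out.roiOffset = static_cast<std::size_t>(borderTop) * pitchBytes +
                    static_cast<std::size_t>(borderLeft) * bytesPerPixel;
    return out;
}
```

For the 10x10 ROI and 5x5 kernel in this thread, with all four extra rows and columns placed above and to the left, this gives a 14x14 source buffer, and the pointer handed to the filter would be the buffer start plus 4 rows and 4 pixels.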

Thanks a lot for your explanation! I have updated my code and now it works like a charm :D. I very much appreciated your help.

Have a nice day


Thanks Frank. I have dropped using Canny. Btw, for nppiFilterColumn_8u_C1R, should I increase the size of the image to account for kernel size and boundary conditions? My image is definitely larger than 32x32…

Thanks again,


Yes, for those primitives, the ROI describes the size of the region being computed. That means that a filter with a mask size larger than 1x1 needs a source image larger than the ROI in order to work correctly.

Thanks Frank. It worked, but I have another problem…

It seems that nppiFilter_8u_C1R has a bug. I tried it with a simple all-white RGB image (i.e., all image values set to 255).
However, the values I got back were ~(84,84,84). I expected them to be close to (255,255,255) except perhaps around boundaries where values should be

I convolved each channel with a simple Gaussian kernel with standard deviation 1.0, size 3x3.

Is it possible that, since this function accepts at most one byte per channel in the input image, some arithmetic overflow is occurring?
Should I use some other filter?

Btw, I am using CUDA 3.0 with NPP version 1.1.

Thanks for all your help…

What are the values in your kernel, and what do you set nDivisor to?

Hi Frank:

The Gaussian kernel I used is a scaled-up, discretized version. The true 3x3 Gaussian kernel with std=1.0 is:

0.0751 0.1238 0.0751

0.1238 0.2042 0.1238

0.0751 0.1238 0.0751

I multiplied the above kernel by 100 and rounded the numbers. This gave me the final kernel.

For nDivisor, I used the sum of all the entries in the final kernel.

The final values in the image are ~(67,67,67) instead of (255,255,255)
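As a sanity check on the arithmetic, a small CPU reference can be compared against the GPU result. The sketch below is an assumption-laden stand-in, not NPP's implementation: `filter3x3Reference` is a hypothetical name, it uses truncating integer division (NPP's rounding may differ), and it expects the source to already carry a one-pixel border on every side. Rounding the kernel above times 100 gives 8, 12, 8 / 12, 20, 12 / 8, 12, 8, which sums to 100.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for one 8-bit channel: weighted sum over a 3x3
// neighbourhood, divided by `divisor`, then saturated to the range 0..255.
// `src` holds (width + 2) x (height + 2) pixels, i.e. the ROI plus a one-pixel
// border; the result holds width x height pixels.
std::vector<std::uint8_t> filter3x3Reference(const std::vector<std::uint8_t>& src,
                                             int width, int height,
                                             const std::array<std::int32_t, 9>& kernel,
                                             std::int32_t divisor) {
    const int srcStride = width + 2;
    assert(divisor != 0);
    assert(src.size() == static_cast<std::size_t>(srcStride) * (height + 2));

    std::vector<std::uint8_t> dst(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::int32_t sum = 0;  // 9 * 255 * (weights near 100) fits easily in 32 bits
            for (int ky = 0; ky < 3; ++ky) {
                for (int kx = 0; kx < 3; ++kx) {
                    sum += kernel[ky * 3 + kx] * src[(y + ky) * srcStride + (x + kx)];
                }
            }
            const std::int32_t scaled = sum / divisor;  // truncating division (assumption)
            dst[y * width + x] = static_cast<std::uint8_t>(
                std::min<std::int32_t>(255, std::max<std::int32_t>(0, scaled)));
        }
    }
    return dst;
}
```

With the rounded kernel, a divisor of 100, and an all-255 input, every output pixel of this reference is 255 * 100 / 100 = 255, so the intermediate sum of 25500 is nowhere near overflowing a 32-bit accumulator.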


We run tests similar to what you’re describing, and those don’t indicate a problem. If you want me to look into this in more detail, I need the complete configuration of what you’re passing to the function (ROI sizes, image content, kernel values, etc.) or a reproducer app in code.