A critical problem with nppiFilter

sadatmandt · September 9, 2010, 8:36am

Hi everybody

I have a problem with nppiFilter_8u_C1R. It returns noise with 2-dimensional kernels. Besides, when I use it with a 1-dimensional kernel, the source image should be
square otherwise, the destination image is shifted inappropriately. I doubt on my graphic card (GeForce 8400 GS) and memory allocation method to raise this problem.
I use nppMalloc and cudaMemCopy to allocate memory on the device and copy data from host.

Thank you in advance for your kind attention.

Frank_Jargstorff · October 6, 2010, 5:44pm

I think I need a little more information in order to help. Could you post code or the input and result images?

Frank_Jargstorff · October 6, 2010, 5:44pm

I think I need a little more information in order to help. Could you post code or the input and result images?

bakerb · January 13, 2011, 5:06pm

I’m not the original poster, but am having the same problem. Here’s some of my code.

printf("Step sizes: %d, %d\n", paddedImg.pitch(), oDeviceDst.pitch());

        printf("ROI: %d, %d\n", imgSz.width, imgSz.height);

        printf("Kernel Size: %d, %d\n", kernelSize.width, kernelSize.height);

        printf("Kernel Anchor: %d, %d\n", kernelAnchor.x, kernelAnchor.y);

        printf("Divisor: %d\n", intKernel->divisor);

// Copy kernel to GPU

        Npp32s *d_kernel;

        int step;

        d_kernel = nppiMalloc_32s_C1(kernelSize.width, kernelSize.height, &step);

	printf("Kernel step: %d\n", step);

        cudaMemcpy2D(d_kernel, step, intKernel->val, kernelSize.width * sizeof(Npp32s), kernelSize.width * sizeof(Npp32s), kernelSize.height, cudaMemcpyHostToDevice);

	printf("cudaMemcpy2d error status: %s\n", cudaGetErrorString( cudaGetLastError() ) );

	Npp32s *cpuKernel = (Npp32s*)malloc(kernelSize.width*kernelSize.width*sizeof(Npp32s));

	cudaMemcpy2D(cpuKernel, kernelSize.width * sizeof(Npp32s), d_kernel, step, kernelSize.width * sizeof(Npp32s), kernelSize.height, cudaMemcpyDeviceToHost);

	printf("cudaMemcpy2d error status: %s\n", cudaGetErrorString( cudaGetLastError() ) );

	for (int y = 0; y < kernelSize.height; y++) {

           for (int x = 0; x < kernelSize.width; x++) {

              printf("%3d ", cpuKernel[y*kernelSize.width+x]);

           }

           printf("\n");

        }

eStatusNPP = nppiFilter_8u_C1R(paddedImg.data(widthOffset, heightOffset), paddedImg.pitch(), 

                                       oDeviceDst.data(), oDeviceDst.pitch(), 

                                       imgSz, d_kernel, kernelSize, kernelAnchor, intKernel->divisor);

        printf("nppiFilter error status: %d\n", eStatusNPP);

This is the output from the printf statements in the code running on the 512x512 Lena image that comes with NPP. I’ll attach the output image I get as well.

Step sizes: 768, 768

ROI: 512, 512

Kernel Size: 3, 3

Kernel Anchor: 1, 1

Divisor: 9

Kernel step: 256

cudaMemcpy2d error status: no error

1 1 1

nppiFilter error status: 0

Saved image: …/…/data/Lena_unsharpMask.pgm

Frank_Jargstorff · January 13, 2011, 8:34pm

Looking at the code, I see one issue: You’re using a 2D memory allocation and copy for your 3x3 kernel. When you look at nppiFilter_8u_C1R(), you’ll see that there is no parameter for “kernel step”, i.e. the primitive has no way of knowing how far apart successive lines in the kernel are. This is because the function expects the kernel weights to be stored as a densly packed array. E.g. if you kernel is made up of

w00, w01, w02

w10, w11, w12

w20, w21, w22

the array you’re passing to nppiFilter_8u_C1R() should look like this:

+-----+-----+-----+-----+-----+-----+-----+-----+-----+

|DWORD|DWORD|DWORD|DWORD|DWORD|DWORD|DWORD|DWORD|DWORD|

+-----+-----+-----+-----+-----+-----+-----+-----+-----+

| w00 | w01 | w02 | w10 | w11 | w12 | w20 | w21 | w22 |

+-----+-----+-----+-----+-----+-----+-----+-----+-----+

and you would pass the pointer to the first element to your function. I would recommend you replace the kernel device memory allocation which is currently a 2D alloc with a simple cudaMalloc for the 94 bytes. Since you currently have your host kernel data stored in an image, you’ll have to come up with some way of reformatting those into a contiguous chunk of data. You can use cudaMemcpy2D to do this, by passing the size of densly packed rows (i.e. 34 = 12 bytes) as the destination step.

For a quick hack, you could simply replace the line:

cudaMemcpy2D(d_kernel, step, intKernel->val, kernelSize.width * sizeof(Npp32s), ...);

with

cudaMemcpy2D(d_kernel, 12, intKernel->val, kernelSize.width * sizeof(Npp32s), ...);

Can you give that a shot and let me know what happens?

bakerb · January 19, 2011, 6:08pm

I tried both the temporary solution you gave and using a cudaMalloc call and both work great. Thank you.

vincetallica · February 21, 2013, 5:40pm

Hi

I’ve been trying to use the nice NPPI filter function, for my image is read as binary into an array. I’ve been getting status=0, which is great,but the image looks cut off. My image is 986x400.
My code looks like this:

extern “C” void twoDseg (int frameWid,
int frameHei,
int frames,
int fileLen,
unsigned char *oneframe_buffer)
{ int sp = 0;

Npp8u pSI = nppiMalloc_8u_C1(width, height, &sp);
cudaMemcpy2D(pSI, sp, oneframe_buffer, widthsizeof(unsigned char), width*sizeof(unsigned char), height, cudaMemcpyHostToDevice);

int dp =0;
Npp8u * pDI = nppiMalloc_8u_C1(width, height, &dp);

NppiSize mask = {5, 5};
NppiPoint anchor = {0, 0};
NppiSize ROI = {width - mask.width/2, height - mask.height/2};

NppStatus eStatusNPP;
eStatusNPP=nppiFilterBox_8u_C1R(pSI, sp, pDI, dp, ROI, mask, anchor);
NPP_ASSERT(NPP_NO_ERROR == eStatusNPP);
printf (“eStatusNPP: %i \n”, eStatusNPP);

//Copy the results back to CPU
cudaMemcpy2D(oneframe_buffer,widthsizeof(unsigned char),pDI,dp,widthsizeof(unsigned char),height,cudaMemcpyDeviceToHost);

Can you please help me how to use this gaussian filter for my image?

Thank you,
Vince

Topic		Replies	Views
NPP - nppiFilter_8u_C1R returns KERNEL_EXECUTION Debug options? CUDA Programming and Performance	6	6736	April 25, 2010
Problem with nppiFilter_8u_C1R getting a black output CUDA Programming and Performance	6	7679	August 25, 2011
nppiFilter_8u_C1R : Error -24 NPP_TEXTURE_BIND_ERROR CUDA Programming and Performance	1	3641	October 28, 2011
nppiFilterGaussAdvanced_8u_C1R produces discolored band on the right GPU-Accelerated Libraries npp	11	116	December 3, 2024
Calling NPP helper with large image gives kernel execution error GPU-Accelerated Libraries npp	3	1796	November 11, 2021
What's wrong with nppiFilter_8u_C1R ? CUDA Programming and Performance	1	12295	March 7, 2011
NPP_TEXTURE_BIND_ERROR error with Canny edge detector.. CUDA Programming and Performance	14	11038	August 23, 2011
Problem with nppi morphological operation GPU-Accelerated Libraries	3	1091	August 7, 2018
NPP TEXTURE BIND ERROR NppiFilter_8u_C1R error CUDA Programming and Performance	2	4340	August 18, 2011
Using multiple streams in npp GPU-Accelerated Libraries npp	0	1011	January 25, 2022

A critical problem with nppiFilter

Related topics