Problem with nppiFilter_8u_C1R getting a black output


I’m working on an image processing program and I want to implement convolution via NPP.

The problem is that whatever I pass to the nppiFilter_8u_C1R function, I get a black output (even though it is correctly sized).

Can someone help me?

Here is the code (with a little obfuscation because I’m not free to share it):

customImageType image("My image.tif");

	float m [] ={0 , 1, 0, 1, -4, 1, 0, 1, 0};

	customImageType mask(m, 3, 3); //instantiation of the mask image


	image.renormalize(0,255); //renormalization and cast to unsigned int;

	//allocation of source image on CPU with NPP structure

	npp::ImageCPU_8u_C1 oHostSrc(image.width, image.height);

	memcpy(,, image.nbPoints*sizeof(unsigned char));

	//allocation of mask on GPU with CUDA


	Npp32s *deviceMask;

	cudaMalloc((void**)&deviceMask, mask.nbPoints()*sizeof(long));

	cudaMemcpy(deviceMask,, mask.nbPoints*sizeof(long), cudaMemcpyHostToDevice);


	NppiSize maskSize = {mask.width,mask.height};

	NppiSize ROI = {image.width - mask.width + 1, image.height - mask.height + 1};

	//allocation of the destination image on GPU

	npp::ImageNPP_8u_C1 oDeviceDst(ROI.width, ROI.height);

	//allocation of the destination image on CPU

	npp::ImageCPU_8u_C1 oHostDst(oDeviceDst.size());

	NppiPoint anchor = {0,0};	


	Npp32s* divisor = new Npp32s[1];

	divisor[0] = (Npp32s)mask.sum();

	Npp32s* deviceDivisor;

	cudaMalloc((void**) &deviceDivisor, sizeof(Npp32s));

	cudaMemcpy(deviceDivisor, divisor, sizeof(Npp32s), cudaMemcpyHostToDevice);


	//allocation of source image on GPU by copy of CPU image

	npp::ImageNPP_8u_C1 oDeviceSrc(oHostSrc);

	NppStatus ret=nppiFilter_8u_C1R(, oDeviceSrc.pitch(),, oDeviceDst.pitch(),

					ROI, deviceMask, maskSize, anchor, deviceDivisor[0]);

	oDeviceDst.copyTo(, oHostDst.pitch());

	customImageType tmp((unsigned char*), oHostDst.width(),oHostDst.height());"MyProcessedImage.tif");

I implemented erode and dilate function with no problem…

I also tried to use nppiFilterBox_8u_C1R instead of nppiFilter_8u_C1R and it worked perfectly so I suppose I’m doing something wrong with the mask but I don’t know what…

Thanks in advance.

Check for the return error code.

Hi Donos,

The first thing that caught my attention is that you seem to be passing a dereferenced device pointer as the scale-factor parameter (the last parameter, deviceDivisor[0]):

NppStatus ret=nppiFilter_8u_C1R(, oDeviceSrc.pitch(),, 

                                oDeviceDst.pitch(), ROI, deviceMask, maskSize, anchor, deviceDivisor[0]);


I’m surprised this doesn’t cause a seg-fault. Anyway, I think the first thing to check would be to simply pass 0 for that value. Ultimately, you probably want to find a scale factor that roughly matches the sum of the weights, so that the filter doesn’t change overall brightness.
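To make the point about the scale factor concrete, here is a minimal CPU sketch in plain C++. It is not NPP code: filterReference is a hypothetical helper that assumes integer weights, an integer divisor, and saturation to 0..255, and it does not flip the kernel (which makes no difference for the symmetric kernels used here). Note that the weights in the first post sum to 0, so mask.sum() cannot serve as a divisor in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not an NPP function): apply an integer kernel
// to the valid region of an 8-bit image, divide the weighted sum by
// `divisor`, and saturate the result to 0..255.
std::vector<std::uint8_t> filterReference(const std::vector<std::uint8_t>& src,
                                          int width, int height,
                                          const std::vector<std::int32_t>& kernel,
                                          int kw, int kh, std::int32_t divisor)
{
    assert(divisor != 0);  // a zero divisor has no meaning in this sketch
    const int outW = width - kw + 1;   // same ROI arithmetic as the first post
    const int outH = height - kh + 1;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(outW) * outH);

    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            std::int32_t acc = 0;
            for (int j = 0; j < kh; ++j)
                for (int i = 0; i < kw; ++i)
                    acc += kernel[j * kw + i] * src[(y + j) * width + (x + i)];
            acc /= divisor;
            if (acc < 0)   acc = 0;    // negative responses become black
            if (acc > 255) acc = 255;  // overshoot becomes white
            dst[static_cast<std::size_t>(y) * outW + x] =
                static_cast<std::uint8_t>(acc);
        }
    }
    return dst;
}
```

On a flat image of value 100, a 3x3 box kernel of ones with divisor 9 gives back 100, while the weights {0, 1, 0, 1, -4, 1, 0, 1, 0} with divisor 1 give 0. With those weights, flat regions and negative responses would come out black in an 8-bit result under these assumptions, so part of a dark output can be expected even from a correct call.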


@Crankie : I’ve got a NPP_NO_ERROR code !

@Frank : I just tried again with 0 instead of deviceDivisor[0] and still the same problem…

Thanks to both of you, I thought that nobody would try to suggest anything.

Would it be possible to post a sample of your working erode code? I have tried to modify the boxfilter code to no avail - I get the texture bind error. Working with 64 bit CentOS linux. I have modified the ROI and offsets to be well within the image so I am not walking off the edges, etc. Very new at this, but would really like to get erode and dilate working!



I’m sorry, but I rewrote everything as I decided not to use NPP anymore. I’m now on “regular” CUDA with kernels, grids and threads, etc.

By the way, all you need is in my first post, as the obfuscated part mainly concerns the custom image type I was working on.

Good luck

Yes indeed! Thank you. My mistake was not allocating the mask into the device properly - as in not at all…
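For anyone else preparing a mask for nppiFilter_8u_C1R: the function takes its kernel as a pointer to Npp32s (signed 32-bit integers), while the first post declares the weights as float and sizes the buffer with sizeof(long), which is 8 bytes on 64-bit Linux. The source argument of that cudaMemcpy is elided in the post, so this is only a guess at what went wrong there. The plain C++ sketch below uses std::int32_t as a stand-in for Npp32s, and toIntKernel and rawBits are hypothetical helper names. It shows converting the weights by value before any upload, and what a raw byte copy of a float would hand over instead.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: convert float weights to 32-bit integers by value
// (rounding to nearest), producing a host buffer of the element type that
// the kernel parameter expects. Upload sizeof(std::int32_t) bytes per weight.
std::vector<std::int32_t> toIntKernel(const std::vector<float>& weights)
{
    std::vector<std::int32_t> out;
    out.reserve(weights.size());
    for (float w : weights)
        out.push_back(static_cast<std::int32_t>(std::lround(w)));
    return out;
}

// Shows what a byte-wise copy of a float delivers when the receiver reads
// it as a 32-bit integer: the IEEE 754 bit pattern, not the numeric value.
std::int32_t rawBits(float w)
{
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "this sketch assumes a 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &w, sizeof bits);
    return bits;
}
```

With IEEE 754 floats, rawBits(1.0f) is 0x3F800000 (1065353216), whereas toIntKernel turns the same weight into 1. A kernel full of values like the former would saturate an 8-bit result rather than filter it.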