Help using NPP Morphological Functions

Hi,

I’m having problems using the NPP morphological functions, namely nppiErode_8u_C1R and nppiDilate_8u_C1R. The documentation is lacking, and the samples provided with NPP do not cover the morphological operations. I’m trying to modify the box filter example to call nppiErode_8u_C1R instead, but the call returns error -3 (Kernel Launch Failure). I’m calling erode with a 3x3 kernel and the anchor at the center of the kernel, as follows:

	try
	{
		//////
		// Start of sample code
		//////

		// if more than one command line arg, use the first arg as the filename,
		// otherwise assume the filename included with the sample
		std::string sFilename = "../../data/Lena.pgm";
		if (argc >= 2)
			sFilename = argv[1];

		std::string sResultFilename = sFilename;
		std::string::size_type dot = sResultFilename.rfind('.');
		if (dot != std::string::npos) sResultFilename = sResultFilename.substr(0, dot);
		sResultFilename += "_boxFilter.pgm";
		if (argc >= 3)
			sResultFilename = argv[2];

		// declare a host image object for an 8-bit grayscale image
		npp::ImageCPU_8u_C1 oHostSrc;
		// load gray-scale image from disk
		npp::loadImage(sFilename, oHostSrc);
		// declare a device image and copy construct from the host image,
		// i.e. upload host to device
		npp::ImageNPP_8u_C1 oDeviceSrc(oHostSrc);

		//////
		// End Sample Code
		//////

		// 3x3 mask
		NppiSize oMaskSize = {3, 3};
		// mask... is this the correct way to initialize the mask??
		Npp8u mask[9] = {1,1,1,1,1,1,1,1,1};
		// ROI... the size of the input image
		NppiSize oSizeROI = {oDeviceSrc.width(), oDeviceSrc.height()};
		// allocate output image same as input image
		npp::ImageNPP_8u_C1 oDeviceDst(oSizeROI.width, oSizeROI.height);
		// set anchor point inside the mask to (1, 1)
		NppiPoint oAnchor = {1, 1};
		// run erode
		NppStatus eStatusNPP;
		eStatusNPP = nppiErode_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(), oDeviceDst.data(), oDeviceDst.pitch(), oSizeROI, mask, oMaskSize, oAnchor);

		////
		// Start of Sample Code
		////

		NPP_ASSERT(NPP_NO_ERROR == eStatusNPP);

		// declare a host image for the result
		npp::ImageCPU_8u_C1 oHostDst(oDeviceDst.size());
		// and copy the device result data into it
		oDeviceDst.copyTo(oHostDst.data(), oHostDst.pitch());

		saveImage(sResultFilename, oHostDst);
		std::cout << "Saved image: " << sResultFilename << std::endl;
	}
	catch (npp::Exception & rException)
	{
		std::cerr << "Program error! The following exception occurred: \n";
		std::cerr << rException << std::endl;
		std::cerr << "Aborting." << std::endl;

		getchar();
		return -1;
	}
	catch (...)
	{
		std::cerr << "Program error! An unknown type of exception occurred. \n";
		std::cerr << "Aborting." << std::endl;

		getchar();
		return -1;
	}

	////
	// End of Sample Code
	////

Can anyone please point out what I’m doing wrong?

Thanks,

Steven

Any help (especially from NVIDIA) would be greatly appreciated.

Looking at the code you posted, I would say the problem is that you’re not reducing the ROI according to the size of your mask. For a 3x3 mask you would have to reduce the width and height of your ROI by 2 pixels each, i.e. one pixel on every border, so that the mask never reads outside the image.

By making the mask centered (anchor 1,1) you would also have to move the source pointer to the second pixel in the second line of the source image. You would do this via pointer arithmetic, using the image pitch as the line stride.
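The ROI and pointer adjustment described above can be sketched in plain C++. This is only an illustration under some assumptions: `Size`, `Point`, `interiorRoi` and `anchorOffset` are hypothetical names (the structs stand in for NppiSize and NppiPoint so the snippet compiles without NPP), and the image is assumed to be 8-bit, single channel, so one pixel is one byte.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for NppiSize / NppiPoint (not the NPP types).
struct Size  { int width; int height; };
struct Point { int x;     int y;      };

// Largest ROI for which every mask tap stays inside the image.
// Each dimension shrinks by (mask size - 1), i.e. by 2 for a 3x3 mask.
inline Size interiorRoi(Size image, Size mask)
{
    assert(mask.width >= 1 && mask.height >= 1);
    assert(image.width >= mask.width && image.height >= mask.height);
    return { image.width - (mask.width - 1), image.height - (mask.height - 1) };
}

// Byte offset from the start of an 8-bit, single-channel image to the pixel
// at (anchor.x, anchor.y). pitchBytes is the line stride in bytes.
inline std::size_t anchorOffset(int pitchBytes, Point anchor)
{
    assert(pitchBytes > 0 && anchor.x >= 0 && anchor.y >= 0);
    return static_cast<std::size_t>(anchor.y) * pitchBytes
         + static_cast<std::size_t>(anchor.x);
}
```

With the variables from the posted code, this would roughly correspond to passing `oDeviceSrc.data() + anchorOffset(oDeviceSrc.pitch(), {1, 1})` as the source pointer and the reduced size as `oSizeROI`. For a 512x512 image the ROI becomes 510x510, and with a pitch of 512 bytes the offset is 513. Offsetting the destination pointer the same way (using the destination pitch) would presumably keep the output aligned with the input, but I have not verified that against NPP.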

Replying to this old thread just in case someone stumbles across it looking for answers about NPP morphological transformations as I did today.

Besides the problems pointed out by Frank, the mask array in the posted code is a host-based array, which cannot be read by the CUDA kernels. A GPU-based array of the same size needs to be allocated and the mask data copied into it; that device pointer is then passed to nppiErode_8u_C1R().

E.g. like this:

// allocate device memory for the mask and upload the host mask to it
Npp8u *dvc_mask;
cudaMalloc(&dvc_mask, sizeof(mask));
cudaMemcpy(dvc_mask, mask, sizeof(mask), cudaMemcpyHostToDevice);
...
// pass the device pointer instead of the host array
nppiErode_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(), oDeviceDst.data(), oDeviceDst.pitch(), oSizeROI,
                 dvc_mask, oMaskSize, oAnchor);
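To check the GPU output once the call succeeds, a small CPU reference can be handy. The following is a minimal sketch, not NPP's implementation: `erodeReference` is a hypothetical helper, it assumes a tightly packed 8-bit single-channel image (pitch equal to width), and it assumes the usual definition of erosion, where the output is the minimum over the source pixels whose mask entry is non-zero. As far as I understand the NPP documentation, that matches the mask convention of nppiErode_8u_C1R, but treat it as an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference erosion on the CPU. Only interior pixels (where the whole mask
// fits inside the image) are computed; border pixels keep the source value.
inline std::vector<std::uint8_t> erodeReference(const std::vector<std::uint8_t>& src,
                                                int width, int height,
                                                const std::vector<std::uint8_t>& mask,
                                                int maskW, int maskH,
                                                int anchorX, int anchorY)
{
    assert(src.size() == static_cast<std::size_t>(width) * height);
    assert(mask.size() == static_cast<std::size_t>(maskW) * maskH);

    std::vector<std::uint8_t> dst(src);
    for (int y = anchorY; y < height - (maskH - 1 - anchorY); ++y) {
        for (int x = anchorX; x < width - (maskW - 1 - anchorX); ++x) {
            std::uint8_t lowest = 255;
            for (int my = 0; my < maskH; ++my) {
                for (int mx = 0; mx < maskW; ++mx) {
                    if (mask[my * maskW + mx] == 0) continue; // tap not selected
                    const int sy = y + my - anchorY;          // source row under this tap
                    const int sx = x + mx - anchorX;          // source column under this tap
                    lowest = std::min(lowest, src[sy * width + sx]);
                }
            }
            dst[y * width + x] = lowest;
        }
    }
    return dst;
}
```

Called with a mask of nine ones, sizes 3x3 and anchor (1, 1), this mirrors the parameters in the original post. Downloading the NPP result and comparing the interior pixels against this output is one way to confirm that the mask and anchor behave as expected.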