Problem with nppi morphological operation

Hi there,

I’m running morphological operations on an image with some NPP functions, but I get a weird result and don’t know why. Here’s my code:

#include <stdio.h>
#include <iostream>
#include <string.h>
#include <fstream>

#include <nppi.h>
#include <npp.h>
#include <helper_string.h>
#include <helper_cuda.h>

#include <ImageIO.h>
#include <ImagesNPP.h>
#include <ImagesCPU.h>

#include <cuda_runtime.h>
int main()
{
		std::string file_src(".\\test3_initial.bmp");
		npp::ImageCPU_8u_C1 Host_Src;
		//read image from disk to Cpu
		npp::loadImage(file_src, Host_Src);
		npp::ImageCPU_8u_C1 Host_Dst(Host_Src.size());

		//Copy image from Cpu to Gpu
		npp::ImageNPP_8u_C1 Device_Src_8u(Host_Src);
		//-----------------------------Var_Threshold-----------------------------------//
		npp::ImageNPP_8u_C1 Close_mask(25, 25);                       //close morph-operation mask
		npp::ImageNPP_8u_C1 result1(Device_Src_8u.size());            //the result after 3x3 erode
		npp::ImageNPP_8u_C1 result2(Device_Src_8u.size());            //the result after 3x3 dilate
		npp::ImageNPP_8u_C1 result3(Device_Src_8u.size());            //the result after MorphCloseBorder
		//npp::ImageCPU_8u_C1 mask_tes(Close_mask.size());

		NppiSize Src_size = { (int)Device_Src_8u.width(), (int)Device_Src_8u.height() };
		NppiSize roi_size = { (int)Device_Src_8u.width(), (int)Device_Src_8u.height() };
		NppiPoint offset = { 0, 0 };
		NppiPoint anchor = { 0, 0 };
		NppiSize close_mask_size = { (int)Close_mask.width(), (int)Close_mask.height() };
		NppiPoint close_mask_anchor = { (close_mask_size.width - 1) / 2, (close_mask_size.height - 1) / 2 };

		int hpbuffersize;
		Npp8u* pBuffer;
		NppStatus err;
		//-----------------------------Var_Threshold-----------------------------------//

		//------------------------------Declare Over-------------------------//

		//-----------------------------Var_Threshold-----------------------------------//
		nppiSet_8u_C1R(1, Close_mask.data(), Close_mask.pitch(), close_mask_size);

		nppiErode3x3Border_8u_C1R(
			Device_Src_8u.data(),
			Device_Src_8u.pitch(),
			Src_size, offset,
			result1.data(),
			result1.pitch(),
			roi_size, NPP_BORDER_REPLICATE
		);

		nppiDilate3x3Border_8u_C1R(
			result1.data(),
			result1.pitch(),
			Src_size, offset,
			result2.data(),
			result2.pitch(),
			roi_size, NPP_BORDER_REPLICATE
		);

		nppiMorphGetBufferSize_8u_C1R(roi_size, &hpbuffersize);
		cudaMalloc((void**)(&pBuffer), hpbuffersize);
		err = nppiMorphCloseBorder_8u_C1R(
			result2.data(), result2.pitch(),
			roi_size, offset,
			result3.data(), result3.pitch(),
			roi_size, Close_mask.data(), close_mask_size, close_mask_anchor,
			pBuffer, NPP_BORDER_REPLICATE
		);

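		//Copy the result from Gpu to Cpu
		result3.copyTo(Host_Dst.data(), Host_Dst.pitch());
		//Save image
		npp::saveImage(".\\test3_test.pgm", Host_Dst);
		//Release the scratch buffer allocated with cudaMalloc
		cudaFree(pBuffer);
		return 0;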
}

I first read the image (variable “Device_Src_8u”), then apply a 3x3 erosion followed by a 3x3 dilation (that is, an opening) with nppiErode3x3Border_8u_C1R() and nppiDilate3x3Border_8u_C1R(). Then I apply a closing with a user-defined mask (a 25x25 rectangle). The result of the closing (variable “result3”) is extremely weird: the operation seems to make almost no difference in the vertical direction.
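To be explicit about what I mean by opening and closing, here is a minimal 1D CPU sketch. It is not the NPP code: the helper names (slide, morphOpen, morphClose) are made up for illustration, the window is symmetric with radius r, and out-of-range indices are clamped to mimic a replicate border.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using Signal = std::vector<std::uint8_t>;

// Sliding-window maximum (dilation) or minimum (erosion) over [i - r, i + r].
// Indices outside the signal are clamped to the nearest valid index.
Signal slide(const Signal& in, int r, bool takeMax) {
    const int n = static_cast<int>(in.size());
    Signal out(in.size());
    for (int i = 0; i < n; ++i) {
        std::uint8_t acc = in[i];
        for (int d = -r; d <= r; ++d) {
            const int j = std::max(0, std::min(n - 1, i + d));
            acc = takeMax ? std::max(acc, in[j]) : std::min(acc, in[j]);
        }
        out[i] = acc;
    }
    return out;
}

Signal erode(const Signal& s, int r)  { return slide(s, r, false); }
Signal dilate(const Signal& s, int r) { return slide(s, r, true); }

// Opening: erosion followed by dilation (removes small bright details).
Signal morphOpen(const Signal& s, int r)  { return dilate(erode(s, r), r); }

// Closing: dilation followed by erosion (fills small dark gaps).
Signal morphClose(const Signal& s, int r) { return erode(dilate(s, r), r); }
```

With r = 1, opening removes an isolated bright sample but keeps a run of three, and closing fills a one-sample gap between two bright runs. A closing with a tall mask should therefore fill vertical gaps that are shorter than the mask, which is what I expected to see.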

I have also uploaded the initial image, the result after the opening (result2) and the result after the closing (result3):

Initial image: https://drive.google.com/open?id=1XEbel4P52nDUSztD18cNp-lUy9iNrJEU
Result2: https://drive.google.com/open?id=1CmC8KUScQ5vh5GMSZW4tEJx9ic-ddFsm
Result3: https://drive.google.com/open?id=1dB6OOxVO5YdiuKgbNfs_mwGSe5BHjNK3

Can anyone help me?

Thanks!

Similar to another question you asked:

https://devtalk.nvidia.com/default/topic/1037914/gpu-accelerated-libraries/problem-when-using-npp-libirary-nppiminindx_32f_c1r-/

I would want to have a self contained test case. Notice how I changed your code when I posted my response. I included a synthetic image so that I did not need to load an image from disk, and I understood the math well enough to know what the operation should be doing, and test for that.

If you want to modify your test case so that it creates a synthetic image instead of loading one from disk, and you indicate specifically what pixel values you are getting and what pixel values you expect (similar to what I did in my response to your question linked above), I will take a look as time permits. If you don’t wish to do that, it’s fine; perhaps someone else will be able to help you. I’m not able to work with statements like the results are “extremely weird”.

Note that if you follow this suggestion, it should not be necessary for you to submit a test case that has several NPP operations in it. If you think an NPP operation is not performing correctly, my suggestion is to generate a synthetic image, define exactly what outcome you expect from a single NPP operation, then provide a sample code that does that operation only, and display the pixel values you are actually getting, vs. what you expect.

In my opinion, that is the way you can make it easiest for others to help you, and your questions are more likely to get useful answers that way.
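As a rough illustration of the "actual vs. expected pixel values" part, a host-side comparison could look something like the sketch below. The function name reportMismatches is hypothetical, and it assumes both images have already been copied into tightly packed, row-major host buffers (the "actual" one would come from the NPP output, copied back from the device).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Compares two tightly packed, row-major 8-bit images of the same size.
// Prints up to maxReports mismatching pixels and returns the total number
// of mismatches, or -1 if a buffer does not match width * height.
int reportMismatches(const std::vector<std::uint8_t>& expected,
                     const std::vector<std::uint8_t>& actual,
                     int width, int height, int maxReports = 10) {
    const std::size_t count = static_cast<std::size_t>(width) * height;
    if (expected.size() != count || actual.size() != count) {
        return -1;
    }
    int mismatches = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            if (expected[i] != actual[i]) {
                if (mismatches < maxReports) {
                    std::printf("(%d, %d): expected %d, got %d\n",
                                x, y, expected[i], actual[i]);
                }
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Posting the first few lines of that output, together with the code that built the expected image, tells a reader exactly where the result departs from what you think it should be.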

Just suggestions, do what you wish. But I generally don’t work on provided test cases where the test is not crafted in such a way as to make it easy for me to attack the problem. Remove anything (even including image loading) that is unnecessary. Yes, it requires effort on your part. If you don’t wish to do it, don’t, that is OK. I’m less likely to look at your problem that way, however. If you don’t think it’s important enough to put the effort into it, then I may think it’s not important enough to put the effort into it. I assume we can agree that what is fair for you is fair for me.

Thanks for the suggestion, and sorry for wasting your time. I’ll do as you said and check the result.

Again, thank you for helping me. I’ll do the test later and provide the new code if something is still wrong.

Hi, I double-checked, but there is still a problem. The code I tested is as follows:

		npp::ImageCPU_8u_C1 Host_Src(500, 500);
		Npp8u *base = Host_Src.data();
		for (int i = 0; i < Host_Src.height(); i++) {
			for (int j = 0; j < Host_Src.width(); j++) {
				//5x5 bright blocks every 10 rows and 20 columns; everything else is set to 0
				base[j] = (i % 10 < 5 && j % 20 < 5) ? 255 : 0;
			}
			base += Host_Src.pitch();
		}
		npp::ImageNPP_8u_C1 Device_Src_8u(Host_Src);
		npp::ImageCPU_8u_C1 Host_Dst(Device_Src_8u.size());
		npp::ImageNPP_8u_C1 Close_mask(10, 31);                       //close morph-operation mask
		npp::ImageNPP_8u_C1 result3(Device_Src_8u.size());            //the result after MorphCloseBorder

		NppiSize roi_size = { (int)Device_Src_8u.width(), (int)Device_Src_8u.height() };
		NppiPoint offset = { 0, 0 };
		NppiSize close_mask_size = { (int)Close_mask.width(), (int)Close_mask.height() };
		NppiPoint close_mask_anchor = { (close_mask_size.width - 1) / 2, (close_mask_size.height - 1) / 2 };

		int hpbuffersize;
		Npp8u* pBuffer;
		nppiSet_8u_C1R(1, Close_mask.data(), Close_mask.pitch(), close_mask_size);

		nppiMorphGetBufferSize_8u_C1R(roi_size, &hpbuffersize);
		cudaMalloc((void**)(&pBuffer), hpbuffersize);
		nppiMorphCloseBorder_8u_C1R(
			Device_Src_8u.data(), Device_Src_8u.pitch(),
			roi_size, offset,
			result3.data(), result3.pitch(),
			roi_size, Close_mask.data(), close_mask_size, close_mask_anchor,
			pBuffer, NPP_BORDER_REPLICATE
		);
		//Copy the result back to the host so the pixel values can be inspected
		result3.copyTo(Host_Dst.data(), Host_Dst.pitch());
		//Free the scratch buffer; base points into Host_Src (host memory) and must not be passed to nppiFree
		cudaFree(pBuffer);

I fill the image with a regular pattern of bright blocks. The mask for the morphological operation is 10 pixels wide and 31 pixels high (as in the code). Each block in the original image is 5x5 pixels, and the gaps between blocks are 5 pixels in the vertical direction and 15 pixels in the horizontal direction. So the ideal result should be a set of vertical column lines, while the actual result is strange. I also ran the CUDA memory checker, but it did not report any errors.
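To pin down the expected pixel values, here is a small CPU reference sketch of the closing on the same pattern. It uses the textbook definition (dilation followed by erosion, with the mask reflected between the two steps) and a clamped, replicate-style border. The names Image, makePattern, morph and closeRect are made up for this sketch, and I have not verified how NPP treats the anchor of an even-width mask, so edge positions might differ by a pixel from what NPP produces.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Tightly packed, row-major 8-bit image. Reads outside the image are
// clamped to the nearest edge pixel (replicate border).
struct Image {
    int width, height;
    std::vector<std::uint8_t> px;
    Image(int w, int h) : width(w), height(h), px(static_cast<std::size_t>(w) * h, 0) {}
    std::uint8_t get(int x, int y) const {
        x = std::max(0, std::min(width - 1, x));
        y = std::max(0, std::min(height - 1, y));
        return px[static_cast<std::size_t>(y) * width + x];
    }
    void set(int x, int y, std::uint8_t v) { px[static_cast<std::size_t>(y) * width + x] = v; }
};

// Same pattern as the test code: 5x5 bright blocks every 10 rows and 20 columns.
Image makePattern(int w, int h) {
    Image img(w, h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (y % 10 < 5 && x % 20 < 5) img.set(x, y, 255);
    return img;
}

// Dilation (takeMax == true) or erosion with a full maskW x maskH rectangle
// anchored at (ax, ay). With b = (mask position - anchor), dilation reads
// src(p - b) and erosion reads src(p + b).
Image morph(const Image& src, int maskW, int maskH, int ax, int ay, bool takeMax) {
    Image dst(src.width, src.height);
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            std::uint8_t acc = takeMax ? 0 : 255;
            for (int my = 0; my < maskH; ++my) {
                for (int mx = 0; mx < maskW; ++mx) {
                    const int bx = mx - ax, by = my - ay;
                    const std::uint8_t v = takeMax ? src.get(x - bx, y - by)
                                                   : src.get(x + bx, y + by);
                    acc = takeMax ? std::max(acc, v) : std::min(acc, v);
                }
            }
            dst.set(x, y, acc);
        }
    }
    return dst;
}

// Closing with a full rectangle, anchor at ((w - 1) / 2, (h - 1) / 2).
Image closeRect(const Image& src, int maskW, int maskH) {
    const int ax = (maskW - 1) / 2, ay = (maskH - 1) / 2;
    return morph(morph(src, maskW, maskH, ax, ay, true), maskW, maskH, ax, ay, false);
}
```

Under these assumptions, closeRect(makePattern(100, 100), 10, 31) gives columns x = 0..4, 20..24, 40..44 and so on set to 255 over the full height, and 0 everywhere else: the 5-pixel vertical gaps are filled, while the 15-pixel horizontal gaps are too wide for a 10-pixel mask.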

Hope for some advice. Thanks!
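
One thing that may be worth checking (a guess from reading the code, not a confirmed diagnosis): the mask is allocated as an npp::ImageNPP_8u_C1, so it lives in pitched device memory and is filled row by row with Close_mask.pitch(). The call to nppiMorphCloseBorder_8u_C1R, however, receives only the mask pointer, its size and its anchor, with no pitch for the mask, so the function presumably reads the mask as a tightly packed width x height array. The CPU-only sketch below simulates that mismatch. The pitch of 512 bytes is a hypothetical value (the real one depends on the allocation), the padding is assumed to be zero (on the device it is uninitialized), and both function names are made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills a width x height region of a pitched buffer the way a pitch-aware
// "set" would: row y starts at byte offset y * pitchBytes. Padding stays 0.
std::vector<std::uint8_t> makePitchedMask(int width, int height, int pitchBytes,
                                          std::uint8_t value) {
    std::vector<std::uint8_t> buf(static_cast<std::size_t>(pitchBytes) * height, 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            buf[static_cast<std::size_t>(y) * pitchBytes + x] = value;
    return buf;
}

// Counts the nonzero mask entries a consumer would see if it treated the
// first width * height bytes of the buffer as a tightly packed mask.
int countNonzeroIfReadPacked(const std::vector<std::uint8_t>& buf, int width, int height) {
    int count = 0;
    const std::size_t n = static_cast<std::size_t>(width) * height;
    for (std::size_t i = 0; i < n && i < buf.size(); ++i)
        if (buf[i] != 0) ++count;
    return count;
}
```

For a 10x31 mask with a 512-byte pitch, only the first 10 of the 310 packed entries are nonzero, so the effective mask would be a single horizontal row. That would be consistent with a closing that has almost no effect in the vertical direction. If this is what is happening, allocating the mask as width * height bytes with cudaMalloc and filling it with cudaMemset (value 1) instead of using a pitched image should change the result.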