Buggy NPP?

Hi again!

I took a closer look at NPP over the last few days and found some odd behavior I don't understand. Please see my other post from today regarding the sum function. My new problem is the following:

Using the nppiMinMax_8u_C4R function, I always get the result for the first channel only, not for all four. The min array is filled with four copies of the first channel's value, i.e. {min(channel1), min(channel1), min(channel1), min(channel1)}, instead of the expected {min(channel1), min(channel2), min(channel3), min(channel4)}. The max array behaves the same way.

I can't believe the NPP library is that buggy, so I assume I'm doing something wrong. But what?

Thanks for any help.

	int pitch = 0;
	NppiSize size; //image data size
	size.height = 256;
	size.width = 256;

	//alloc image on device
	Npp8u* d_image = nppiMalloc_8u_C4(size.width, size.height, &pitch);

	//alloc image on host
	unsigned char* h_img = (unsigned char*)malloc(size.width * size.height * 4);

	//fill host image with random data
	for (int i = 0; i < size.width; i++)
		for (int j = 0; j < size.height; j++)
			for (int c = 0; c < 4; c++)
				h_img[i*4 + j * size.width*4 + c] = rand() % 255;

	//copy host image to device image
	cudaMemcpy2D(d_image, pitch, h_img, size.width * 4, size.width * 4, size.height, cudaMemcpyHostToDevice);

	NppStatus status;

	//Set all pixel values to {2,3,4,A}
	Npp8u data[3] = {2,3,4};
	status = nppiSet_8u_AC4R(data, d_image, pitch, size);

	//buffer size
	int bufferSize = 0;
	status = nppiMinMaxGetBufferSize_8u_C4R(size, &bufferSize);

	//buffer on device
	Npp8u* buffer;

	//Min/Max arrays on device
	Npp8u* d_min, *d_max;

	//alloc them
	cudaMalloc(&d_min, 4);
	cudaMalloc(&d_max, 4);
	cudaMalloc(&buffer, bufferSize);

	//host min/max
	Npp8u h_min[4];
	Npp8u h_max[4];

	//compute min/max
	status = nppiMinMax_8u_C4R(d_image, pitch, size, d_min, d_max, buffer);

	//copy min/max to host
	cudaMemcpy(h_min, d_min, 4, cudaMemcpyDeviceToHost);
	cudaMemcpy(h_max, d_max, 4, cudaMemcpyDeviceToHost);

	//h_min is (2,2,2,2) but should be (2,3,4,rand())
	//h_max is (2,2,2,2) but should be (2,3,4,rand())

	//Copy image back to host
	unsigned char* h_img2 = (unsigned char*)malloc(size.height * size.width * 4);
	cudaMemcpy2D(h_img2, size.width * 4, d_image, pitch, size.width * 4, size.height, cudaMemcpyDeviceToHost);

	//h_img2 is now {2,3,4,rand(),2,3,4,rand(),2,3,4,rand()...} as it should be

This bug will be fixed in our next NPP release. Thank you for pointing this out.
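
In the meantime, the expected result described in the question (one min and one max per channel) could be cross-checked on the host after copying the image back. The following is only a minimal sketch under stated assumptions, not NPP code: `MinMax4` and `minMaxC4Host` are hypothetical names made up for illustration, and the image is assumed to be interleaved 4-channel 8-bit data with a row pitch given in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type (not part of NPP): one min and one max per channel.
struct MinMax4 {
    std::uint8_t min[4];
    std::uint8_t max[4];
};

// Hypothetical host-side reference: per-channel min/max of an interleaved
// 4-channel 8-bit image. pitchBytes is the distance between row starts in
// bytes and may be larger than width * 4; padding bytes are never read.
MinMax4 minMaxC4Host(const std::uint8_t* img, std::size_t pitchBytes,
                     int width, int height)
{
    assert(img != nullptr && width > 0 && height > 0);
    assert(pitchBytes >= static_cast<std::size_t>(width) * 4);

    MinMax4 r;
    for (int c = 0; c < 4; ++c) {
        r.min[c] = 255; // start at the largest 8-bit value
        r.max[c] = 0;   // start at the smallest 8-bit value
    }

    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = img + static_cast<std::size_t>(y) * pitchBytes;
        for (int x = 0; x < width; ++x) {
            for (int c = 0; c < 4; ++c) {
                const std::uint8_t v = row[x * 4 + c];
                if (v < r.min[c]) r.min[c] = v;
                if (v > r.max[c]) r.max[c] = v;
            }
        }
    }
    return r;
}
```

With the snippet from the question, this would be called on the tightly packed host copy, e.g. `minMaxC4Host(h_img2, size.width * 4, size.width, size.height)`, and the first three channels should come out as min = max = 2, 3 and 4 respectively.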