Baffled by nppiHistogramEven_16u_C4R

I have an application that requires histograms of 4-channel data. Rather than code it up myself, I thought I would try NPP, in particular the nppiHistogramEven_16u_C4R function. But this is the first time I've used these functions, and the histogram equalization example code only handles single-channel images, which have a somewhat simpler model.

In particular, I'm having difficulty figuring out how to allocate the histogram arrays (pHist in the prototypes). The parameter is declared as Npp32s *pHist[4], and my understanding is that all arguments passed to NPP functions must live in device memory. I've tried a few things, but I can't seem to get it right.

I’ve tried:

  1. defining a buffer like:

        __device__ Npp32s *histData[4];

     and then looping over it, calling cudaMalloc() for each entry.

  2. defining a buffer like:

        __device__ Npp32s histData[4][256];

  3. using cudaMalloc to allocate space for 4 pointers, and then looping over that array, calling cudaMalloc for each of the four entries.

None of these seem to work. Can someone point me to a working example of multi-channel histograms?

Email me, or post here if you have any help/ideas.


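Your first approach is the proper way to do it, just without the __device__ qualifier. This code executes on the host, so histData must be an ordinary host array. Not every NPP argument has to live in device memory: the pHist array itself (like nLevels, nLowerLevel and nUpperLevel) is a host array, and only the buffers its four elements point to are device memory. Just use: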

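    Npp32s *histData[4];    // ordinary host array holding four device pointers

    for (int i = 0; i < 4; ++i)
        cudaMalloc(&histData[i], size);    // size = (nLevels[i] - 1) * sizeof(Npp32s): one counter per bin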

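I haven't tested this snippet, but it should be close. The ampersand inside cudaMalloc may look confusing, but it works like this: histData[i] is a pointer (a device pointer, because it is allocated with cudaMalloc). For cudaMalloc to change the value of that pointer, it needs the address of the pointer's own storage, which is what & provides. The filled-in histData array can then be passed directly as the pHist argument. This is also why the third approach fails: if the array of pointers is itself allocated with cudaMalloc, it lives in device memory, and the host cannot write the per-channel pointers into it.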
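For readers puzzled by the &, here is a host-only C++ illustration of the same out-parameter pattern. fakeMalloc, brokenMalloc and allocateAll are hypothetical helpers invented for this sketch; fakeMalloc mimics the shape of cudaMalloc's C++ template overload (a T ** plus a byte count) but uses std::malloc so it runs without a GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for cudaMalloc (host memory, no GPU needed).
// It receives the ADDRESS of the caller's pointer and writes the new
// allocation's address through it.
template <class T>
int fakeMalloc(T **ptr, std::size_t size)
{
    *ptr = static_cast<T *>(std::malloc(size));
    return *ptr != nullptr ? 0 : 1;   // 0 = success, like cudaSuccess
}

// Counter-example: the pointer is passed BY VALUE, so only the local copy
// changes and the caller's pointer is left untouched.
template <class T>
void brokenMalloc(T *ptr, std::size_t size)
{
    ptr = static_cast<T *>(std::malloc(size));
    std::free(ptr);   // freed here only so the demo does not leak
}

// Mirrors the loop from the answer: fill a host array of four pointers.
bool allocateAll(std::int32_t *histData[4], std::size_t bins)
{
    for (int i = 0; i < 4; ++i)
        if (fakeMalloc(&histData[i], bins * sizeof(std::int32_t)) != 0)
            return false;
    return true;
}
```

After allocateAll(hist, 256), every hist[i] holds a valid address, whereas brokenMalloc(p, 16) leaves p unchanged because it only ever modifies its own copy of the pointer.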