This is probably a stupid question, as there must be some obvious mistake in my code, but I can't figure out where it is.
What I want to do is this: I have a 14-bit image and want to calculate its histogram, then calculate the maximum of that histogram and paint the histogram in a kernel.
d_Hist = nppsMalloc_32s(levels - 1);                      // one bin per level interval

nppiHistogramEvenGetBufferSize_16u_C1R(size_img, levels, &hbufsize);
d_buffer = nppsMalloc_8u(hbufsize);                       // scratch buffer for the NPP calls

// histogram with evenly spaced bins over the 14-bit range [0, 16384)
nppiHistogramEven_16u_C1R(d_img, d_img_pitch, size_in, d_Hist,
                          levels, 0, 16384, d_buffer);

// maximum of the histogram, written to d_hMax
nppsMax_32s(d_Hist, levels - 1, d_hMax, d_buffer);

// paint the histogram, normalized to that maximum, into d_histImg
PaintHist_kernen<<<nBlocks, threadsPerBlock>>>(d_Hist, *d_hMax,
                                               d_histImg, d_histImg_pitch);
d_img is my 14-bit image on the device, d_Hist its histogram, and d_histImg is a 384*288 image into which I want to paint the histogram, normalized to the maximum.
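For context, this is roughly how the buffers are set up. It's a simplified reconstruction: img_width and img_height stand in for my actual dimensions, the histogram image is assumed to be 8-bit here, and error checking and the allocation of d_hMax are omitted.

NppiSize size_img = { img_width, img_height };   // ROI of the 14-bit source image (dimensions are placeholders)
NppiSize size_in  = size_img;                    // same ROI passed to the histogram call

int d_img_pitch;
Npp16u *d_img = nppiMalloc_16u_C1(img_width, img_height, &d_img_pitch);   // 14-bit data stored in 16-bit pixels

Npp32s *d_Hist = nppsMalloc_32s(levels - 1);     // one bin per level interval

int d_histImg_pitch;
Npp8u *d_histImg = nppiMalloc_8u_C1(384, 288, &d_histImg_pitch);          // image the histogram gets painted into (type assumed)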
The histogram I get looks correct (when I copy d_Hist back to the host), and re-using the buffer doesn't seem to be the problem (I get the same result when I create a separate buffer for nppsMax_32s). But the maximum I get is completely wrong. The strangest thing is that the maximum is correct if I compute it over a signal length of less than (levels-1). So the last value seems to cause the problem, but it isn't out of range or anything, as I can easily check with an overexposed image.
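For reference, this is a simplified sketch of how I check the result on the host (the copy-back code isn't shown above; cudaMemcpyDefault is used so the read-back works whether d_hMax is a plain device pointer or mapped host memory):

#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <npp.h>

// Copy the histogram back and compute the maximum on the CPU for comparison.
std::vector<Npp32s> h_Hist(levels - 1);
cudaMemcpy(h_Hist.data(), d_Hist, (levels - 1) * sizeof(Npp32s), cudaMemcpyDeviceToHost);
Npp32s cpuMax = *std::max_element(h_Hist.begin(), h_Hist.end());

// Read back the maximum that nppsMax_32s produced.
Npp32s gpuMax = 0;
cudaMemcpy(&gpuMax, d_hMax, sizeof(Npp32s), cudaMemcpyDefault);

printf("CPU max: %d, nppsMax_32s: %d\n", cpuMax, gpuMax);
// cpuMax looks right; gpuMax does not, unless nppsMax_32s is called with a
// length shorter than levels-1.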
What could be the problem?