I’m trying to use NPP for my project work, and I call nppiHistogramRangeGetBufferSize_32f_C1R to get the buffer size. For the following case, however, it returns a negative buffer size.
NppiSize oBuffROI;
oBuffROI.width = 4000001;
oBuffROI.height = 1;
int bins = 4096;
int buffsize;
nppiHistogramRangeGetBufferSize_32f_C1R(oBuffROI, bins, &buffsize);
buffsize returned : -198692736
Is this a bug? I'm using CUDA 4.0.
Hi, thanks for bringing this to our attention. This is indeed a bug, caused by a classic integer overflow. I added a fix addressing this problem. It’ll be shipped in the next major release.
Unfortunately, the fix is not going to solve your problem. It only causes the GetBufferSize routine to return an NPP_SIZE_ERROR error code. It does nothing about the underlying issue: for the image size you mention, you’d require more than 2 GB of scratch memory, and the integer return value can’t represent that.
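To make the overflow concrete, here is a minimal sketch in plain C++ (no NPP calls) of how a byte count above 2 GB turns negative when only its low 32 bits are kept as a signed int. The helper names wrapToInt32 and fitsInInt32 are hypothetical, not part of NPP, and the value 4096274560 is only one candidate that is consistent with the reported number if the size wrapped exactly once. The real requirement is not stated in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an NPP function): returns the value a signed
// 32-bit int would hold if only the low 32 bits of `bytes` were kept.
std::int64_t wrapToInt32(std::int64_t bytes) {
    assert(bytes >= 0);
    const std::int64_t kTwoPow32 = std::int64_t{1} << 32;
    const std::int64_t low = bytes % kTwoPow32;
    // Values above INT32_MAX land in the negative half of the range.
    return (low > std::numeric_limits<std::int32_t>::max()) ? low - kTwoPow32
                                                            : low;
}

// Hypothetical helper: true if a byte count can be returned through an int*.
bool fitsInInt32(std::int64_t bytes) {
    return bytes >= 0 && bytes <= std::numeric_limits<std::int32_t>::max();
}
```

With these helpers, wrapToInt32(4096274560) evaluates to -198692736, the value reported in the question, and fitsInInt32(4096274560) is false.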
The scratch memory required by our Histogram implementation is roughly 4 * bins * totalBlocks bytes. The total number of blocks we launch depends on the data format, the number of channels, and so on. Generally speaking, though, a block spans 16 times as many lines as it does columns, so making the ROI tall rather than wide is the way to maximize the total number of pixels the primitive can process. None of this works well for a degenerate ROI like the one in your example above. The NPP image-processing primitives are designed with images in mind, not one-dimensional arrays.
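As a rough illustration of the two points above, the following sketch evaluates the 4 * bins * totalBlocks estimate in 64-bit arithmetic and splits a contiguous one-dimensional array into a tall two-dimensional ROI. Everything here is an assumption for illustration: estimateScratchBytes, Roi2D and reshapeTall are hypothetical names, totalBlocks has to be supplied by the caller because the thread does not say how NPP derives it, and the row width is the caller's choice.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Rough estimate quoted in the reply: 4 * bins * totalBlocks bytes.
// totalBlocks is an input because its exact formula is not given here.
std::int64_t estimateScratchBytes(int bins, std::int64_t totalBlocks) {
    assert(bins > 0 && totalBlocks >= 0);
    return std::int64_t{4} * bins * totalBlocks;
}

// Hypothetical description of a 2D view over a contiguous 1D array.
struct Roi2D {
    int width;              // samples per row
    int height;             // number of full rows
    std::int64_t leftover;  // trailing samples that do not fill a row
};

// Reinterprets `count` contiguous samples as rows of `rowWidth` samples.
Roi2D reshapeTall(std::int64_t count, int rowWidth) {
    assert(count >= 0 && rowWidth > 0);
    const std::int64_t rows = count / rowWidth;
    assert(rows <= std::numeric_limits<int>::max());
    Roi2D roi;
    roi.width = rowWidth;
    roi.height = static_cast<int>(rows);
    roi.leftover = count % rowWidth;
    return roi;
}
```

For the 4000001 samples from the question, reshapeTall(4000001, 500) gives a 500 x 8000 ROI (16 times as tall as it is wide) with one leftover sample that would need separate handling. Assuming tightly packed 32-bit float data, the row step for such a view would be rowWidth * sizeof(float) bytes. Whether this brings the scratch requirement under the int limit depends on how many blocks NPP actually launches, so the returned size and status still need to be checked.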