I’m having trouble understanding the memory limitations in the 64-bin histogram sample provided by NVIDIA, oclHistogram.
The PDF doc states:
"Such strategy however introduces some serious limitations: 16 KB per average 192 work-items in a group amount to the maximum of ~85 bytes of local memory per work-item. So this approach limits the histogram resolution to 64 bins on G8x / G9x / G10x NVIDIA GPUs. From the implementation perspective, byte counters also introduce 255-byte limit to the data size processed by single work-item, which must be taken into account during data subdivision between the execution threads."
So if you have a 64-bin histogram and a work-group of 192 work-items, and each work-item uses 64 bytes (one byte per counter for each of the 64 bins), that gives 192 x 64 = 12288 bytes.
Why not just reduce the work-group size to 32 work-items? Then you would have 32 x 64 = 2048 bytes. You could even increase the number of bins to 256 (32 x 256 = 8192 bytes) and still be under the 16 KB limit.
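To make the arithmetic explicit, here is a small Python sketch of how I am counting local memory. It assumes, as I read the quote, that every work-item keeps its own sub-histogram of one-byte counters and that the 16 KB is the budget for a single work-group. The function names are my own and are not taken from the sample.

```python
# Assumption: 16 KB of local memory is available to one work-group,
# which is how the quoted PDF passage frames the budget.
LOCAL_MEM_BYTES = 16 * 1024


def per_item_budget(work_items):
    """Whole bytes of local memory available to each work-item."""
    return LOCAL_MEM_BYTES // work_items


def group_footprint(work_items, bins, bytes_per_counter=1):
    """Local memory used if every work-item holds its own sub-histogram."""
    return work_items * bins * bytes_per_counter


# The PDF's figure: 16 KB shared by 192 work-items.
print("budget per work-item at 192 work-items:", per_item_budget(192), "bytes")

# (work-items per group, number of bins) for the three cases in this post.
for work_items, bins in [(192, 64), (32, 64), (32, 256)]:
    used = group_footprint(work_items, bins)
    fits = used <= LOCAL_MEM_BYTES
    print(f"{work_items} work-items x {bins} bins = {used} bytes, fits in 16 KB: {fits}")
```

By this count all three configurations fit, which is why I don't see where the 64-bin limit comes from.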
Obviously I’m missing something. Any ideas?
Any advice much appreciated.