As part of our CUDA/GPU research, two new histogramming methods were designed:
[indent][i]We present two novel histogramming methods, both achieving
a higher performance and predictability than existing methods.
The first novel method (warp private) gives an average performance increase
of 33% over existing methods for non-synthetic benchmarks. The second novel
method (thread private) gives an average performance increase of 56% over
existing methods and guarantees to be fully data independent. While the
second method is specifically designed for the Fermi architecture, the
first method is also suitable for older architectures.[/i][/indent]
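For readers unfamiliar with the term, here is a rough sketch of what "warp private" histogramming can look like in general: each warp accumulates into its own sub-histogram in shared memory, and the sub-histograms are merged at the end. This is not the code from the paper. The kernel name, the helper histogramOnGpu, the 8-bit input format, and all constants are assumptions made for illustration only, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical parameters for this sketch (not taken from the paper).
constexpr int NUM_BINS = 256;            // one bin per 8-bit input value
constexpr int WARP_SIZE = 32;
constexpr int WARPS_PER_BLOCK = 4;
constexpr int THREADS_PER_BLOCK = WARP_SIZE * WARPS_PER_BLOCK;

// Must be launched with exactly THREADS_PER_BLOCK threads per block.
__global__ void warpPrivateHistogram(const unsigned char* input, int n,
                                     unsigned int* bins)
{
    // One private sub-histogram per warp, kept in shared memory, so that
    // atomic updates only collide between threads of the same warp.
    __shared__ unsigned int sub[WARPS_PER_BLOCK][NUM_BINS];
    const int warp = threadIdx.x / WARP_SIZE;

    // Zero all sub-histograms of this block.
    unsigned int* flat = &sub[0][0];
    for (int i = threadIdx.x; i < WARPS_PER_BLOCK * NUM_BINS; i += blockDim.x)
        flat[i] = 0;
    __syncthreads();

    // Grid-stride loop: each thread votes into its own warp's sub-histogram.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&sub[warp][input[i]], 1u);
    __syncthreads();

    // Merge the per-warp sub-histograms into the global result.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) {
        unsigned int sum = 0;
        for (int w = 0; w < WARPS_PER_BLOCK; ++w)
            sum += sub[w][b];
        if (sum != 0)
            atomicAdd(&bins[b], sum);
    }
}

// Hypothetical host-side helper: copies the data over, runs the kernel,
// and returns the NUM_BINS counts. Expects a non-empty input.
std::vector<unsigned int> histogramOnGpu(const std::vector<unsigned char>& data)
{
    unsigned char* dIn = nullptr;
    unsigned int* dBins = nullptr;
    cudaMalloc(&dIn, data.size());
    cudaMalloc(&dBins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(dIn, data.data(), data.size(), cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, NUM_BINS * sizeof(unsigned int));

    warpPrivateHistogram<<<64, THREADS_PER_BLOCK>>>(
        dIn, static_cast<int>(data.size()), dBins);

    std::vector<unsigned int> bins(NUM_BINS);
    cudaMemcpy(bins.data(), dBins, NUM_BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dBins);
    return bins;
}
```

Note that this sketch still relies on shared-memory atomics, so its speed depends on how often threads of one warp hit the same bin. The thread-private method in the paper is described as fully data independent, which this simple version is not.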
The reference paper and the code can be found at:
Please keep in mind that this is a research project. Getting it to work for your particular input size, bin size, or input format may take some effort.