Fast histogram code released: source code for two new histogramming methods

As part of CUDA/GPU research, two new histogramming methods were designed:

[indent][i]We present two novel histogramming methods, both achieving
a higher performance and predictability than existing methods.

The first novel method (warp private) gives an average performance increase
of 33% over existing methods for non-synthetic benchmarks. The second novel
method (thread private) gives an average performance increase of 56% over
existing methods and guarantees to be fully data independent. While the
second method is specifically designed for the Fermi architecture, the
first method is also suitable for older architectures.[/i][/indent]
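
For readers who have not opened the paper yet, here is a rough sketch of what the name "warp private" suggests: every warp in a block accumulates into its own sub-histogram in shared memory, and the sub-histograms are merged at the end. This is not the released code and may differ from the paper's method in important ways. The kernel name, the host wrapper `histogramOnGpu`, the 256-bin 8-bit input, and the launch configuration are all assumptions made for illustration, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Illustrative constants (assumptions, not values from the paper or README).
constexpr int NUM_BINS          = 256;  // one bin per 8-bit input value
constexpr int WARP_SIZE         = 32;
constexpr int WARPS_PER_BLOCK   = 4;
constexpr int THREADS_PER_BLOCK = WARP_SIZE * WARPS_PER_BLOCK;
constexpr int NUM_BLOCKS        = 64;   // arbitrary grid size for the sketch

// Hypothetical kernel: each warp owns one sub-histogram in shared memory,
// so atomic conflicts can only occur between threads of the same warp.
// Must be launched with exactly THREADS_PER_BLOCK threads per block.
__global__ void warpPrivateHistogram(const unsigned char* data, size_t n,
                                     unsigned int* hist)
{
    __shared__ unsigned int sub[WARPS_PER_BLOCK][NUM_BINS];

    // Zero all sub-histograms of this block.
    for (int i = threadIdx.x; i < WARPS_PER_BLOCK * NUM_BINS; i += blockDim.x)
        (&sub[0][0])[i] = 0;
    __syncthreads();

    // Grid-stride loop: each thread votes into its own warp's sub-histogram.
    const int warp = threadIdx.x / WARP_SIZE;
    const size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&sub[warp][data[i]], 1u);
    __syncthreads();

    // Merge the per-warp sub-histograms and add them to the global result.
    for (int bin = threadIdx.x; bin < NUM_BINS; bin += blockDim.x) {
        unsigned int sum = 0;
        for (int w = 0; w < WARPS_PER_BLOCK; ++w)
            sum += sub[w][bin];
        if (sum != 0)
            atomicAdd(&hist[bin], sum);
    }
}

// Hypothetical host wrapper: copies the input to the device, runs the kernel,
// and returns the 256 bin counts. CUDA error checking is omitted for brevity.
std::vector<unsigned int> histogramOnGpu(const std::vector<unsigned char>& input)
{
    std::vector<unsigned int> result(NUM_BINS, 0);
    if (input.empty())
        return result;

    unsigned char* dData = nullptr;
    unsigned int*  dHist = nullptr;
    cudaMalloc(&dData, input.size());
    cudaMalloc(&dHist, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(dData, input.data(), input.size(), cudaMemcpyHostToDevice);
    cudaMemset(dHist, 0, NUM_BINS * sizeof(unsigned int));

    warpPrivateHistogram<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>(dData, input.size(), dHist);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), dHist, NUM_BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dHist);
    return result;
}
```

Keeping one sub-histogram per warp limits atomic contention to the 32 threads of a single warp, at the cost of extra shared memory and a merge step. Threads within a warp can still collide on the same bin, so the running time of this sketch still depends on the input data.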

The reference paper and the code can be found at:
http://parse.ele.tue.nl/research/tools

Please keep in mind that this is a research project. Getting it to work for your particular input size, bin size, or input format may take some effort.

More details (installation/compilation/configuration) can be found in the README file. Technical details can be found in the paper.