Computing a histogram with an arbitrary number of bins is a difficult problem on GPUs. NVIDIA released a histogram example with the CUDA 1.1 release that supports 256 bins for 8-bit data, but that program is still very limited.
I have posted source code that computes histograms with any number of bins on 32-bit floating-point data of any size. The input must lie in the range [0, 1], but if you prefer not to normalize your data first, the code is easy to change to support any other range. You can find it on my website:
[url="http://users.rsise.anu.edu.au/~ramtin/cuda.htm"]http://users.rsise.anu.edu.au/~ramtin/cuda.htm[/url]
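For anyone who wants to check results against a CPU baseline, or to see what supporting another input range involves, here is a minimal reference sketch in plain C++. It is not the CUDA code from the website or the algorithm from the papers. The names bin_index and histogram, the lo/hi parameters, and the clamping of out-of-range values are my assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference sketch only; NOT the CUDA implementation from the website.
// Maps a value x in [lo, hi] to a bin index in [0, bins - 1].
// Assumption: out-of-range values (and NaN) are clamped, not rejected.
inline std::size_t bin_index(float x, float lo, float hi, std::size_t bins) {
    assert(bins > 0 && hi > lo);
    const float t = (x - lo) / (hi - lo);   // normalize to [0, 1]
    if (!(t > 0.0f)) return 0;              // t <= 0 or NaN goes to the first bin
    if (t >= 1.0f) return bins - 1;         // the upper edge belongs to the last bin
    const std::size_t b = static_cast<std::size_t>(t * static_cast<float>(bins));
    return b < bins ? b : bins - 1;         // guard against float rounding
}

// Counts how many samples fall into each of `bins` equal-width bins.
// The default range [0, 1] matches the normalized input described above.
std::vector<unsigned int> histogram(const std::vector<float>& data,
                                    std::size_t bins,
                                    float lo = 0.0f, float hi = 1.0f) {
    std::vector<unsigned int> counts(bins, 0u);
    for (float x : data) {
        ++counts[bin_index(x, lo, hi, bins)];
    }
    return counts;
}
```

Calling histogram(data, 100) bins normalized data into 100 bins, while histogram(data, 100, -1.0f, 1.0f) handles data in [-1, 1] without a separate normalization pass. The only change needed for a different range is the (x - lo) / (hi - lo) rescaling inside bin_index.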
The code on my website is based on the following two publications:
@inproceedings{Shams_ICSPCS_2007,
author = "R. Shams and R. A. Kennedy",
title = "Efficient Histogram Algorithms for {NVIDIA} {CUDA} Compatible Devices",
booktitle = "Proc. Int. Conf. on Signal Processing and Communications Systems ({ICSPCS})",
address = "Gold Coast, Australia",
month = dec,
year = "2007",
pages = "418--422",
}
@inproceedings{Shams_DICTA_2007a,
author = "R. Shams and N. Barnes",
title = "Speeding up Mutual Information Computation Using {NVIDIA} {CUDA} Hardware",
booktitle = "Proc. Digital Image Computing: Techniques and Applications ({DICTA})",
address = "Adelaide, Australia",
month = dec,
year = "2007",
pages = "555--560",
doi = "10.1109/DICTA.2007.4426846",
}
I look forward to your feedback and comments.