I am running a statistical analysis of a large number of short texts and need to feed the findings into a kind of histogram with on the order of a hundred million bins. Is there any way to do that on the GPU? Grateful for any hint, pseudo-code or anything. Even a clear “no, impossible”.

A global histogram of 100 million bins should be doable. If the bin counters are 32-bit integers, that is only 400 MB (100 million × 4 bytes) for the counters, leaving roughly 10 GB or more for the data storage buffers on a modern high-end GPU.
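To make that concrete, here is a minimal sketch of the "one counter array in global memory" idea. It assumes every input item has already been mapped to a 32-bit bin index; the names `histogram_kernel`, `gpu_histogram` and `bin_ids`, and the 256 x 256 launch configuration, are made up for illustration. Error checking on the CUDA calls is left out to keep it short.

```cuda
#include <cstddef>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Each thread walks the input with a grid-stride loop and increments one
// global counter per item. atomicAdd keeps concurrent updates to the same
// bin correct. `bin_ids` (precomputed bin indices) is an assumption.
__global__ void histogram_kernel(const uint32_t* bin_ids, size_t n,
                                 unsigned int* bins, uint32_t num_bins)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n; i += stride) {
        uint32_t b = bin_ids[i];
        if (b < num_bins)              // ignore out-of-range indices
            atomicAdd(&bins[b], 1u);
    }
}

// Hypothetical host wrapper: copy indices to the GPU, count, copy back.
std::vector<unsigned int> gpu_histogram(const std::vector<uint32_t>& bin_ids,
                                        uint32_t num_bins)
{
    uint32_t* d_ids = nullptr;
    unsigned int* d_bins = nullptr;
    size_t ids_bytes  = bin_ids.size() * sizeof(uint32_t);
    size_t bins_bytes = (size_t)num_bins * sizeof(unsigned int);

    cudaMalloc(&d_ids, ids_bytes);
    cudaMalloc(&d_bins, bins_bytes);   // about 400 MB for 100 million bins
    cudaMemcpy(d_ids, bin_ids.data(), ids_bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, bins_bytes); // all counters start at zero

    // Launch configuration is an arbitrary choice for this sketch.
    histogram_kernel<<<256, 256>>>(d_ids, bin_ids.size(), d_bins, num_bins);

    std::vector<unsigned int> bins(num_bins);
    cudaMemcpy(bins.data(), d_bins, bins_bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_ids);
    cudaFree(d_bins);
    return bins;
}
```

With this many bins the counters cannot live in shared memory, so every update goes to global memory. How well that performs depends a lot on how the indices are distributed: many hits on the same few bins will serialize on the atomics.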

There are CUDA sample codes that demonstrate basic histogramming. One of them is here:

http://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-histogram

Thrust can also help with building histograms:

https://github.com/thrust/thrust/blob/master/examples/histogram.cu
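As a rough idea of what the Thrust route can look like, here is a sketch of a sparse histogram: sort the bin indices, then count runs of equal keys with `thrust::reduce_by_key`. It again assumes the items are already mapped to 32-bit bin indices; `SparseHistogram` and `sparse_histogram` are hypothetical names, and this is a simplified variant, not a copy of the linked example.

```cuda
#include <cstddef>
#include <cstdint>
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/copy.h>
#include <thrust/iterator/constant_iterator.h>

// Only non-empty bins are stored: bins[i] occurred counts[i] times.
struct SparseHistogram {
    std::vector<uint32_t> bins;
    std::vector<uint32_t> counts;
};

SparseHistogram sparse_histogram(const std::vector<uint32_t>& bin_ids)
{
    // Copy to the device and sort so that equal bin indices are adjacent.
    thrust::device_vector<uint32_t> keys(bin_ids.begin(), bin_ids.end());
    thrust::sort(keys.begin(), keys.end());

    // Worst case every key is distinct, so size the outputs like the input.
    thrust::device_vector<uint32_t> out_keys(keys.size());
    thrust::device_vector<uint32_t> out_counts(keys.size());

    // Sum a constant 1 over each run of equal keys, i.e. count the run.
    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(),
                                      thrust::constant_iterator<uint32_t>(1),
                                      out_keys.begin(), out_counts.begin());
    size_t m = ends.first - out_keys.begin();   // number of non-empty bins

    SparseHistogram h;
    h.bins.resize(m);
    h.counts.resize(m);
    thrust::copy(out_keys.begin(), out_keys.begin() + m, h.bins.begin());
    thrust::copy(out_counts.begin(), out_counts.begin() + m, h.counts.begin());
    return h;
}
```

The sparse form only pays off if many of the 100 million bins stay empty. If most bins get hit, a dense counter array is the simpler choice.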

Thanks a lot. I’ll check it out…

I did an implementation for large histograms and sorting a couple of years ago:

http://www.hpcsweden.se/files/P0202_JimmyPettersson_accepted.pdf

Note the step-wise performance degradation as the number of bins grows. As the graph shows, it is possible to be quite a bit faster than Thrust (at least it was back then).