Reduction on small arrays

I need to find the max (actually the top three), the sum, and the sum of squares of my results. I will use a reduction! I’m new to reductions, so I read about the seven reduction kernels (reduce0 through reduce6). Very informative!

My array will have between 2^10 and 2^12 (1,024 to 4,096) elements. If I use the most highly optimized kernel, reduce6, I will only be using 1 block, and even with the others I will probably be using 4 blocks at most… this does not sound like an efficient use of the GPU!

Before I start, I want to get a feel for what is likely to be the best solution for such a small number of results. Should I write a reduce6-style kernel that runs in one block (knowing it’s a waste)? It would be quick, this is not the most time-consuming part of the GPU work in my code, and it would be nice not to have to launch the reduction kernel more than once. Or are there other optimizations to consider for small arrays?
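One option worth trying for arrays this small is to compute all three quantities in a single one-block launch, so the data is read from global memory only once. Below is a minimal sketch, assuming float input, a power-of-two block of 256 threads, and that "top three" means the three largest values with duplicates counted. The names Stats, insertTop3, statsKernel, and computeStats are hypothetical and are not taken from the SDK reduction sample.

```cuda
#include <cfloat>
#include <cuda_runtime.h>

// Assumption: one block of 256 threads (power of two) covers the whole array.
constexpr int kThreads = 256;

// Hypothetical result type: sum, sum of squares, and the three largest values.
struct Stats {
    float sum;
    float sumSq;
    float top[3];  // top[0] >= top[1] >= top[2]; -FLT_MAX if fewer than 3 inputs
};

// Insert v into a descending, length-3 list, keeping only the three largest.
__host__ __device__ inline void insertTop3(float* t, float v) {
    if (v > t[0]) {
        t[2] = t[1]; t[1] = t[0]; t[0] = v;
    } else if (v > t[1]) {
        t[2] = t[1]; t[1] = v;
    } else if (v > t[2]) {
        t[2] = v;
    }
}

// Single-block kernel: each thread accumulates a strided slice, then the
// block combines the partial results with a shared-memory tree reduction.
__global__ void statsKernel(const float* in, int n, Stats* out) {
    __shared__ float sSum[kThreads];
    __shared__ float sSq[kThreads];
    __shared__ float sTop[kThreads][3];

    const int tid = threadIdx.x;
    float sum = 0.0f, sq = 0.0f;
    float top[3] = {-FLT_MAX, -FLT_MAX, -FLT_MAX};

    // Strided loop: consecutive threads read consecutive addresses (coalesced).
    for (int i = tid; i < n; i += kThreads) {
        const float v = in[i];
        sum += v;
        sq += v * v;
        insertTop3(top, v);
    }
    sSum[tid] = sum;
    sSq[tid] = sq;
    for (int k = 0; k < 3; ++k) sTop[tid][k] = top[k];
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int s = kThreads / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sSum[tid] += sSum[tid + s];
            sSq[tid] += sSq[tid + s];
            // Merging the partner's top three is enough: the overall top three
            // must come from the union of both lists.
            for (int k = 0; k < 3; ++k) insertTop3(sTop[tid], sTop[tid + s][k]);
        }
        __syncthreads();
    }

    if (tid == 0) {
        out->sum = sSum[0];
        out->sumSq = sSq[0];
        for (int k = 0; k < 3; ++k) out->top[k] = sTop[0][k];
    }
}

// Hypothetical host wrapper: copy the data in, launch one block, copy the result back.
bool computeStats(const float* hostIn, int n, Stats* result) {
    if (n <= 0) return false;
    float* dIn = nullptr;
    Stats* dOut = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, sizeof(Stats)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }
    cudaError_t err = cudaMemcpy(dIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        statsKernel<<<1, kThreads>>>(dIn, n, dOut);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(result, dOut, sizeof(Stats), cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

With 1,024 to 4,096 elements, each of the 256 threads reads 4 to 16 values in the strided loop before the shared-memory tree starts, so one block still gives every thread real work and only one launch is needed. Whether this beats a multi-block version on a particular GPU is something to measure rather than assume.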


I think I saw an optimized reduction like the one you are asking about already implemented as part of the CUDPP library.