Block-wide reduction with variable size

younik · March 15, 2022, 9:49am

Hello,

I am doing reductions with these requirements:

- I need to do it inside a kernel, as a block-wide reduction.
- I don’t want to make any assumptions about the size of the array.
- I care about performance.

I have tried CUB, but with it I need to assume at least a maximum input size.
I would prefer not to implement it myself, because I want to achieve the best possible performance.

Are there any alternatives?

striker159 · March 15, 2022, 9:56am

What is the problem with your current approach?
You can simply iterate over chunks of your input data and reduce the chunks.
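A minimal sketch of the chunked approach, assuming a single block does the whole reduction and a compile-time block size of 256 (`cub::BlockReduce` takes the block size as a template parameter). The kernel `chunkedCubSum` and host helper `gpuChunkedSum` are hypothetical names for illustration; `cub::BlockReduce<T, BLOCK_DIM_X>::Sum` is the real CUB call, and error checking is omitted.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

// cub::BlockReduce needs the block size at compile time; launch with exactly this many threads.
constexpr int BLOCK_THREADS = 256;

// Hypothetical kernel: one block sums in[0..n) for any n by reducing
// BLOCK_THREADS-sized chunks one after another.
__global__ void chunkedCubSum(const float* in, int n, float* out) {
    using BlockReduce = cub::BlockReduce<float, BLOCK_THREADS>;
    __shared__ typename BlockReduce::TempStorage temp;

    float total = 0.0f;  // running total; only thread 0's copy is used
    for (int base = 0; base < n; base += BLOCK_THREADS) {
        int idx = base + threadIdx.x;
        float v = (idx < n) ? in[idx] : 0.0f;       // pad the last partial chunk with 0
        float chunkSum = BlockReduce(temp).Sum(v);  // result is valid in thread 0 only
        if (threadIdx.x == 0) total += chunkSum;
        __syncthreads();  // required before temp storage is reused in the next iteration
    }
    if (threadIdx.x == 0) *out = total;
}

// Hypothetical host helper (error checking omitted for brevity).
float gpuChunkedSum(const std::vector<float>& h) {
    float *d_in = nullptr, *d_out = nullptr, result = 0.0f;
    cudaMalloc(&d_in, (h.size() + 1) * sizeof(float));  // +1 avoids a 0-byte allocation
    cudaMalloc(&d_out, sizeof(float));
    if (!h.empty())
        cudaMemcpy(d_in, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);
    chunkedCubSum<<<1, BLOCK_THREADS>>>(d_in, static_cast<int>(h.size()), d_out);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main() {
    std::vector<float> data(1000, 1.0f);  // 1000 is not a multiple of 256
    printf("sum = %f\n", gpuChunkedSum(data));
    return 0;
}
```

Each iteration pays for a full block reduction plus a barrier. A cheaper variant lets every thread first accumulate its strided elements in a register and calls `Sum` only once at the end.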

younik · March 15, 2022, 10:05am

Yes, this is the approach I am using right now (previously I only tested CUB with fixed-size input).

I wondered whether there is something more efficient than a for loop over fixed-size thread blocks, since I guess this is a common problem.

Robert_Crovella · March 15, 2022, 1:52pm

You can write a block-wide reduction using a block-stride loop. It will be very efficient, and it makes no assumptions about the block size (other than that it is a power of 2) or the data set size.
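A minimal sketch of a block-stride-loop reduction, assuming one block reduces the whole array, `blockDim.x` is a power of 2, and the launch supplies `blockDim.x` floats of dynamic shared memory. The names `blockStrideSum` and `gpuBlockStrideSum` are hypothetical, and this is an illustration of the technique rather than the answerer's own code; error checking is omitted.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: block-wide sum of in[0..n) for any n.
// Assumption: blockDim.x is a power of 2 (required by the tree step).
__global__ void blockStrideSum(const float* in, int n, float* out) {
    extern __shared__ float sdata[];  // blockDim.x floats, sized at launch

    // 1) Block-stride loop: thread t adds in[t], in[t + blockDim.x], ...
    //    so the loop works for any n, including n < blockDim.x.
    float acc = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        acc += in[i];
    sdata[threadIdx.x] = acc;
    __syncthreads();

    // 2) Shared-memory tree reduction of the blockDim.x partial sums.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            sdata[threadIdx.x] += sdata[threadIdx.x + s];
        __syncthreads();
    }

    if (threadIdx.x == 0) *out = sdata[0];
}

// Hypothetical host helper: the block size is a runtime argument here.
float gpuBlockStrideSum(const std::vector<float>& h, int threads = 256) {
    float *d_in = nullptr, *d_out = nullptr, result = 0.0f;
    cudaMalloc(&d_in, (h.size() + 1) * sizeof(float));  // +1 avoids a 0-byte allocation
    cudaMalloc(&d_out, sizeof(float));
    if (!h.empty())
        cudaMemcpy(d_in, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);
    blockStrideSum<<<1, threads, threads * sizeof(float)>>>(
        d_in, static_cast<int>(h.size()), d_out);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main() {
    std::vector<float> data(1000, 1.0f);  // n is arbitrary; 1000 is not a multiple of 256
    printf("sum = %f\n", gpuBlockStrideSum(data));
    return 0;
}
```

Unlike `cub::BlockReduce`, whose block size is a template parameter, this version picks the block size at launch time, and each thread does most of its work in a register before touching shared memory. Inside a larger kernel, each block can run the same two steps on its own array.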

system · March 29, 2022, 1:52pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.