Prefix sum with a wrap around

Hi.

I am planning to produce a prefix sum with a wrap around.
For example, it works like this

threshold: 20
input: 2 4 1 5 4 2 3 5 1 2 1 4 3 4 2 4
output: 2 6 7 12 16 18 3 8 9 11 12 16 19 4 6 10

I have no idea how to compute this process in CUDA.
Can anyone give me some ideas?

Thanks.

Maybe I am a bit dense today, but after staring at the example for several minutes I still haven’t figured out what the underlying algorithmic specification is.

My explanation was insufficient.

The normal inclusive sum is such an output.

in: 2 4 1 5 4 2 3 5 1 2 1 4 3 4 2 4
out: 2 6 7 12 16 18 21 26 27 29 30 34 37 41 43 47

In a single thread, we usually do out[i] = out[i-1]+in[i] in order starting from in[0].
I would like to set a threshold value for this, so that if the inclusive sum exceeds it, in[i] is output as is, without accumulation.

out[i] = (out[i-1]+in[i] > thres) ? in[i] : out[i-1]+in[i]

threshold: 20
in: 2 4 1 5 4 2 3 5 1 2 1 4 3 4 2 4
out: 2 6 7 12 16 18 3 8 9 11 12 16 19 4 6 10

I can’t get this same behavior by changing the operator of InclusiveScan in CUB.

Can you show the scan operator that you have tried with cub?

like this

template<typename T>
struct RoundSum {
    
    T thres;

    __host__ __device__ __forceinline__ T
    operator()(const T& a, const T& b) const {
        T tmp = a + b;
        return (tmp > thres) ? b : tmp;
    }
};

Maybe you can modify the normal prefix sum result.

Difference between normal prefix sum and your prefix sum:
0 0 0 0 0 0 18 18 18 18 18 18 18 37 37 37

So it seems for each segment it should be the value before the wrap-around that should be subtracted from the ordinary prefix sum.

I too wish I could compute the prefix sum of the previous segment, but I fall into the recursion of needing the segment boundary to know the segment boundary.

Can you find the segment boundaries by dividing every element of the prefix sum by 20, checking where the results increase?
0 0 0 0 0 0 1 1 1 1 1 1 1 2 2 2

In the example I have shown, it is possible to solve the problem using your method. But when in is like this it cannot be used.

threshold: 20
in: 7 6 2 2 5 5 8 4 0 1 1 1 8 10 1 1
required out: 7 13 15 17 5 10 18 4 4 5 6 7 15 10 11 12
normal prefix sum: 7 13 15 17 22 27 35 39 39 40 41 42 50 60 61 62
prefix sum by 20: 0 0 0 0 1 1 1 1 1 2 2 2 2 3 3 3