Hi everybody,

I want to sum some elements of a vector (its length is a power of two and it has been sorted beforehand) and write the sums into another vector. The number of elements in each sum is not known in advance: the elements are (index, value) pairs, and the ones to add together are those that share the same index. Let me give a simple example:

The in vector holds (index, value) pairs for several points (each point has a subvector of size 4 in this example). For every point, I need to sum the values in its subvector whose indices match. Each subvector is sorted by index with a bitonic sort, which is very fast.

in vector: [ (1, 0.1) (1, 0.5) (1, 0.2) (3, 0.2) - (1, 0.2) (1, 0.3) (3, 0.2) (3, 0.2) ]

out vector: [ (1, 0.8) (3, 0.2) (INT_MAX, 0) (INT_MAX, 0) - (1, 0.5) (3, 0.4) (INT_MAX, 0) (INT_MAX, 0) ] // same length as the in vector

Right now I launch as many threads as I have points (2 threads in the example); each thread scans its subvector for matching indices and writes the result. I would like to do this more efficiently. Do you have any ideas I could try?

Thanks in advance