Multiple simultaneous parallel reductions

I recently faced a problem where I needed to reduce an array of values while taking a second array of IDs into account. Basically, for each distinct ID in the ID array, I need to reduce the values that carry that ID.

My first idea, and the one I implemented given the time I had, was to go through the ID array sequentially and, for each distinct ID, perform a parallel reduction (based on the example in the SDK). The only difference is that, in the first step of the algorithm, I filter out the elements whose ID doesn't match the one currently being reduced.
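Here is a minimal sequential C++ sketch of that per-ID approach. It assumes the reduction is a float sum, which the post doesn't specify. The function name `reduce_per_id` is hypothetical. It makes the cost visible: every distinct ID needs its own full pass over the data, so k distinct IDs mean k passes (on the GPU, k reduction launches).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Sequential reference for "one filtered reduction per distinct ID".
// Assumption: values are floats and the reduction operator is addition.
std::map<int, float> reduce_per_id(const std::vector<int>& ids,
                                   const std::vector<float>& values) {
    assert(ids.size() == values.size());
    std::set<int> distinct(ids.begin(), ids.end());  // distinct IDs, ascending
    std::map<int, float> result;
    for (int id : distinct) {
        // On the GPU this loop body corresponds to one full reduction launch.
        // The "filter" step maps non-matching elements to the identity (0).
        float sum = 0.0f;
        for (std::size_t i = 0; i < ids.size(); ++i) {
            sum += (ids[i] == id) ? values[i] : 0.0f;
        }
        result[id] = sum;
    }
    return result;
}
```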

However, I was wondering whether it would be possible to perform all of the reductions at the same time. Since there is no way of running kernels concurrently, the kernel itself would have to be written to perform these multiple reductions.
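One way a single kernel could handle all IDs at once is a privatized, histogram-style reduction. This technique doesn't come from the thread, and it only works under assumptions the post doesn't state: the IDs are dense integers in [0, num_ids), and num_ids is small enough that each work-group can keep one partial sum per ID. The sketch below simulates the two phases sequentially in C++. `multi_reduce` and `group_size` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential simulation of a one-launch, multi-ID sum reduction.
// Assumptions: IDs are dense integers in [0, num_ids), num_ids is small,
// and the reduction is a float sum.
std::vector<float> multi_reduce(const std::vector<int>& ids,
                                const std::vector<float>& values,
                                int num_ids, std::size_t group_size) {
    assert(ids.size() == values.size() && group_size > 0 && num_ids > 0);
    const std::size_t n_ids = static_cast<std::size_t>(num_ids);
    const std::size_t num_groups = (ids.size() + group_size - 1) / group_size;
    // Phase 1: each "work-group" reduces its chunk into private per-ID slots
    // (on a GPU these would live in local/shared memory).
    std::vector<std::vector<float>> partial(num_groups, std::vector<float>(n_ids, 0.0f));
    for (std::size_t g = 0; g < num_groups; ++g) {
        const std::size_t begin = g * group_size;
        const std::size_t end = std::min(begin + group_size, ids.size());
        for (std::size_t i = begin; i < end; ++i) {
            assert(ids[i] >= 0 && ids[i] < num_ids);
            partial[g][static_cast<std::size_t>(ids[i])] += values[i];
        }
    }
    // Phase 2: combine the per-group partials for every ID.
    std::vector<float> totals(n_ids, 0.0f);
    for (std::size_t g = 0; g < num_groups; ++g) {
        for (std::size_t id = 0; id < n_ids; ++id) {
            totals[id] += partial[g][id];
        }
    }
    return totals;
}
```

Compared with the per-ID loop, the input is read once regardless of how many IDs there are. The price is memory proportional to num_groups × num_ids for the partials.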

I haven’t reached a solution so far. Do you think something like this is possible? Or is there at least a way to ease the process of checking the different IDs on the CPU?

This is called a “keyed” or “segmented” reduction. In Thrust, it’s a one-liner: reduce_by_key.
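One detail matters for this problem: reduce_by_key only combines runs of equal *consecutive* keys. If the IDs are not already grouped, they have to be sorted first. In Thrust, that is a `thrust::sort_by_key(ids.begin(), ids.end(), values.begin())` followed by `thrust::reduce_by_key(ids.begin(), ids.end(), values.begin(), out_ids.begin(), out_sums.begin())`. The sketch below is a plain sequential C++ stand-in for what those two calls compute, not Thrust itself. It assumes a float sum, and `keyed_sum` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential stand-in for sort_by_key + reduce_by_key with a sum.
void keyed_sum(const std::vector<int>& ids, const std::vector<float>& values,
               std::vector<int>& out_ids, std::vector<float>& out_sums) {
    assert(ids.size() == values.size());
    // "sort_by_key": order element indices by ID so equal IDs become adjacent.
    std::vector<std::size_t> perm(ids.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&](std::size_t a, std::size_t b) { return ids[a] < ids[b]; });
    // "reduce_by_key": collapse each run of equal consecutive IDs into one sum.
    out_ids.clear();
    out_sums.clear();
    for (std::size_t k = 0; k < perm.size(); ++k) {
        const int id = ids[perm[k]];
        const float v = values[perm[k]];
        if (out_ids.empty() || out_ids.back() != id) {
            out_ids.push_back(id);   // a new segment starts
            out_sums.push_back(v);
        } else {
            out_sums.back() += v;    // same segment: accumulate
        }
    }
}
```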

You can refer to this example, which uses the function to implement a simple run-length encoding scheme.
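The linked example isn't reproduced here. As an illustration of the idea, this plain C++ sketch (not Thrust code) shows how run-length encoding maps onto a keyed reduction: the input characters are the keys, every value is 1, and summing each run of equal consecutive keys yields that run's length. Unlike the per-ID case, no sort is applied, because RLE specifically wants consecutive runs. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sequential stand-in for reduce_by_key with a sum: each run of equal
// consecutive keys becomes one output key plus the sum of its values.
void reduce_consecutive_keys(const std::vector<char>& keys, const std::vector<int>& values,
                             std::vector<char>& out_keys, std::vector<int>& out_sums) {
    assert(keys.size() == values.size());
    out_keys.clear();
    out_sums.clear();
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i > 0 && keys[i] == keys[i - 1]) {
            out_sums.back() += values[i];  // same run: accumulate
        } else {
            out_keys.push_back(keys[i]);   // key changed: open a new run
            out_sums.push_back(values[i]);
        }
    }
}

// Run-length encoding as a keyed reduction over a vector of ones.
std::vector<std::pair<char, int>> run_length_encode(const std::string& input) {
    std::vector<char> keys(input.begin(), input.end());
    std::vector<int> ones(keys.size(), 1);
    std::vector<char> symbols;
    std::vector<int> lengths;
    reduce_consecutive_keys(keys, ones, symbols, lengths);
    std::vector<std::pair<char, int>> runs;
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        runs.emplace_back(symbols[i], lengths[i]);
    }
    return runs;
}
```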

That’s very interesting! Thanks for pointing it out. Unfortunately I can’t use Thrust, as I’m using OpenCL. I only posted in the CUDA section because it’s more popular and this is a general GPGPU question anyway :rolleyes:

Thanks a lot again, I’ll look into it.