Reduction question

Hi,

I was reading through the reduction topics in this forum and have a question.

I want to “sort” or “map” a huge amount of data (around a million float2 values) into a large array of floats (80*80 or 200*200 values). I can’t just add the values into the float array with “+=”, because several threads would write to the same memory location at once.
But if I use a reduction, I think I need one of those float arrays per thread to sort the float2s into, and only after that can I reduce them. That would mean an enormous amount of data…
And anyway, I think reduction only works within the threads of a block, and I would need far more than one thread block.
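
To make the memory cost of that idea concrete, here is a rough CPU sketch of it in plain C++ (not CUDA). Everything in it is an assumption for illustration: `Float2`, `cellIndex`, `binParticles` and the `workers` parameter are made-up names, positions are assumed to lie in [0,1) x [0,1), and each particle is assumed to add 1.0f to exactly one cell. The "workers" are simulated by an ordinary loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's float2.
struct Float2 { float x, y; };

// Assumption: a position in [0,1) x [0,1) maps to exactly one cell of an n x n grid.
// Out-of-range (or NaN) coordinates are clamped to the border cells.
inline std::size_t cellIndex(Float2 p, std::size_t n) {
    auto clampCell = [n](float v) {
        if (!(v > 0.0f)) return std::size_t{0};   // negative, zero or NaN
        if (v >= 1.0f) return n - 1;              // at or beyond the upper edge
        std::size_t c = static_cast<std::size_t>(v * static_cast<float>(n));
        return c < n ? c : n - 1;
    };
    return clampCell(p.y) * n + clampCell(p.x);
}

// Privatized scatter followed by a reduction.
// Requires n >= 1 and workers >= 1.
std::vector<float> binParticles(const std::vector<Float2>& pts,
                                std::size_t n, std::size_t workers) {
    // One private n*n grid per worker, so no two workers share a write target.
    std::vector<std::vector<float>> partial(workers,
                                            std::vector<float>(n * n, 0.0f));
    const std::size_t chunk = (pts.size() + workers - 1) / workers;

    // Each outer iteration is independent and could run in parallel.
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(pts.size(), begin + chunk);
        for (std::size_t i = begin; i < end; ++i)
            partial[w][cellIndex(pts[i], n)] += 1.0f;
    }

    // Reduction step: sum the private grids cell by cell.
    std::vector<float> grid(n * n, 0.0f);
    for (std::size_t c = 0; c < n * n; ++c)
        for (std::size_t w = 0; w < workers; ++w)
            grid[c] += partial[w][c];
    return grid;
}
```

The scratch memory here is `workers * n * n` floats, so one private grid per thread really would be enormous at 200*200. One private grid per block, or atomic adds into a single grid where the hardware supports them for floats, would presumably be the way to shrink it.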

So the question is, how can I solve this problem?
Thx for any help.

btw…

I could start a thread for every point on the (200*200) float grid and check, for every float2 value, whether that particle has an influence on it or not, but that can’t be the best solution, right?
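
As a sketch of what that per-grid-point version would do, again in plain C++ on the CPU rather than CUDA: each iteration of the outer loop plays the role of one thread. The influence rule is purely an assumption (a particle counts if it lies within `radius` of the cell center, on a unit-square domain), and `Float2` and `gatherGrid` are made-up names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's float2.
struct Float2 { float x, y; };

// Gather: every grid cell scans all particles and sums the ones that influence it.
// Assumption: the domain is [0,1) x [0,1) and "influence" means the particle lies
// within `radius` of the cell center; each such particle contributes 1.0f.
std::vector<float> gatherGrid(const std::vector<Float2>& pts,
                              std::size_t n, float radius) {
    std::vector<float> grid(n * n, 0.0f);
    const float cell = 1.0f / static_cast<float>(n);

    for (std::size_t c = 0; c < n * n; ++c) {      // one iteration = one "thread"
        const float cx = (static_cast<float>(c % n) + 0.5f) * cell;
        const float cy = (static_cast<float>(c / n) + 0.5f) * cell;
        float sum = 0.0f;
        for (const Float2& p : pts) {              // every cell reads every particle
            const float dx = p.x - cx;
            const float dy = p.y - cy;
            if (dx * dx + dy * dy <= radius * radius)
                sum += 1.0f;
        }
        grid[c] = sum;                             // exactly one writer per cell
    }
    return grid;
}
```

There are no write conflicts, because each cell has exactly one writer, but the work is cells times particles: 200*200 cells against a million float2 values is 4 * 10^10 influence checks.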

It’s really unclear what you are trying to do. Does each output depend on every input? What would a sequential algorithm look like?

What are you trying to write? Some sort of particle dynamics code or n-body simulation?