Hi,
I was reading through the reduction topics in this forum and have a question.
I want to “sort” or “map” a huge amount of data (like a million float2 values) into a huge array of floats (like 8080, or 200200 values). I can’t just use “+=” to add the values into the float array, because many threads would read and write the same element at the same time and updates would get lost.
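To make the operation concrete, here is a rough sequential sketch of the kind of accumulation I mean, written as plain host C++. The names `Float2` and `scatter_add` are made up for illustration, and the mapping (x picks the output cell, y is the value to add) is only an assumption to keep the example short; the real mapping may differ.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for CUDA's float2 so the sketch compiles on the host (hypothetical name).
struct Float2 { float x; float y; };

// Assumed mapping, for illustration only: x selects the output cell,
// y is the value accumulated into that cell.
std::vector<float> scatter_add(const std::vector<Float2>& samples, std::size_t num_cells) {
    std::vector<float> out(num_cells, 0.0f);
    for (const Float2& s : samples) {
        // Skip samples that fall outside the output array (also rejects NaN).
        if (!(s.x >= 0.0f) || s.x >= static_cast<float>(num_cells)) continue;
        std::size_t cell = static_cast<std::size_t>(s.x);
        // This is the "+=" that is safe here but races on the GPU
        // when several threads land on the same cell.
        out[cell] += s.y;
    }
    return out;
}
```

On the CPU this loop is fine because only one update happens at a time; the trouble starts when each sample is handled by its own thread and several of them target the same cell.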
But if I use a reduction, I think I need one of those float arrays per thread, so that each thread can sort its float2 values into its own copy, and only after that can I reduce the copies. That would mean an enormous amount of data…
And anyway, I think a reduction only works within the threads of one block, and I would need far more than a single thread block.
So the question is, how can I solve this problem?
Thanks for any help.