I've run into a small problem.
I've got a giant array of float values, and I need to find the highest value in it. That's probably not hard using atomic functions, but how do I make it fast enough?
Is it efficient to do that in CUDA? (OK, I have to do it in CUDA anyway, so it doesn't really matter whether it makes sense.)
My idea was the following:
-every block loads its values into shared memory (let's say 3*512 float values)
-every block keeps track of its current maximum in shared memory
-every block writes its final maximum (the maximum of that block) to its own indexed position in a global memory array
-the global memory array with the per-block maxima is then handled by the CPU (no big deal to find the max of a few thousand values)
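
Roughly what I have in mind, as a minimal sketch rather than finished code. The names `blockMax` and `findMax` and the launch configuration (256 blocks of 512 threads) are just placeholders I made up; it assumes the block size is a power of two, and error checking on the CUDA calls is left out:

```cuda
#include <cassert>
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

// Each block reduces its share of `in` to a single maximum and writes it
// to out[blockIdx.x]. Assumes blockDim.x is a power of two.
__global__ void blockMax(const float* in, float* out, size_t n)
{
    extern __shared__ float smem[];   // one float per thread
    unsigned tid = threadIdx.x;

    // Grid-stride loop: every thread keeps a private running maximum,
    // so the array can be much larger than the number of threads.
    float m = -FLT_MAX;
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + tid; i < n; i += stride)
        m = fmaxf(m, in[i]);
    smem[tid] = m;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            smem[tid] = fmaxf(smem[tid], smem[tid + s]);
        __syncthreads();
    }

    // Thread 0 writes the block's maximum to its indexed slot.
    if (tid == 0)
        out[blockIdx.x] = smem[0];
}

// Hypothetical host helper: launches the kernel, then finishes on the CPU.
float findMax(const std::vector<float>& h)
{
    assert(!h.empty());
    const int threads = 512;   // placeholder block size (power of two)
    const int blocks  = 256;   // placeholder grid size

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn,  h.size() * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);

    blockMax<<<blocks, threads, threads * sizeof(float)>>>(dIn, dOut, h.size());

    // Copy the per-block maxima back (this also waits for the kernel).
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    // Final pass over a few hundred values on the CPU.
    float best = -FLT_MAX;
    for (float v : partial)
        best = (v > best) ? v : best;
    return best;
}
```

This version needs no atomics at all, since each block only ever writes to its own slot of the output array.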
Is this plausible/smart? Or is there a better way to do this on the GPU?