Hi all,
I am trying to think of an effective way to aggregate a series of int values on CUDA.
On input: 1440*365*10=5256000 int words.
On output: 365*10=3650 float words, each word being the average of 1440 int values.
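To make the desired result concrete, here is a minimal CPU reference sketch in C++. The names (kGroupSize, kNumGroups, average_groups) are made up for illustration, and it assumes that each group is 1440 consecutive ints in the input array. It is only meant to pin down what the output should be, not how it should be done on the GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names; the sizes are the ones given in the post.
constexpr std::size_t kGroupSize = 1440;      // int values per average
constexpr std::size_t kNumGroups = 365 * 10;  // 3650 output averages

// Assumption: group g occupies in[g * group_size] .. in[(g + 1) * group_size - 1].
// Any trailing elements that do not fill a whole group are ignored.
std::vector<float> average_groups(const std::vector<int>& in,
                                  std::size_t group_size = kGroupSize) {
    std::vector<float> out(in.size() / group_size);
    for (std::size_t g = 0; g < out.size(); ++g) {
        long long sum = 0;  // 64-bit accumulator so the sum of ints cannot overflow
        for (std::size_t i = 0; i < group_size; ++i) {
            sum += in[g * group_size + i];
        }
        out[g] = static_cast<float>(sum) / static_cast<float>(group_size);
    }
    return out;
}
```

With the full input of kGroupSize * kNumGroups ints this returns the 3650 averages, which any GPU version could be checked against.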
On one hand, it's not a plain reduction, so I can’t see a straightforward way to apply the Multiple Adds/Thread approach from the SDK Reduction example.
On the other hand, many of the 1440-element groups will not start at a 256-aligned address, so a reduction within each group doesn’t sound efficient either.
I was thinking about atomic operations but have not tried them yet.
Reading through the texture fetch units also doesn’t sound effective: the data are not reused, so no caching is needed, and no wrapping or normalization logic is needed either.
Could you please advise something?