Aggregate Variables: A good way to calculate aggregate variables?


I have been trying to find a good method of calculating grid-wide aggregate variables in CUDA, but have run into some complications. Does anyone know of a good way to calculate such variables?

I was attempting to have each block of threads write its data to a global variable, and then combine those global variables. I have not been able to find a way to ensure that every thread actually gets to write its data to the variable. I expected things to slow down while threads waited to add their values, but instead the contributions of many threads are simply skipped.
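For illustration, here is a minimal sketch of the approach described above, assuming the aggregate is a sum of floats. The original code is not shown, so this is a guess at the cause: a plain `total += value` on global memory is an unsynchronized read-modify-write, and concurrent updates can overwrite each other, which would look like threads being skipped. The names `blockSumKernel` and `sumOnDevice` are hypothetical. The sketch reduces within each block in shared memory and then uses `atomicAdd` once per block (float `atomicAdd` needs compute capability 2.0 or later). Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each block sums its slice of the input in shared
// memory, then one thread per block adds that partial sum to the total.
__global__ void blockSumKernel(const float* in, int n, float* total)
{
    extern __shared__ float partial[];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // Out-of-range threads contribute zero so the reduction stays uniform.
    partial[tid] = (idx < n) ? in[idx] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory; assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }

    // atomicAdd serializes the per-block updates so none are lost.
    if (tid == 0) {
        atomicAdd(total, partial[0]);
    }
}

// Hypothetical host wrapper: copies the data over, launches the kernel,
// and returns the grid-wide sum.
float sumOnDevice(const std::vector<float>& host)
{
    const int n = static_cast<int>(host.size());
    if (n == 0) return 0.0f;

    const int threads = 256;
    assert((threads & (threads - 1)) == 0);  // power of two for the tree reduction
    const int blocks = (n + threads - 1) / threads;

    float* dIn = nullptr;
    float* dTotal = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dTotal, sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(float));  // the total must start at zero

    blockSumKernel<<<blocks, threads, threads * sizeof(float)>>>(dIn, n, dTotal);

    // This blocking copy on the default stream waits for the kernel to finish.
    float result = 0.0f;
    cudaMemcpy(&result, dTotal, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dTotal);
    return result;
}
```

Doing the reduction per block first means only one atomic operation per block reaches global memory, rather than one per thread, which keeps contention low.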



You might want to take a look at Thrust's reduction function (…4b3b6be1f5c202d ), or check out the SDK reduction example if you want to see how to implement it yourself.
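As a rough sketch of the Thrust route, assuming the aggregate is a sum of floats and the data starts out on the host, something like the following should work. The wrapper name `sumWithThrust` is hypothetical; `thrust::device_vector`, `thrust::reduce`, and `thrust::plus` are the Thrust pieces it relies on.

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>

// Hypothetical helper: copies the host data to the GPU and sums it there.
float sumWithThrust(const std::vector<float>& host)
{
    // Constructing from an iterator range copies the data to the device.
    thrust::device_vector<float> d(host.begin(), host.end());

    // Reduce the whole range with an initial value of 0 and operator +.
    return thrust::reduce(d.begin(), d.end(), 0.0f, thrust::plus<float>());
}
```

Other aggregates follow the same pattern by swapping the operator and initial value, for example `thrust::maximum<float>()` for a grid-wide maximum.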