I am trying to parallelize a computation that produces statistics such as the mean and standard deviation. This requires threads in different blocks, as well as threads within the same block, to access the same variable.
For threads within a block the solution seems easy: use a shared-memory variable and let a single thread do the work after a __syncthreads() call.
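A minimal sketch of the within-block approach described above, assuming a single block of 256 threads. The names blockSum, sumOneBlock and BLOCK_SIZE are made up for this illustration and do not come from any library.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed block size for this sketch

// Each thread stores one value in shared memory; after the barrier,
// thread 0 alone adds the values up and writes the block's total.
__global__ void blockSum(const float *in, int n, float *out)
{
    __shared__ float vals[BLOCK_SIZE];

    int tid = threadIdx.x;
    vals[tid] = (tid < n) ? in[tid] : 0.0f;  // pad unused slots with zero
    __syncthreads();  // all writes to vals are visible past this point

    if (tid == 0) {
        float sum = 0.0f;
        for (int i = 0; i < blockDim.x; ++i)
            sum += vals[i];
        *out = sum;
    }
}

// Hypothetical host helper: sums up to BLOCK_SIZE floats using one block.
float sumOneBlock(const float *h_in, int n)
{
    float *d_in, *d_out, h_out = 0.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<1, BLOCK_SIZE>>>(d_in, n, d_out);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Letting thread 0 do the whole sum keeps the sketch close to the description. It is safe only because the threads belong to one block, which is the only scope that __syncthreads() synchronizes.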
Communication between different blocks is the problem. Global memory offers a roundabout solution: allocate an array with one element per block, have each block store its partial result at index blockIdx.x so that every block writes to its own memory location, and then launch a second kernel to sum the array. What I cannot do is let every block update a single global variable inside the kernel, because two blocks may access it at the same time. Each block has to read the current value before increasing it, so one update can overwrite the other and the result comes out wrong.
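For reference, the two-kernel approach described above might look roughly like the sketch below. It assumes 256 threads per block and computes the population standard deviation from the sum and the sum of squares. The names partialSums, finalize, meanAndStd and BLOCK_SIZE are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define BLOCK_SIZE 256  // assumed block size for this sketch

// Kernel 1: every block writes its partial sum and partial sum of squares
// to slot blockIdx.x, so no two blocks ever touch the same address.
__global__ void partialSums(const float *in, int n,
                            float *blockSum, float *blockSumSq)
{
    __shared__ float s[BLOCK_SIZE];
    __shared__ float sq[BLOCK_SIZE];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    float v = (i < n) ? in[i] : 0.0f;
    s[tid] = v;
    sq[tid] = v * v;
    __syncthreads();

    if (tid == 0) {
        float a = 0.0f, b = 0.0f;
        for (int k = 0; k < blockDim.x; ++k) {
            a += s[k];
            b += sq[k];
        }
        blockSum[blockIdx.x] = a;
        blockSumSq[blockIdx.x] = b;
    }
}

// Kernel 2: a single thread combines the per-block results.
__global__ void finalize(const float *blockSum, const float *blockSumSq,
                         int numBlocks, int n, float *result)
{
    float a = 0.0f, b = 0.0f;
    for (int k = 0; k < numBlocks; ++k) {
        a += blockSum[k];
        b += blockSumSq[k];
    }
    float mean = a / n;
    // E[x^2] - mean^2; simple, but can lose precision on large inputs.
    float var = b / n - mean * mean;
    result[0] = mean;
    result[1] = sqrtf(fmaxf(var, 0.0f));
}

// Hypothetical host helper that runs both kernels back to back.
void meanAndStd(const float *h_in, int n, float *mean, float *stddev)
{
    int numBlocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    float *d_in, *d_sum, *d_sumSq, *d_res, h_res[2];
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_sum, numBlocks * sizeof(float));
    cudaMalloc(&d_sumSq, numBlocks * sizeof(float));
    cudaMalloc(&d_res, 2 * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launches on the same stream run in order, so finalize only starts
    // after every block of partialSums has written its slot.
    partialSums<<<numBlocks, BLOCK_SIZE>>>(d_in, n, d_sum, d_sumSq);
    finalize<<<1, 1>>>(d_sum, d_sumSq, numBlocks, n, d_res);

    cudaMemcpy(h_res, d_res, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    *mean = h_res[0];
    *stddev = h_res[1];
    cudaFree(d_in); cudaFree(d_sum); cudaFree(d_sumSq); cudaFree(d_res);
}
```

The kernel boundary acts as the synchronization point between blocks here, which is why this version needs two launches.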
Is there a faster way to do this, without launching two separate kernels?