I am allocating a global variable in my code, say counter, with cudaMalloc(). I want to set it to some value, say 0, before the kernel starts, or possibly inside the kernel before any thread begins executing. In short, I want all threads to see the same value at the beginning, before any of them updates it.
I tried __syncthreads(), but it only seems to synchronize threads within a block, not across blocks.
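
To make the setup concrete, here is a minimal sketch of the situation described above. It assumes counter is a single int and sets it from the host with cudaMemset() before the launch, which is one possible approach rather than necessarily the only one. The kernel name countThreads, the pointer name d_counter, and the launch configuration are illustrative assumptions, not taken from my real code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread atomically increments the shared counter.
__global__ void countThreads(int *counter)
{
    atomicAdd(counter, 1);
}

int main()
{
    int *d_counter = nullptr;  // device-side counter (illustrative name)
    cudaMalloc(&d_counter, sizeof(int));

    // Zero the counter from the host before the launch. Work issued to the
    // same stream (here the default stream) executes in order, so the kernel
    // does not start until the memset has finished.
    cudaMemset(d_counter, 0, sizeof(int));

    // Assumed launch configuration: 4 blocks of 64 threads.
    countThreads<<<4, 64>>>(d_counter);

    // Copy the result back; this cudaMemcpy waits for the kernel to finish.
    int h_counter = -1;
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("counter = %d\n", h_counter);

    cudaFree(d_counter);
    return 0;
}
```

Because the initialization happens before the kernel is launched, no cross-block synchronization is needed inside the kernel for the initial value: every thread in every block reads the counter only after it has been set. For values other than a byte pattern such as 0, copying a host variable with cudaMemcpy() before the launch would serve the same purpose.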