Undefined Behaviour of __syncthreads and branching inside a __device__ function

Another thought: wouldn’t it be better (safer) to have one random number generator state per thread instead of one per thread block?
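
To make the suggestion concrete, here is a minimal sketch of what per-thread state could look like. It assumes the generator is cuRAND's device API (the thread doesn't say which one is used), and the names `init_states`, `draw`, `sample` and `sample_on_device` are made up for illustration. The point is that when every thread owns its state, the `__device__` function that draws a number needs no `__syncthreads()`, so it should be safe to call from inside a branch that only some threads take.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One curandState per thread (assumption: cuRAND device API).
// Same seed, different subsequence per thread, offset 0.
__global__ void init_states(curandState *states, unsigned long long seed, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        curand_init(seed, tid, 0, &states[tid]);
    }
}

// Hypothetical helper: draws from the calling thread's own state.
// It contains no barrier, so divergent callers do not have to agree.
__device__ float draw(curandState *state)
{
    return curand_uniform(state);  // uniform float in (0, 1]
}

__global__ void sample(curandState *states, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    curandState local = states[tid];  // work on a local copy
    float x = 0.0f;
    if (tid % 2 == 0) {               // deliberately divergent branch
        x = draw(&local);
    }
    out[tid] = x;
    states[tid] = local;              // store the advanced state back
}

// Hypothetical host helper: fills host_out[0..n) and returns true on success.
bool sample_on_device(float *host_out, int n, unsigned long long seed)
{
    const int block = 64;
    const int grid = (n + block - 1) / block;
    curandState *states = nullptr;
    float *out = nullptr;

    if (cudaMalloc(&states, n * sizeof(curandState)) != cudaSuccess) return false;
    if (cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(states);
        return false;
    }

    init_states<<<grid, block>>>(states, seed, n);
    sample<<<grid, block>>>(states, out, n);

    // cudaMemcpy blocks until both kernels finish and reports their errors.
    cudaError_t err = cudaMemcpy(host_out, out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(out);
    cudaFree(states);
    return err == cudaSuccess;
}
```

The trade-off is memory: one `curandState` per thread instead of one per block, plus the cost of running `curand_init` in every thread. Whether that is acceptable depends on how many threads you launch.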