I have a few related questions:
If many threads in a warp want to read the same address in global memory, the data is broadcast to them, is that right?
If many threads in a warp want to write to the same address in global memory, the writes are serialized, but it is not possible to predict the order, is that right? Do all of the threads actually write?
But the main question: what if many threads in different warps, in different blocks, want to write to the same address in global memory? What will the GPU do? Does it serialize all accesses to this address? Is there any guarantee of data consistency?
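To make this case concrete, here is a minimal sketch of what I mean. The kernel and helper names (`plain_write`, `atomic_count`, `last_plain_write`, `count_with_atomics`) are made up for illustration, and error handling is reduced to asserts. As far as I understand from the CUDA programming guide, with the plain store one of the written values ends up in memory but which thread wins is undefined, while `atomicAdd` makes each read-modify-write indivisible, so no increment is lost.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Evaluate the CUDA call first, then assert on its result.
#define CUDA_CHECK(call) \
    do { cudaError_t e_ = (call); assert(e_ == cudaSuccess); (void)e_; } while (0)

// Every thread of every block stores its global index to the same word.
// The stores are not atomic: some thread's value survives, but which one
// is not defined.
__global__ void plain_write(int *target) {
    *target = blockIdx.x * blockDim.x + threadIdx.x;
}

// Every thread increments the same counter with an atomic read-modify-write.
__global__ void atomic_count(int *counter) {
    atomicAdd(counter, 1);
}

// Hypothetical helper: launch plain_write and return whatever value is left.
int last_plain_write(int blocks, int threads) {
    int *d = nullptr;
    int h = -1;
    CUDA_CHECK(cudaMalloc(&d, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d, &h, sizeof(int), cudaMemcpyHostToDevice));
    plain_write<<<blocks, threads>>>(d);
    CUDA_CHECK(cudaGetLastError());
    // This copy waits for the kernel on the default stream to finish.
    CUDA_CHECK(cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));
    return h;
}

// Hypothetical helper: launch atomic_count and return the final counter.
int count_with_atomics(int blocks, int threads) {
    int *d = nullptr;
    int h = 0;
    CUDA_CHECK(cudaMalloc(&d, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d, &h, sizeof(int), cudaMemcpyHostToDevice));
    atomic_count<<<blocks, threads>>>(d);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));
    return h;
}
```

Called from a `main`, I would expect `count_with_atomics(4, 64)` to return 256 every time, while `last_plain_write(4, 64)` should return some value between 0 and 255 that may change from run to run.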
With Hyper-Q it is possible to launch many streams containing kernels. If I have a location in memory, and a number of threads in different kernels want to write or read this address, what will the GPU do? Does it serialize the accesses of all threads from the different kernels, or does the GPU do nothing, so that inconsistencies can happen? Is there any guarantee of data consistency when multiple kernels are reading/writing the same address?
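The multi-kernel case I have in mind looks roughly like the sketch below. The names `add_value` and `sum_from_two_streams` are hypothetical, and whether the two kernels really overlap depends on the device and the available resources, so this only shows the access pattern. My assumption is that `atomicAdd` on global memory is atomic with respect to all threads on the same device, including threads of another kernel, so the final sum should be exact even if the kernels run concurrently; plain loads and stores would give no such guarantee without extra synchronization.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Evaluate the CUDA call first, then assert on its result.
#define CUDA_CHECK(call) \
    do { cudaError_t e_ = (call); assert(e_ == cudaSuccess); (void)e_; } while (0)

// Each thread atomically adds `value` to the shared counter.
__global__ void add_value(int *counter, int value) {
    atomicAdd(counter, value);
}

// Hypothetical helper: two kernels in two streams update the same address.
int sum_from_two_streams(int blocks, int threads) {
    int *d = nullptr;
    int h = 0;
    CUDA_CHECK(cudaMalloc(&d, sizeof(int)));
    CUDA_CHECK(cudaMemset(d, 0, sizeof(int)));
    // Make sure the counter is zeroed before either kernel can start.
    CUDA_CHECK(cudaDeviceSynchronize());

    cudaStream_t s1, s2;
    CUDA_CHECK(cudaStreamCreate(&s1));
    CUDA_CHECK(cudaStreamCreate(&s2));

    // The two launches are independent and may overlap on the device.
    add_value<<<blocks, threads, 0, s1>>>(d, 1);
    add_value<<<blocks, threads, 0, s2>>>(d, 2);
    CUDA_CHECK(cudaGetLastError());

    // Wait for both streams before reading the result back.
    CUDA_CHECK(cudaDeviceSynchronize());
    CUDA_CHECK(cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaStreamDestroy(s1));
    CUDA_CHECK(cudaStreamDestroy(s2));
    CUDA_CHECK(cudaFree(d));
    return h;
}
```

With 4 blocks of 64 threads, each kernel runs 256 threads, so `sum_from_two_streams(4, 64)` should return 256 * 1 + 256 * 2 = 768 regardless of how the kernels are scheduled.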