How is access to the same global memory address performed by threads from different kernels?

I have this question:

If many threads in a warp want to read the same address in global memory, the data is broadcast to them, is that right?

If many threads in a warp want to write to the same address in global memory, the writes are serialized, but it is not possible to predict the order, is that right? Will all of the threads write?

But the main question: what if many threads in different warps, in different blocks, want to write to the same address in global memory? What will the GPU do? Does it serialize all accesses to this address? Is there any guarantee of data consistency?

With Hyper-Q it is possible to launch many streams containing kernels. If I have a location in memory and threads from different kernels want to write to or read from this address, what will the GPU do? Does it serialize the accesses of all threads from the different kernels, or does it do nothing, so that inconsistencies can occur? Is there any guarantee of data consistency when multiple kernels read from and write to the same address?

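There is no guarantee about the order in which blocks (or kernels running concurrently in different streams) execute, so with plain, non-atomic writes the final value is whatever the last thread to write stored, and which thread in which warp and block that is cannot be predicted. If every update has to take effect, use atomic operations such as atomicAdd.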
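To make the difference concrete, here is a minimal sketch, assuming a single device and two kernel launches in separate streams that may overlap. The kernel names and the `run_in_two_streams` helper are hypothetical, made up for illustration only. It contrasts a plain read-modify-write on one global address with `atomicAdd` on the same address.

```cuda
#include <cuda_runtime.h>

// Every thread does a non-atomic read-modify-write on the same address.
// Updates from different threads can overwrite each other, so the final
// value is unpredictable (this is a data race).
__global__ void racy_increment(int *counter) {
    *counter = *counter + 1;
}

// Every thread uses atomicAdd, so each update is applied indivisibly.
// The order of the updates is still unspecified, but none is lost.
__global__ void atomic_increment(int *counter) {
    atomicAdd(counter, 1);
}

// Hypothetical helper: launches `kernel` once in each of two streams
// (so the two launches are allowed to overlap) and returns the final
// value of the shared counter. Error checking is omitted for brevity.
int run_in_two_streams(void (*kernel)(int *), int blocks, int threads) {
    int *d_counter = nullptr;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));
    cudaDeviceSynchronize();  // make sure the counter is zeroed first

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Both launches target the same global memory address.
    kernel<<<blocks, threads, 0, s0>>>(d_counter);
    kernel<<<blocks, threads, 0, s1>>>(d_counter);
    cudaDeviceSynchronize();  // wait for both streams to finish

    int h_counter = 0;
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d_counter);
    return h_counter;
}
```

Calling `run_in_two_streams(atomic_increment, 64, 256)` should return exactly 2 * 64 * 256, because every increment is applied. Calling it with `racy_increment` typically returns a smaller number that changes from run to run, because the hardware does not serialize the plain read-modify-write sequences for you.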

Thanks, KBam.

I found good answers here: cuda - How the access of the same global memory address is performed by threads from different kernels? - Stack Overflow