Can global-memory accesses by threads within a WARP cause divergence among the threads of that WARP?
Case 1:
All threads in a WARP access consecutive global memory locations, so the accesses can be coalesced.
Case 2:
Threads in a WARP access non-contiguous global memory locations. Assume that the WARP itself has NOT diverged yet.
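To make the two cases concrete, here is a rough host-side sketch in plain C++ (not device code) of the access patterns I mean. It assumes 32 threads per WARP, 4-byte elements, and 128-byte memory segments. Those numbers and the helper name `segmentsTouched` are my own illustrative choices, and the real transaction size depends on the GPU architecture. The sketch only counts how many segments a single WARP-wide load touches; it says nothing about how the scheduler behaves.

```cpp
#include <cstddef>
#include <set>

// Assumed values for illustration only; real hardware differs by architecture.
constexpr int kWarpSize = 32;               // threads per WARP
constexpr std::size_t kSegmentBytes = 128;  // assumed memory segment size
constexpr std::size_t kElemBytes = 4;       // e.g. one float per thread

// Hypothetical helper: counts the distinct segments touched when each lane
// of one WARP reads element index (lane * strideElems), starting from a
// segment-aligned base address.
int segmentsTouched(std::size_t strideElems) {
    std::set<std::size_t> segments;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        std::size_t byteOffset = lane * strideElems * kElemBytes;
        segments.insert(byteOffset / kSegmentBytes);
    }
    return static_cast<int>(segments.size());
}
```

Under these assumptions, `segmentsTouched(1)` gives 1 (Case 1: the whole WARP reads from one segment) and `segmentsTouched(32)` gives 32 (Case 2: every thread reads from a different segment).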
In Case 1, it is possible that the data for the first HALF-WARP is available before the data for the second. Will this cause WARP divergence? I would assume that the WARP scheduler waits until all of the data for a WARP is available before it schedules the WARP again, because this condition cannot be “predicated” and I am assuming that the instruction unit does NOT have the intelligence to issue different instructions to threads belonging to the same WARP.
In Case 2, some data elements will obviously be available before others. Will this cause WARP divergence?
Thanks in advance for your help.