Half-warp divergence and global memory access! Speedups

Can global-memory accesses by the threads of a WARP cause divergence among the threads of that WARP?

Case 1:
All threads in a WARP access consecutive global memory locations, so the accesses can be coalesced.

Case 2:
Threads in a WARP access non-contiguous global memory locations. Assume the WARP itself has NOT diverged yet.

In Case 1, it is possible that the data for the first HALF-WARP is available before the data for the second. Will this cause WARP divergence? I would assume that the WARP scheduler waits until all data for a WARP is available before it schedules the WARP again, because this condition cannot be “predicated” and I am assuming that the instruction unit does NOT have the intelligence to issue different instructions to threads belonging to the same WARP.

In Case 2, obviously certain data elements would be available before certain others. So, will this cause WARP divergence?
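To make the two cases concrete, here is a rough host-side C++ sketch (not device code) that counts how many aligned memory segments one HALF-WARP touches for a given access pattern. Everything in it is an assumption for illustration: the 16-thread half-warp, the 128-byte segment size, the 4-byte word size, and the helper names `countSegments` and `stridedAddresses`. It is a simplified model, not the hardware's actual coalescing rules, and it says nothing about divergence; it only shows how different the memory traffic is between Case 1 and Case 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model parameters (assumptions, not taken from any spec quoted here).
constexpr std::size_t kHalfWarpSize = 16;     // threads per half-warp
constexpr std::uint64_t kSegmentBytes = 128;  // assumed aligned segment size

// Count the distinct aligned segments touched by one half-warp.
// byteAddresses[i] is the byte address read by thread i.
std::size_t countSegments(const std::vector<std::uint64_t>& byteAddresses) {
    assert(byteAddresses.size() == kHalfWarpSize);
    std::set<std::uint64_t> segments;
    for (std::uint64_t addr : byteAddresses) {
        segments.insert(addr / kSegmentBytes);  // index of the segment holding addr
    }
    return segments.size();
}

// Addresses for a half-warp where thread i reads the 4-byte word at
// baseByte + i * strideWords * 4. strideWords == 1 models Case 1.
std::vector<std::uint64_t> stridedAddresses(std::uint64_t baseByte,
                                            std::uint64_t strideWords) {
    std::vector<std::uint64_t> addrs(kHalfWarpSize);
    for (std::size_t i = 0; i < kHalfWarpSize; ++i) {
        addrs[i] = baseByte + i * strideWords * sizeof(std::uint32_t);
    }
    return addrs;
}
```

Under these assumptions, `countSegments(stridedAddresses(0, 1))` gives 1 (Case 1: every thread lands in the same segment), while `countSegments(stridedAddresses(0, 32))` gives 16 (Case 2: every thread lands in its own segment).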

Thanks for your help in advance

I believe these cases won’t introduce divergence. I think global memory reads cause implicit synchronization, so that execution continues only when the data requested by all threads in a WARP has been received.
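If that belief is right, the behaviour can be modelled very simply: the WARP becomes schedulable again only at the arrival time of the slowest thread's data. The host-side C++ sketch below is only an illustration of that idea. The function names `warpReadyCycle` and `halfWarpArrivals`, the 32-thread WARP split into two 16-thread halves, and any cycle numbers fed into it are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Cycle at which the whole warp can issue its next instruction, ASSUMING
// the scheduler waits until every thread's data has arrived.
int warpReadyCycle(const std::vector<int>& arrivalCycles) {
    assert(!arrivalCycles.empty());
    return *std::max_element(arrivalCycles.begin(), arrivalCycles.end());
}

// Arrival cycles for a 32-thread warp whose two half-warps receive
// their data at different (hypothetical) times.
std::vector<int> halfWarpArrivals(int firstHalfCycle, int secondHalfCycle) {
    std::vector<int> arrivals(32, firstHalfCycle);
    std::fill(arrivals.begin() + 16, arrivals.end(), secondHalfCycle);
    return arrivals;
}
```

For example, `warpReadyCycle(halfWarpArrivals(400, 450))` returns 450: in this model the first HALF-WARP simply waits for the second, so an early arrival costs latency but never makes the two halves execute different instructions.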

Thanks for the reply. Let’s hope so…

As far as I understand the architecture, there can’t be real divergence within a half-warp, since its threads physically run on the same GPU hardware at the same time.

If I’m wrong, please correct me.