Memory latency for divergent warp execution

Memory latency is shorter for divergent warp execution than for non-divergent execution.
Is this due to a smaller memory transaction size, or is there another factor? (compute capability 1.3 …)
Thanks a lot

I don't know how clever the warp scheduler is or exactly how NVIDIA implemented divergent threads, but once you have taken the hit of serializing the divergent paths, there are more opportunities to hide latency. Also, data fetched in one branch may already be available to the other branches from caches or buffers.
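To make the transaction-size hypothesis from the question concrete, here is a sketch of the global-memory coalescing protocol that the CUDA Programming Guide describes for compute capability 1.2/1.3. For each half-warp, it finds the segment containing the address of the lowest-numbered active thread (32, 64 or 128 bytes depending on word size), serves every active thread in that segment, and halves the transaction (128 to 64 to 32 bytes) while only one half of it is used. This is a hypothetical Python model, not NVIDIA code. The names `segment_size_for` and `half_warp_transactions` are made up for illustration. The model only counts the bytes each request moves and says nothing about latency by itself.

```python
# Hypothetical model of the compute capability 1.2/1.3 coalescing protocol
# for one half-warp (16 threads). Illustration only, not NVIDIA's implementation.

def segment_size_for(word_size):
    """Initial segment size in bytes for a given word size (1.2/1.3 rules)."""
    if word_size == 1:
        return 32
    if word_size == 2:
        return 64
    if word_size in (4, 8, 16):
        return 128
    raise ValueError("word size must be 1, 2, 4, 8 or 16 bytes")


def half_warp_transactions(addresses, active, word_size):
    """Return the (start, size) transactions issued for one half-warp.

    addresses: byte address requested by each thread of the half-warp
    active:    True for threads that execute this load (e.g. the taken branch)
    """
    pending = [i for i in range(len(addresses)) if active[i]]
    transactions = []
    seg = segment_size_for(word_size)
    while pending:
        # 1. Aligned segment containing the lowest-numbered pending thread's address.
        leader = pending[0]
        start = addresses[leader] - addresses[leader] % seg
        # 2. All pending threads whose address falls in that segment.
        served = [i for i in pending if start <= addresses[i] < start + seg]
        size = seg
        # 3. Shrink 128 -> 64 -> 32 bytes while only one half of it is touched.
        while size > 32:
            half = size // 2
            lo = min(addresses[i] for i in served)
            hi = max(addresses[i] + word_size - 1 for i in served)
            if hi < start + half:        # only the lower half is used
                size = half
            elif lo >= start + half:     # only the upper half is used
                start, size = start + half, half
            else:
                break
        # 4. Issue the transaction and retire the serviced threads.
        transactions.append((start, size))
        pending = [i for i in pending if i not in served]
    return transactions


word = 4                                   # 4-byte loads (e.g. float)
addrs = [tid * word for tid in range(16)]  # consecutive addresses, base 0

all_active = [True] * 16                   # no divergence
branch_a = [tid < 8 for tid in range(16)]  # threads 0..7 take branch A
branch_b = [tid >= 8 for tid in range(16)] # threads 8..15 take branch B

print(half_warp_transactions(addrs, all_active, word))  # [(0, 64)]
print(half_warp_transactions(addrs, branch_a, word))    # [(0, 32)]
print(half_warp_transactions(addrs, branch_b, word))    # [(32, 32)]
```

In this model, when only threads 0 to 7 take a branch, that branch's load shrinks from a 64-byte to a 32-byte transaction, which fits the question's hypothesis. The other path still issues its own 32-byte transaction, though, so the two paths together move the same 64 bytes in two requests instead of one. Whether the smaller per-path transaction actually shortens the measured latency is not something this model can show.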