I am an experienced programmer (with an aerospace background) but I am very new to CUDA. After going through the programming guide I was able to get most of the stuff, but the concept of a “half warp” wasn’t clear to me.
Can somebody please explain what purpose it serves and how it should be taken into account for memory optimization?
NVIDIA likes to confuse its readers. That’s the only inference I could make until someone gave me this explanation:
"
A WARP is a bunch of 32 threads with consecutive thread IDs, starting from multiples of 32 (0, 32, 64…). There are 8 scalar processors in a multiprocessor. These 8 processors have to execute the same instruction for all 32 threads in a warp.
Now, consider an instruction that loads from shared memory.
Now, this instruction has to be executed by 32 threads.
Now, there are only 16 shared memory banks. So there is bound to be contention: the first 16 threads execute first, and the remaining 16 have to wait for them to complete.
OR
something like that…
The manual also talks about half-warps during coalescing global memory as well… So, it could very well be a specific hardware feature that NVIDIA does NOT want to divulge.
"
Whatever it might be, you can safely ignore it and work with CUDA… Maybe someone more enlightened can give you a better answer.
Anyway, a warp is the chunk of threads that run in lockstep SIMD fashion. There are 32 threads in a warp on current hardware. Threads within a warp all execute the same instruction, but the results can be masked, allowing divergence within a warp at the cost of each divergent path being executed serially. A half-warp is half the threads of a warp, corresponding to the first or last 16 threads of a warp. The reason the half-warp size is relevant is hardware scheduling: memory transactions are issued per half-warp.
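To make the terms concrete, here is a toy kernel (names are mine, just for illustration) that computes each thread's warp, lane, and half-warp indices from its thread index:

```cuda
// Toy sketch: how a thread's warp, lane, and half-warp fall out of its ID.
__global__ void warp_ids(int *warp, int *lane, int *half)
{
    int tid = threadIdx.x;        // thread index within the block
    warp[tid] = tid / 32;         // warp id: 32 consecutive threads per warp
    lane[tid] = tid % 32;         // lane within the warp (0..31)
    half[tid] = (tid % 32) / 16;  // 0 = first half-warp, 1 = second

    // Divergence example: if lanes of one warp take different sides of a
    // branch (e.g. "if (lane[tid] < 16)"), the hardware runs both paths
    // serially with the inactive lanes masked off.
}
```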
Just follow the rules for memory coalescing and avoiding bank conflicts as spelled out in the programming guide. It recommends coalescing at the full warp level instead of the half-warp because future hardware might work that way.
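As a concrete example of both rules at once, here is a sketch of the classic tile-based transpose (my own variable names, not from the guide verbatim). The `+1` padding on the shared array is the standard trick for keeping the 16 threads of a half-warp on 16 distinct banks:

```cuda
#define TILE 16

// Sketch of a matrix-transpose tile; assumes width is a multiple of TILE.
__global__ void transpose_tile(float *out, const float *in, int width)
{
    // Padding the row length by 1 shifts each row to a different bank,
    // so a half-warp reading a column hits 16 distinct banks (no conflicts).
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;

    // Coalesced read: the 16 threads of a half-warp load 16 consecutive
    // addresses from global memory.
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;

    // Coalesced write to global memory; the transposed shared-memory read
    // is conflict-free thanks to the padding.
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];
}
```

Without the `+1`, every thread in a half-warp reading `tile[threadIdx.x][threadIdx.y]` would hit the same bank and the accesses would serialize 16-way.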