I guess this will turn out to be obvious, but I've been staring at it for 20 minutes and I still don't see it.
Where are the bank conflicts in parallel reduction kernel 2 (interleaved addressing)?
Threads 8-15 of the half warp are idle, and threads 0-7 each read 2 items residing in two adjacent banks. As far as I can tell, each bank is then read by only one thread of the half warp.
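To make that reasoning checkable, here is a rough host-side sketch that counts how many active threads of one half warp land on the same bank for a single shared-memory read. It assumes 16 banks of 32-bit words, a half warp of 16 threads, and that kernel 2 indexes with `index = 2 * s * tid` guarded by `index < blockDim.x`, reading `sdata[index]` and `sdata[index + s]`. The function name and its parameters are made up for illustration; this is not real CUDA code.

```python
from collections import Counter

NUM_BANKS = 16   # assumption: 16 shared-memory banks, one 32-bit word each
HALF_WARP = 16   # assumption: conflicts are resolved per half warp


def max_threads_per_bank(block_dim, s, operand_offset=0, half_warp=0):
    """Worst-case number of active threads hitting one bank for one read.

    operand_offset=0 models the read of sdata[index],
    operand_offset=s models the read of sdata[index + s].
    """
    counts = Counter()
    first_tid = half_warp * HALF_WARP
    last_tid = min(first_tid + HALF_WARP, block_dim)
    for tid in range(first_tid, last_tid):
        index = 2 * s * tid              # interleaved addressing, as assumed above
        if index < block_dim:            # threads failing this test do nothing
            bank = (index + operand_offset) % NUM_BANKS
            counts[bank] += 1
    # 1 means conflict-free, n > 1 means an n-way conflict, 0 means no active threads
    return max(counts.values(), default=0)
```

Calling `max_threads_per_bank(16, 1)` models the case I described, where the block has only 16 threads and threads 8-15 fail the index test. Other block sizes and strides can be tried the same way, passing `operand_offset=s` for the second read.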
Point me to the obvious!