Warp execution

Hi all

I would appreciate it if someone could shed some light on the details of warp execution.

Quote from the programming guide:

However, the warp size on the GeForce 8800 GTX is 32, and the number of ALUs per multiprocessor is 8, AFAIK (please correct me if I am wrong). Therefore all the threads of a warp cannot be executed simultaneously. Are they executed in quarter-warp portions of 8 threads each? If so, an instruction from another warp cannot be executed until all four quarters of the currently executing warp have finished the current instruction.

Thank you.

Yes, essentially a 32-thread warp is executed in 4 passes through the 8 processors of a given multiprocessor.
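
To make the arithmetic concrete, here is a rough Python sketch using only the numbers from this thread (warp size 32, 8 processors per multiprocessor). The function names are made up for illustration, and the assumption that lanes are handled in order, 8 at a time, is mine; the thread only confirms the count of 4 passes, not which threads go in which pass.

```python
WARP_SIZE = 32     # threads per warp, as stated in this thread
ALUS_PER_SM = 8    # processors per multiprocessor, as stated in this thread


def passes_per_warp_instruction(warp_size=WARP_SIZE, alus=ALUS_PER_SM):
    """Number of passes needed to run one instruction for a whole warp."""
    # Ceiling division, so a warp size that is not a multiple of the
    # ALU count would still round up to a whole number of passes.
    return -(-warp_size // alus)


def lanes_in_pass(pass_index, warp_size=WARP_SIZE, alus=ALUS_PER_SM):
    """Lane indices covered by one pass.

    ASSUMPTION: lanes are taken in ascending order, `alus` at a time.
    The real hardware ordering is not described in this thread.
    """
    start = pass_index * alus
    return list(range(start, min(start + alus, warp_size)))


if __name__ == "__main__":
    for p in range(passes_per_warp_instruction()):
        print(f"pass {p}: lanes {lanes_in_pass(p)}")
```

With the default values this gives 32 / 8 = 4 passes, each covering 8 lanes of the warp.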

Thank you, Simon.

I am glad that my understanding was correct :)

And is the following statement right: