SP and Warp

I have a question:

  1. Does a streaming processor handle one warp at a time, or many?

Hmm… Your SP abbreviation is misleading. In the CUDA manual, SP stands for ‘Scalar Processor’. On pre-Fermi hardware each SP handles 4 threads of a warp, because a warp's instruction is issued over 4 clock cycles (it is pipelining rather than hyperthreading). You may assume they are executed in parallel on one SP.
On pre-Fermi architectures, a Streaming Multiprocessor (SM), the unit that runs your block, consists of 8 SPs and handles 32 threads at a time, i.e. a single warp (hope that answers your question). However, the scheduler can swap the warps that are currently running to hide various latencies.
On the Fermi architecture, a Streaming Multiprocessor consists of 16 SPs and handles 64 threads at a time, i.e. 2 warps. If I recall correctly, they are constrained so that 8 SPs execute only odd warps and the other 8 SPs execute only even warps.
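
To make the pre-Fermi numbers above concrete, here is a rough sketch of the arithmetic in plain Python (host-side bookkeeping only, not CUDA code). The function names `warps_in_block` and `threads_per_sp` are made up for illustration; the only inputs are the figures quoted above (32 threads per warp, 8 SPs per SM).

```python
import math

WARP_SIZE = 32            # threads per warp
SPS_PER_SM_PRE_FERMI = 8  # SPs per SM, figure quoted in the reply above


def warps_in_block(threads_per_block, warp_size=WARP_SIZE):
    """Number of warps a block is split into (the last warp may be partly empty)."""
    return math.ceil(threads_per_block / warp_size)


def threads_per_sp(warp_size=WARP_SIZE, sps=SPS_PER_SM_PRE_FERMI):
    """How many threads of a single warp each SP has to process."""
    return warp_size // sps


print(warps_in_block(256))  # 256 threads -> 8 full warps
print(warps_in_block(100))  # 100 threads -> 4 warps, the last one only partly filled
print(threads_per_sp())     # 32 threads over 8 SPs -> 4 threads per SP
```

So a block is always scheduled in whole warps, and the "4 threads per SP" figure is simply 32 / 8.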

Thanks for your reply, I understand this better now.

Actually on Fermi, each SM has 32 SPs. But yes, they’re split between two schedulers.
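
As an illustration of that split, here is a small Python sketch (again host-side arithmetic, not CUDA code). It assumes the odd/even rule mentioned earlier in the thread, which was itself given from memory, so treat the parity mapping as an assumption rather than documented behaviour. The names `warp_of_thread` and `scheduler_of_warp` are hypothetical.

```python
WARP_SIZE = 32       # threads per warp
NUM_SCHEDULERS = 2   # warp schedulers per Fermi SM, as stated above


def warp_of_thread(linear_thread_idx):
    """Warp index of a thread, given its linear index within the block."""
    return linear_thread_idx // WARP_SIZE


def scheduler_of_warp(warp_idx):
    """ASSUMPTION from the thread: even warps go to scheduler 0, odd warps to scheduler 1."""
    return warp_idx % NUM_SCHEDULERS


for t in (0, 31, 32, 64, 95):
    w = warp_of_thread(t)
    print(f"thread {t:3d} -> warp {w} -> scheduler {scheduler_of_warp(w)}")
```

Under that assumption, threads 0 to 31 (warp 0) and 64 to 95 (warp 2) land on one scheduler, while threads 32 to 63 (warp 1) land on the other.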