About SPs inside an SM

I want to know in detail how an SP behaves.

What are the characteristics of SPs?

Are they sequential? How many pipeline stages do they have? How do they handle warps? Any info that people have learnt or know about SPs would be really useful. I am a hardware guy writing CUDA programs, so if I don't understand the hardware correctly I think my head will explode :)
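One piece of the warp question is documented in the CUDA Programming Guide: a block's threads get a linear ID of x + y*Dx + z*Dx*Dy, and warps are formed from consecutive IDs starting with thread 0. The sketch below is a minimal host-side model of that mapping, assuming a warp size of 32. The function names are hypothetical, and it says nothing about SP pipeline depth or scheduling.

```python
# Hypothetical helper names; the mapping rule follows the CUDA Programming
# Guide: threads are numbered x + y*Dx + z*Dx*Dy and packed into warps of
# consecutive IDs, starting with thread 0.
WARP_SIZE = 32  # assumption: 32 threads per warp (device code should read warpSize)


def linear_thread_id(x, y, z, dim_x, dim_y):
    """Linear ID of thread (x, y, z) in a block of size (dim_x, dim_y, dim_z)."""
    return x + y * dim_x + z * dim_x * dim_y


def warp_and_lane(x, y, z, dim_x, dim_y):
    """Return (warp index within the block, lane index within that warp)."""
    return divmod(linear_thread_id(x, y, z, dim_x, dim_y), WARP_SIZE)


def warps_per_block(dim_x, dim_y, dim_z):
    """Number of warps a block needs; a partially filled last warp still counts."""
    threads = dim_x * dim_y * dim_z
    return -(-threads // WARP_SIZE)  # ceiling division


if __name__ == "__main__":
    # Thread (5, 3) of a 16x16 block has linear ID 5 + 3*16 = 53: warp 1, lane 21.
    print(warp_and_lane(5, 3, 0, 16, 16))
    print(warps_per_block(16, 16, 1))
    # 100 threads: 3 full warps plus 1 partially filled warp.
    print(warps_per_block(10, 10, 1))
```

Choosing block dimensions whose thread count is a multiple of 32 avoids the partially filled last warp, whose idle lanes still occupy issue slots.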

Thanks for taking the time to review my question

You can read David Kanter’s article at