I know that each Streaming Multiprocessor (SM) in a Tesla C2050 contains 32 cores, and that a warp consists of 32 threads. My question comes from this Stack Overflow thread: http://stackoverflow.com/questions/11564608/cuda-multiprocessors-warp-size-and-maximum-threads-per-block-what-is-the-exa
In that thread, the asker wrote:
“The threads in the same multiprocessor (warp) process the same line of the code and use shared memory of the current multiprocessor.”
and the accepted answer replied:
“No, there can be many more than 32 threads ‘in flight’ at the same time in a single SM.”
My question is this: an SM has only 32 cores, one core can handle one thread at a time, and a warp has 32 threads, so it seems obvious that one SM can execute only one warp at a time. Why, then, did the answerer say that “there can be many more than 32 threads ‘in flight’ at the same time in a single SM”? If that were true, a single SM would need many more than 32 cores to run so many threads. Assume the threads contain only simple instructions: no loads/stores and no special functions such as trigonometry. I am very confused, so please help me understand this. Thanks a lot.
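The answer’s wording hinges on the difference between warps that are resident (“in flight”) on an SM and the warp whose instruction is actually issued to the cores in a given cycle. The following is a minimal Python sketch of that difference under a deliberately simplified, hypothetical model; it is not a description of the C2050’s real scheduler. It assumes the SM issues at most one warp instruction per cycle to all 32 cores, and that a warp must wait a fixed `latency` cycles before its next (dependent) instruction can issue. The function name `simulate`, the latency of 20 cycles, and the other numbers are illustrative assumptions only.

```python
from typing import List, Tuple


def simulate(num_warps: int, instrs_per_warp: int, latency: int) -> Tuple[int, float]:
    """Toy model of one SM issuing warp instructions (hypothetical, simplified).

    Assumptions (not the real hardware):
      * at most one warp instruction is issued per cycle, and it occupies
        all 32 cores (one core per thread of the warp);
      * each instruction depends on the previous one in the same warp, so
        that warp cannot issue again until `latency` cycles have passed.

    Returns (total cycles, fraction of cycles in which the cores received work).
    """
    if num_warps < 1 or instrs_per_warp < 1:
        raise ValueError("need at least one warp and one instruction per warp")
    ready_at: List[int] = [0] * num_warps      # earliest cycle each warp may issue again
    remaining: List[int] = [instrs_per_warp] * num_warps
    cycle = 0
    issued = 0
    next_warp = 0  # round-robin starting point for the scheduler
    while any(remaining):
        # Issue from the first resident warp (round-robin order) that is ready now.
        for i in range(num_warps):
            w = (next_warp + i) % num_warps
            if remaining[w] and ready_at[w] <= cycle:
                remaining[w] -= 1
                ready_at[w] = cycle + latency
                issued += 1
                next_warp = (w + 1) % num_warps
                break
        cycle += 1  # if no warp was ready, the cores sat idle this cycle
    return cycle, issued / cycle


if __name__ == "__main__":
    # Same assumed latency of 20 cycles; only the number of resident warps changes.
    for warps in (1, 10, 20, 48):
        cycles, busy = simulate(warps, instrs_per_warp=100, latency=20)
        print(f"{warps:2d} resident warps: {cycles} cycles, cores busy {busy:.0%} of cycles")
```

With these assumed numbers, a single resident warp keeps the cores busy in only about 5% of cycles (1981 cycles for 100 instructions), while with 20 or more resident warps some warp is ready in every cycle, so the 32 cores never idle even though only one warp issues per cycle. In this toy model, that is how an SM can hold many more than 32 threads at once without having more than 32 cores.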
PS1: Please don’t reply that I don’t need to know this much detail about the GPU chip. I need to know it because I have to.
PS2: Please don’t ask whether I have read the manuals or other materials. I have searched almost everywhere on the internet, but nowhere have I found a clear explanation of how warps correspond to cores.
Thank you again for your answer.