Lightweight threads


What is actually meant by “CUDA threads are extremely lightweight threads”? The only explanation I have found is that they have a small creation and switching overhead.
Can anyone explain this in more detail?


It means that the GPU can create and manage very large numbers of threads cheaply, unlike an operating system such as Windows, where every new thread you create carries a noticeable overhead to set up and manage.

I believe this statement is meant for people who are familiar with CPU threads to reassure them that creating thousands of GPU threads really can be a sensible thing to do.

On the technical side you are correct: there is zero overhead for switching between threads (each multiprocessor can schedule a different warp of threads for execution in each cycle), and kernel setup time appears to be almost independent of the number of threads (i.e. the PCIe latencies involved are far greater than the time to set up another wave of blocks)…
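If you want to check the launch-overhead claim on your own card, a minimal sketch along these lines times an empty kernel with a tiny grid and with a large one, using CUDA events. The kernel and function names (`emptyKernel`, `writeIndex`, `timeEmptyLaunch`, `allThreadsRan`) and the grid sizes are illustrative choices, not anything from a standard API; error checking is omitted for brevity, and the measured numbers will depend on your GPU and driver.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel: isolates launch and block-scheduling cost, no work per thread.
__global__ void emptyKernel() {}

// Each thread writes its own global index, so we can confirm that every
// one of the (possibly millions of) threads actually ran.
__global__ void writeIndex(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Average GPU time in ms of `reps` back-to-back launches of emptyKernel
// with the given grid shape (hypothetical helper for this sketch).
float timeEmptyLaunch(int blocks, int threadsPerBlock, int reps) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time context initialisation is not measured.
    emptyKernel<<<blocks, threadsPerBlock>>>();
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        emptyKernel<<<blocks, threadsPerBlock>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

// Launch one thread per element and verify that every index was written
// (hypothetical helper for this sketch).
bool allThreadsRan(int n, int threadsPerBlock) {
    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return false;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    writeIndex<<<blocks, threadsPerBlock>>>(d, n);

    int *h = new int[n];
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (h[i] != i) { ok = false; break; }

    delete[] h;
    cudaFree(d);
    return ok;
}
```

A typical use is to print `timeEmptyLaunch(1, 32, 100)` next to `timeEmptyLaunch(4096, 256, 100)` (32 threads versus about a million) and compare the two averages; `allThreadsRan(1 << 20, 256)` just confirms that a million-thread launch really executes every thread.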