I have two questions about concurrent kernel execution in CUDA 4.0 (on devices with compute capability 2.0). I am looking for a detailed explanation that clarifies these concepts.
When multiple host threads launch different kernels at the same time on the same device, are those kernels actually executed one after another, or simultaneously?
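For context, whether a device can run kernels concurrently at all can be checked at runtime. The sketch below assumes the CUDA runtime API: `cudaGetDeviceProperties` fills a `cudaDeviceProp` whose `concurrentKernels` field is nonzero when the device may execute multiple kernels concurrently. The helper name `deviceSupportsConcurrentKernels` is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if `device` reports support for concurrent
// kernel execution; false if the query fails (e.g. invalid device id).
static bool deviceSupportsConcurrentKernels(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return prop.concurrentKernels != 0;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) count = 0;  // no driver/device
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        std::printf("device %d: %s, compute capability %d.%d, concurrent kernels: %s\n",
                    dev, prop.name, prop.major, prop.minor,
                    deviceSupportsConcurrentKernels(dev) ? "yes" : "no");
    }
    return 0;
}
```

The property only says the hardware may overlap kernels; it does not say that any particular pair of launches will overlap.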
What is the exact difference between these two scenarios?
- Calling two different kernels from two different host threads on the same device
- Calling two kernels from the same host thread on the same device
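To make the two scenarios concrete, here is a minimal sketch. It assumes the CUDA runtime API and a `.cu` file compiled with nvcc in C++11 mode or later; `std::thread` stands in for whatever threading library the host code uses (CUDA 4.0-era code would typically use pthreads). The kernel `fill` and the helpers `verify`, `sameHostThread` and `twoHostThreads` are hypothetical names for illustration. In both cases each launch goes to its own non-default stream, since kernels issued to the same stream (including the default stream) execute in issue order.

```cuda
#include <cassert>
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical toy kernel: writes `value` into every element of `out`.
__global__ void fill(int* out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

static const int N = 1 << 16;
static const int BLOCK = 256;
static const int GRID = (N + BLOCK - 1) / BLOCK;

// Copy a device buffer back and check every element equals `value`.
static bool verify(const int* d_buf, int value) {
    std::vector<int> h(N);
    if (cudaMemcpy(h.data(), d_buf, N * sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        return false;
    for (int x : h) if (x != value) return false;
    return true;
}

// Scenario 2: ONE host thread issues both launches, each into its own
// stream, before waiting on either of them.
static bool sameHostThread(int* a, int* b) {
    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);
    fill<<<GRID, BLOCK, 0, s[0]>>>(a, N, 1);
    fill<<<GRID, BLOCK, 0, s[1]>>>(b, N, 2);
    cudaError_t err = cudaDeviceSynchronize();  // wait for both streams
    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    return err == cudaSuccess && verify(a, 1) && verify(b, 2);
}

// Scenario 1: TWO host threads each issue one launch into their own stream.
static bool twoHostThreads(int* a, int* b) {
    bool ok[2] = {false, false};
    auto worker = [&ok](int idx, int* buf, int value) {
        cudaStream_t s;
        cudaStreamCreate(&s);
        fill<<<GRID, BLOCK, 0, s>>>(buf, N, value);
        ok[idx] = cudaStreamSynchronize(s) == cudaSuccess && verify(buf, value);
        cudaStreamDestroy(s);
    };
    std::thread t0(worker, 0, a, 3);
    std::thread t1(worker, 1, b, 4);
    t0.join();
    t1.join();
    return ok[0] && ok[1];
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA device found\n");
        return 0;
    }
    int* a = nullptr;
    int* b = nullptr;
    cudaMalloc(&a, N * sizeof(int));
    cudaMalloc(&b, N * sizeof(int));
    bool r1 = sameHostThread(a, b);
    bool r2 = twoHostThreads(a, b);
    assert(r1 && r2);
    std::printf("same thread: %s, two threads: %s\n",
                r1 ? "ok" : "failed", r2 ? "ok" : "failed");
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

This only shows how each scenario is issued and checks that both kernels produce correct output; it does not measure whether the two `fill` launches actually overlapped on the GPU, which is exactly what the question asks.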