CUDA 4.0 concurrent kernels

pkgind · March 25, 2011, 6:22am

Hi,

I have two questions regarding CUDA 4.0 concurrent kernels (on devices with compute capability 2.0). I am looking for a detailed explanation that clarifies these concepts.

When multiple host threads launch different kernels at the same time on the same device, are those kernels actually executed one after another, or simultaneously?
What is the exact difference between these two scenarios:
1. Calling 2 different kernels from two different threads on the same device
2. Calling 2 kernels from the same thread on the same device

sergeyn · March 25, 2011, 7:07am

As long as you use different streams to submit the kernels, you should be fine; it does not matter how many threads you use to submit the work.
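To illustrate scenario 1 (two host threads, one device), here is a minimal sketch, assuming a modern CUDA toolkit built with C++11 support (`nvcc -std=c++11`); `scaleIndex`, `worker` and `runFromTwoThreads` are hypothetical names. Each host thread creates its own stream and launches into it. Since CUDA 4.0 the runtime shares one context per device across all host threads of a process, so both launches go to the same GPU.

```cuda
#include <cassert>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical toy kernel: out[i] = i * factor.
__global__ void scaleIndex(float* out, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * factor;
}

// Body of one host thread (hypothetical): private stream, one launch, wait, clean up.
static void worker(float* dOut, int n, float factor, cudaError_t* status) {
    cudaStream_t s;
    cudaError_t err = cudaStreamCreate(&s);
    if (err != cudaSuccess) { *status = err; return; }
    scaleIndex<<<(n + 255) / 256, 256, 0, s>>>(dOut, n, factor);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaStreamSynchronize(s);
    cudaStreamDestroy(s);
    *status = err;
}

// Scenario 1: two kernels submitted from two host threads to the same device.
// Results are copied into hA (factor 1) and hB (factor 2), each of length n.
cudaError_t runFromTwoThreads(float* hA, float* hB, int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) {
        cudaError_t stA = cudaSuccess, stB = cudaSuccess;
        std::thread tA(worker, dA, n, 1.0f, &stA);
        std::thread tB(worker, dB, n, 2.0f, &stB);
        tA.join();
        tB.join();
        err = (stA != cudaSuccess) ? stA : stB;
    }
    if (err == cudaSuccess) err = cudaMemcpy(hA, dA, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(hB, dB, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);  // cudaFree(nullptr) is a no-op
    cudaFree(dB);
    return err;
}
```

Because each thread uses its own non-default stream, the two launches are independent and the hardware is free to overlap them, exactly as if one thread had issued both.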

pkgind · March 25, 2011, 12:11pm

I am still not clear on what you are trying to explain. Could you please explain in more detail?

sergeyn · March 25, 2011, 12:47pm

To get kernels running concurrently you need two things: hardware that supports it (compute capability 2.0 or higher), and kernels submitted to different streams (i.e., a different stream handle as the last, fourth parameter of the kernel<<<grid, block, sharedMem, stream>>> execution configuration). It does not matter whether you populate your streams with kernels from many host threads or from just one thread (like it is done here); the kernels can run concurrently in either case.
See the Programming Guide, sections 3.2.5.3 and 3.2.5.5.
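Scenario 2 (one host thread, two streams) can be sketched as follows, assuming a standard CUDA toolkit; `addOne`, `timesTwo` and `launchInTwoStreams` are hypothetical names. The only thing that makes the two launches eligible to overlap is that each one names a different stream in the fourth slot of the execution configuration.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Two different toy kernels (hypothetical).
__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

__global__ void timesTwo(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Scenario 2: one host thread submits two different kernels into two streams.
// hA is incremented in place, hB is doubled in place (both of length n).
cudaError_t launchInTwoStreams(int* hA, int* hB, int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(int);
    int *dA = nullptr, *dB = nullptr;
    cudaStream_t s0 = nullptr, s1 = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaStreamCreate(&s0);
    if (err == cudaSuccess) err = cudaStreamCreate(&s1);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int block = 256, grid = (n + block - 1) / block;
        // <<<grid, block, sharedMemBytes, stream>>>: different streams, no ordering between them.
        addOne<<<grid, block, 0, s0>>>(dA, n);
        timesTwo<<<grid, block, 0, s1>>>(dB, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(hA, dA, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(hB, dB, bytes, cudaMemcpyDeviceToHost);
    if (s0) cudaStreamDestroy(s0);
    if (s1) cudaStreamDestroy(s1);
    cudaFree(dA);
    cudaFree(dB);
    return err;
}
```

If both launches used the same stream (or no stream argument, i.e. the default stream), the second kernel would be ordered after the first and could not overlap with it.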

pkgind · March 25, 2011, 6:54pm

Thanks for the reply. I agree that if you use different streams, it does not matter whether the kernels are launched by one thread or by several. But I think these kernels cannot actually run concurrently. At any given time, the blocks of a kernel are distributed over the different SMs. If two kernels can run concurrently, do you mean that some SMs run blocks from kernel 1 while others run blocks from kernel 2, and so on? I do not think that should be the case.
What is your opinion on this? And if that is not the case, do you have any idea how the execution actually happens?
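One way to look at the block-placement question empirically is to record which SM each block ran on. A minimal sketch, assuming a modern CUDA toolkit; `smId`, `probe` and `recordSmPlacement` are hypothetical names. It reads the PTX special register `%smid`, which the PTX documentation describes as a diagnostic value that may change during execution, so treat the result as a hint rather than a guarantee.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Reads the SM id of the calling thread from the PTX special register %smid.
__device__ unsigned int smId() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Hypothetical probe kernel: thread 0 of each block records its SM id, then every
// thread spins for about `spinCycles` clock ticks so the blocks stay resident for a while.
__global__ void probe(unsigned int* smOfBlock, long long spinCycles) {
    if (threadIdx.x == 0) smOfBlock[blockIdx.x] = smId();
    long long start = clock64();
    while (clock64() - start < spinCycles) { }
}

// Launches the probe in two streams and copies back the SM id of every block.
cudaError_t recordSmPlacement(unsigned int* hSmA, unsigned int* hSmB, int blocks) {
    assert(blocks > 0);
    size_t bytes = blocks * sizeof(unsigned int);
    unsigned int *dA = nullptr, *dB = nullptr;
    cudaStream_t s0 = nullptr, s1 = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaStreamCreate(&s0);
    if (err == cudaSuccess) err = cudaStreamCreate(&s1);
    if (err == cudaSuccess) {
        // Same kernel twice here; swap one launch for a different kernel
        // to probe the "two different kernels" case.
        probe<<<blocks, 64, 0, s0>>>(dA, 1000000LL);
        probe<<<blocks, 64, 0, s1>>>(dB, 1000000LL);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(hSmA, dA, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(hSmB, dB, bytes, cudaMemcpyDeviceToHost);
    if (s0) cudaStreamDestroy(s0);
    if (s1) cudaStreamDestroy(s1);
    cudaFree(dA);
    cudaFree(dB);
    return err;
}
```

Printing the two arrays side by side shows which SMs each launch used; it does not by itself prove the launches overlapped in time, which is what a timing test is for.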

sergeyn · March 26, 2011, 8:36am

The resource usage of your kernels (registers, shared memory) must be modest enough to leave room for them to run concurrently. I don't know whether kernels with different function code can be overlapped or not. The best way to get precise answers to your questions is a benchmark in which you try out different combinations: 2 identical kernels, 2 different kernels from the same module, 2 kernels from different modules, and any other scenario you can think of.
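A starting point for such a benchmark, as a minimal sketch: time two launches of a busy-wait kernel first in one stream, then in two streams. `spin` and `timeSerialVsConcurrent` are hypothetical names, and the code assumes the legacy default stream (nvcc's default), which waits for all other blocking streams, so an event recorded on stream 0 marks the completion of both kernels.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: each thread spins for about `cycles` SM clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Measures two launches of `spin` (a) in one stream and (b) in two streams, in milliseconds.
cudaError_t timeSerialVsConcurrent(long long cycles, float* serialMs, float* concurrentMs) {
    assert(cycles >= 0);
    cudaStream_t s0 = nullptr, s1 = nullptr;
    cudaEvent_t begin = nullptr, end = nullptr;
    cudaError_t err = cudaStreamCreate(&s0);
    if (err == cudaSuccess) err = cudaStreamCreate(&s1);
    if (err == cudaSuccess) err = cudaEventCreate(&begin);
    if (err == cudaSuccess) err = cudaEventCreate(&end);
    if (err == cudaSuccess) {
        spin<<<1, 32>>>(0);  // warm-up launch, excluded from timing
        err = cudaDeviceSynchronize();
    }

    // (a) Same stream: the second kernel cannot start before the first finishes.
    if (err == cudaSuccess) {
        cudaEventRecord(begin, 0);
        spin<<<1, 32, 0, s0>>>(cycles);
        spin<<<1, 32, 0, s0>>>(cycles);
        cudaEventRecord(end, 0);
        err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaEventSynchronize(end);
    }
    if (err == cudaSuccess) err = cudaEventElapsedTime(serialMs, begin, end);

    // (b) Two streams: one block each leaves SMs free, so the hardware may overlap them.
    if (err == cudaSuccess) {
        cudaEventRecord(begin, 0);
        spin<<<1, 32, 0, s0>>>(cycles);
        spin<<<1, 32, 0, s1>>>(cycles);
        cudaEventRecord(end, 0);
        err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaEventSynchronize(end);
    }
    if (err == cudaSuccess) err = cudaEventElapsedTime(concurrentMs, begin, end);

    if (begin) cudaEventDestroy(begin);
    if (end) cudaEventDestroy(end);
    if (s0) cudaStreamDestroy(s0);
    if (s1) cudaStreamDestroy(s1);
    return err;
}
```

On hardware with concurrent kernel support, the two-stream time should come out noticeably lower than the single-stream time (ideally close to half); if both numbers are about equal, the kernels were serialized. Increasing the grid size or the per-block register and shared memory usage is a simple way to see where overlap stops.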

pkgind · March 28, 2011, 4:06am

Thanks sergeyn! It seems worth trying out the combinations as you have suggested.
