GPU sharing among different applications with different CUDA contexts

Hi, thanks a lot for explaining this in such detail. This is really helpful.

Maybe I was not very clear in drafting my question, but I think you have still explained the concept and addressed my doubt clearly.

What I understand now is this: suppose two applications are running, 1) kernel K1 in CUDA context C1 and 2) kernel K2 in CUDA context C2, and even a single run of either kernel takes, say, 10 minutes to complete its processing on the GPU. Then while K1 is running, K2 cannot be launched in parallel with it, for example by partitioning the GPU resources between them in an exclusive manner. It is also still run-to-completion mode: during those 10 minutes of K1's run, it cannot be preempted for time multiplexing (for example, 1 second of K1, then a context switch and scheduling of K2, and so on), the way a general-purpose multitasking operating system schedules processes on a CPU.

In summary, I wanted to confirm that there is no preemptive scheduling and no parallel processing of different kernels at the same time by partitioning GPU resources. Rather, it is cooperative scheduling, and I have to design my application in such a way that any kernel runs within some defined time and finishes, so that the other waiting kernels can be scheduled by the CUDA driver (see the sketch below).
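To make the design idea concrete, here is a minimal sketch of what I mean by "finishes in some defined time": instead of one 10-minute kernel launch, the work is split into many short launches, so the driver has an opportunity to schedule kernels from other CUDA contexts between them. This is just my own illustration; `processChunk`, `N`, and `CHUNK` are hypothetical names, and the per-element work is a stand-in for real processing.

```cpp
#include <cuda_runtime.h>

// Hypothetical kernel that processes one slice of the data per launch.
__global__ void processChunk(float *data, int offset, int count)
{
    int i = offset + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < offset + count)
        data[i] = data[i] * 2.0f;   // stand-in for the real per-element work
}

int main()
{
    const int N     = 1 << 24;      // total elements (hypothetical size)
    const int CHUNK = 1 << 20;      // elements per short-running launch

    float *d_data;
    cudaMalloc(&d_data, N * sizeof(float));
    cudaMemset(d_data, 0, N * sizeof(float));

    const int threads = 256;
    for (int offset = 0; offset < N; offset += CHUNK) {
        int count  = (offset + CHUNK < N) ? CHUNK : (N - offset);
        int blocks = (count + threads - 1) / threads;
        // Each launch finishes quickly, so between launches the driver
        // can schedule kernels submitted from other CUDA contexts.
        processChunk<<<blocks, threads>>>(d_data, offset, count);
    }
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```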

Do you think such approaches to concurrency are yet to come?

Regards
Deepak