Concurrent execution of more than one CUDA application

Do CUDA and CUDA-enabled GPUs have the capability to store the context of a running application?
Suppose I have more than one CUDA application running on my host. Can CUDA switch between the contexts
of the two applications automatically?


Of course. You can run as many CUDA processes as you want on a single GPU simultaneously, as long as there is enough GPU memory to do so.
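
To make the memory constraint concrete, here is a minimal sketch of how a process could check how much device memory is currently free before allocating. It uses the runtime API calls cudaMemGetInfo, cudaMalloc and cudaFree; the 256 MiB request size is only an assumed figure for illustration, not something from this thread. The reported free amount is a snapshot, so another process may allocate in between and the cudaMalloc result should still be checked.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Ask the runtime how much device memory is free right now.
    // Memory held by other CUDA processes on the same GPU is not counted as free.
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free: %zu MiB, total: %zu MiB\n", free_bytes >> 20, total_bytes >> 20);

    // Assumed working-set size for this process (hypothetical value).
    const size_t wanted = (size_t)256 << 20;
    if (wanted > free_bytes) {
        fprintf(stderr, "not enough free device memory for this process\n");
        return 1;
    }

    // The free figure is only a snapshot, so the allocation can still fail
    // if another process grabbed memory in the meantime.
    void *buf = NULL;
    err = cudaMalloc(&buf, wanted);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(buf);
    return 0;
}
```

Running two copies of this at the same time should show the second one reporting less free memory while the first still holds its buffer.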

What I meant to ask was this: suppose there are two CUDA applications running and, to be more precise, each of them has different “kernel” code. Will the driver submit these kernels serially (i.e. only after kernel 1 finishes is the next job submitted to the GPU), or can the two kernels be interleaved, as on a conventional CPU? In the latter case, when the GPU switches between the kernels it needs to store the context and state of kernel 1. Is this possible? If so, will the host code have access to this context?


The GPU will switch between different contexts only after a kernel invocation has completed.

I have a follow-up question. Since it appears you can have two kernels in flight on the GPU via different host threads and CUDA contexts (albeit with only one executing at a time), is it possible to have two CUDA contexts running in parallel? For example, one context using half of the chip and the other context using the other half? Obviously these would be entirely independent computations.

No, it is not possible. Each kernel utilizes the entire GPU.