Run multiple applications simultaneously on Fermi

I am trying to figure out how the Tesla C2050 handles multiple applications accessing the GPU at the same time.

From the Fermi whitepaper (page 18) it seems that context switching between applications is possible, so I ran an experiment to confirm that.

I ran the matrixMul program from the toolkit, modified so that the kernel runs for a while. Then I started another program (e.g. scalarProd) while matrixMul was still busy executing its kernel.
Surprisingly, scalarProd stops with: cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
Apparently there is neither a context switch nor any waiting for resources to become available.

What do the context switching and efficient multitasking mentioned in the whitepaper actually mean? Can anybody help me understand how multiple applications are handled?

Thanks for your help.

any update?

Fermi can run concurrent kernels from the same context.
If you are using the same GPU from multiple applications, the driver will serialize the access.
Once a kernel starts, it will run to completion.
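
To illustrate the "same context" case, here is a minimal sketch of two kernels launched into separate streams of one process. The `spin` kernel, the cycle count and the launch sizes are made up for illustration, and `cudaDeviceGetAttribute` assumes CUDA 5.0 or newer (on older toolkits the `concurrentKernels` field of `cudaDeviceProp` carries the same information). The timing assumes the legacy default stream, so the stop event waits for both streams.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-waits for roughly `cycles` clock ticks so that
// any overlap between the two launches shows up in the elapsed time.
__global__ void spin(long long cycles, int *done)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
    if (threadIdx.x == 0) *done = 1;
}

int main()
{
    int concurrent = 0;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentKernels, 0);
    printf("device reports concurrent kernel support: %d\n", concurrent);

    int *flags = NULL;
    cudaMalloc(&flags, 2 * sizeof(int));
    cudaMemset(flags, 0, 2 * sizeof(int));

    cudaStream_t streams[2];
    for (int i = 0; i < 2; ++i) cudaStreamCreate(&streams[i]);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    // One small block per kernel, each in its own stream of the SAME context.
    for (int i = 0; i < 2; ++i)
        spin<<<1, 32, 0, streams[i]>>>(200000000LL, flags + i);
    cudaEventRecord(stop, 0);

    cudaError_t err = cudaEventSynchronize(stop);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // If the kernels overlapped, this is close to the duration of one kernel;
    // if they were serialized, it is close to the duration of two.
    printf("both kernels finished in %.1f ms\n", ms);

    for (int i = 0; i < 2; ++i) cudaStreamDestroy(streams[i]);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(flags);
    return 0;
}
```

Running two copies of a program like this as separate processes puts the kernels in different contexts, which is the case the driver serializes.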


I routinely run several apps per CUDA device at once, on both Fermi and pre-Fermi cards. Are you sure your device is not in compute-exclusive mode?

Also, all the apps running concurrently on a device must fit within the device's RAM. cudaMalloc will fail unless there is enough free memory, but your error message doesn't seem to indicate this.
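
Both things can be checked from inside a program. The following is a rough sketch, assuming device 0 and CUDA 5.0 or newer for `cudaDeviceGetAttribute` (older toolkits expose the same value as the `computeMode` field of `cudaDeviceProp`). It prints the compute mode and the free memory as seen by the calling process.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Query the compute mode of device 0 (assumed to be the GPU in question).
    int mode = -1;
    cudaError_t err = cudaDeviceGetAttribute(&mode, cudaDevAttrComputeMode, 0);
    if (err != cudaSuccess) {
        printf("cudaDeviceGetAttribute failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    if (mode == cudaComputeModeDefault)
        printf("compute mode: default (several processes may share the device)\n");
    else if (mode == cudaComputeModeProhibited)
        printf("compute mode: prohibited (no process may create a context)\n");
    else
        printf("compute mode: exclusive (value %d)\n", mode);

    // cudaMemGetInfo creates a context on the device if none exists yet, so
    // in an exclusive mode this call itself fails while another app holds the GPU.
    size_t freeBytes = 0, totalBytes = 0;
    err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free memory: %.1f MB of %.1f MB\n",
           freeBytes / (1024.0 * 1024.0), totalBytes / (1024.0 * 1024.0));
    return 0;
}
```

Running it while the other application is active shows whether a second context can be created at all and how much memory is left for it.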

Good luck!

Thanks for your reply.

My C2050 is set to compute mode 0, so it is not in compute-exclusive mode. Also, my two apps (taken from the SDK) use very little memory, so that should not be the problem.

Thanks for your help.

Sorry to bother you, but could you please give me a simple scenario where you see two apps executing their kernels in parallel (i.e. KernelA from AppA running at the same time as KernelB from AppB)?

I tried even the simplest example, but it does not work on my C2050. Changing the OS, driver mode or compute mode does not help.

They will not. They can time share the GPU but not run at the same time.

How do I make them time share? In my experiment the second application is denied access to the GPU.

Thanks for your help

any news from NVIDIA on this?

Sorry mfactia, but you are misdirecting the OP. GPUs only context switch at the boundaries between kernel executions. Two kernels from different contexts cannot execute concurrently (as of CUDA 4.0), and while a long kernel KA from app A is executing, a kernel KB from app B will never start before KA finishes.