Utilization of SMs in a GPU

I am a CUDA newbie and I am looking into running multiple applications together on a GPU.
From my reading, what I understand is that NVIDIA does not advise running multiple applications
on the GPU (although the CUDA driver manages to run them), since:
a) Applications do not have knowledge of each other’s context.
b) Architectures up to G280 do not support concurrent kernel/(application) execution.

Taking this into consideration, what happens if a few streaming multiprocessors are not being
used while an application is already running on the GPU? Can we detect this condition and run other applications on those idle SMs?
(Although I did not see this feature in the CUDA programming model.)

Actually, running multiple applications has worked for a long time on every CUDA compute capability. Note that all CUDA GPUs, including Fermi, interleave execution of kernels from different contexts. At no time do kernels from different contexts (which includes different applications) run on the device simultaneously. When you run a GUI on your GPU, that also is treated as a CUDA context of sorts, which is why a long-running kernel can make your display appear to freeze. Every context gets total control of all SMs while that context is active.
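
If you want to see this for yourself, something like the following (a rough sketch, nothing more) launches a single kernel that busy-waits for a while; while it is resident, no kernel from any other context can run. The iteration count is arbitrary and you will have to tune it for your GPU.

```
// spin.cu -- a minimal sketch of a long-running kernel. While it is
// resident, kernels from every other context (other applications, or
// the GUI) have to wait for it to finish.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spin(long long iterations)
{
    // 'volatile' keeps the compiler from optimizing the busy-wait away.
    volatile long long i = 0;
    while (i < iterations)
        i = i + 1;
}

int main()
{
    // Adjust the count so the kernel runs for a second or two on your GPU.
    spin<<<1, 1>>>(1000000000LL);
    cudaDeviceSynchronize();
    printf("spin kernel finished\n");
    return 0;
}
```

If you run two copies of this at once, you should see the second one’s kernel wait behind the first rather than overlap with it. And on a GPU that is also driving a display, the watchdog timer may kill a kernel that holds the device for too long, which is another side of the same serialization.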

The reasons NVIDIA discourages multiple applications from using the same GPU include:

  • In the past, buggy drivers could cause crashes during frequent GPU context switching. As far as I know, this has been resolved.

  • In a multi-user system (a common use case for multi-application execution), there is no graceful degradation as users exhaust the GPU memory. If one user takes all of the device memory, the second user will simply see their application abort until the first user’s application exits or frees the memory (see the sketch after this list for one way to at least detect the situation).

  • The overhead of context switching means that multiple applications will see lower total performance than a single application, and much lower if each application is executing very short kernels. This is one place where Fermi is supposed to have improved, but the overhead is still not zero.
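
Regarding the memory point above, a program can at least report the situation more helpfully. Here is a minimal sketch (the printed units are my choice) that queries how much device memory is currently free with cudaMemGetInfo:

```
// A minimal sketch: query free device memory so a multi-user program can
// report a useful message (or shrink its working set) instead of failing
// blindly when another context has taken most of the card.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }
    printf("Device memory: %zu MB free of %zu MB total\n",
           free_bytes >> 20, total_bytes >> 20);
    return 0;
}
```

Note that this is only advisory: another context can grab the memory between the query and your allocation, so you still have to check cudaMalloc’s return code.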

Note that concurrent kernel execution (as NVIDIA has defined it) is not related to concurrent application execution. Concurrent kernel execution on Fermi allows kernels from different CUDA streams in the same CUDA context to execute simultaneously, with each kernel getting a varying share of the SMs depending on the block scheduler. Different CUDA contexts running on the same device still undergo a full context switch between kernel executions, just as before.
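
To make the distinction concrete, here is a minimal sketch (the kernel and sizes are invented for illustration) of concurrent kernel execution within a single context: two independent kernels launched into different streams of the same context, which Fermi-class or newer hardware may overlap, resources permitting.

```
// A minimal sketch of concurrent kernel execution within ONE context:
// independent kernels in different non-default streams may overlap on
// compute capability 2.0+ hardware, each getting a share of the SMs.
#include <cuda_runtime.h>

__global__ void busywork(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 1000; ++k)
            data[i] = data[i] * 1.000001f + 0.5f;
}

int main()
{
    const int n = 1 << 16;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Same context, different streams: these two launches are allowed
    // to run at the same time.
    busywork<<<(n + 255) / 256, 256, 0, s1>>>(a, n);
    busywork<<<(n + 255) / 256, 256, 0, s2>>>(b, n);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Kernels from two different contexts never get this treatment; the driver switches the whole device from one context to the other between launches.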

Thanks seibert, this is quite helpful :). Is there some link or documentation where I can read more about GPU context switching, and also about the second point you mentioned (the application aborting because it cannot find enough GPU memory)?

I’m not sure about a reference for the context switching discussion. The CUDA Programming Guide is the standard reference for most things. Another option is to write a small benchmark yourself and see what happens when you run multiple programs on the same GPU.
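
Something along these lines would do as a starting point (a rough sketch; the kernel and launch count are arbitrary): time a fixed number of kernel launches with CUDA events, run one copy of the program by itself, then run two copies side by side and compare the per-process timings.

```
// A rough sketch of such a benchmark: time N kernel launches with CUDA
// events. Run one copy of this program, then two copies at once, and
// compare -- per-process time should grow when contexts share the GPU.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void work(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = x[i] * 1.0001f + 1.0f;
}

int main()
{
    const int n = 1 << 20;
    const int launches = 1000;   // arbitrary choice
    float *x;
    cudaMalloc(&x, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < launches; ++i)
        work<<<(n + 255) / 256, 256>>>(x, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("%d launches took %.1f ms (%.3f ms per launch)\n",
           launches, ms, ms / launches);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    return 0;
}
```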

As for the abort, I wasn’t being very precise. What will actually happen is that cudaMalloc will return an error, and then you have to decide how to handle the out-of-device-memory condition. In my applications, if there is no device memory, the program can’t run, so I abort.
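
In code, that decision point looks roughly like this (a minimal sketch of the pattern, with a made-up allocation size, not anything specific to my applications):

```
// A minimal sketch of handling the out-of-device-memory condition:
// cudaMalloc reports failure through its return code, and the caller
// decides whether to abort, retry later, or fall back to something else.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    size_t wanted = 1UL << 30;   // hypothetical 1 GB request
    float *d_buf = NULL;

    cudaError_t err = cudaMalloc((void **)&d_buf, wanted);
    if (err == cudaErrorMemoryAllocation) {
        // Another context may already hold most of the device memory.
        fprintf(stderr, "Out of device memory: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);   // or: wait and retry, use a smaller buffer, ...
    } else if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // ... kernels would use d_buf here ...
    cudaFree(d_buf);
    return 0;
}
```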