Questions about Resource Isolation and Execution Control using CUDA Green Contexts + MPS

juamaros · May 26, 2025, 8:23am

Hi,

I’ve been working with CUDA Green Contexts for some time now, and after running several tests I have some questions regarding the execution time isolation and guarantees this technology offers.

My main goal is to ensure consistent performance for concurrent GPU workloads. As previously advised, I’m combining Green Contexts with MPS (Multi-Process Service) to achieve this.

I started with a control experiment where I launched 4 identical kernels in parallel using only MPS, without Green Contexts. The execution times were:

Kernel 1: 4939.87 ms
Kernel 2: 8519.13 ms
Kernel 3: 8951.64 ms
Kernel 4: 8949.66 ms

Then I launched a test where one of the kernels used a Green Context assigned 7 out of 8 SMs, and the remaining three were regular CUDA kernels. My expectation was that the kernel with GC would benefit from the isolation and dedicated resources, but the results were:

Kernel 1: 4486.31 ms
Kernel 2: 8962.64 ms
Kernel 3: 8953.19 ms
Kernel 4 with GC (7 SMs): 8948.92 ms

As you can see, the execution times of the non-GC kernels remained nearly the same. I repeated this experiment using different SM allocations for the GC process, and noticed that only the GC kernel was affected—its runtime improved or worsened depending on the assigned SMs, but the others didn’t change at all.

So my question is:
What level of resource isolation and execution control do Green Contexts actually provide?
Shouldn’t they at least enable relative performance prioritization or isolation between GC and non-GC processes?

Thanks in advance.

juamaros · May 26, 2025, 8:30am

Additionally, I ran another test where each of the 4 kernels was launched in its own Green Context, thinking the issue might be related to task scheduling, possibly because GC tasks have different requirements or handling.

I assigned the resources as follows:

```
Kernel 1 → 1 SM
Kernel 2 → 5 SMs
Kernel 3 → 1 SM
Kernel 4 → 1 SM
```

The execution times were:

Kernel 1 (1 SM): 8959.69 ms
Kernel 2 (5 SMs): 11948.8 ms
Kernel 3 (1 SM): 11938.1 ms
Kernel 4 (1 SM): 17892.7 ms

To my surprise, all processes ran slower than they did without Green Contexts.

Is there any example or practical guide available that explains how to properly use this technology and better understand its actual behavior?

Curefab · May 26, 2025, 12:33pm

Even the kernel running in the Green Context took about the same time as before. Perhaps the kernel is not bound by the number of SMs.
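
One way to test that idea is to check which SMs the kernel's blocks actually land on when launched into a green context. Below is a minimal sketch, not a reference implementation. It assumes CUDA 12.4 or newer, a device that supports green contexts, and linking against the driver library. The kernel `recordSmId`, the helper `countDistinct`, the requested SM count, and the build command are all hypothetical names and values chosen for illustration. As far as I understand the driver documentation, the requested SM count can be rounded up to an architecture-specific minimum and granularity, so the sketch prints the count that was actually granted.

```cuda
// Sketch only: assumes CUDA 12.4+ and a GPU that supports green contexts.
// Assumed build command: nvcc smid_check.cu -o smid_check -lcuda
#include <cuda.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <set>
#include <vector>

#define CHECK_CU(call)                                                   \
    do {                                                                 \
        CUresult r_ = (call);                                            \
        if (r_ != CUDA_SUCCESS) {                                        \
            const char* msg_ = nullptr;                                  \
            cuGetErrorString(r_, &msg_);                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,               \
                         msg_ ? msg_ : "unknown error");                 \
            return 1;                                                    \
        }                                                                \
    } while (0)

#define CHECK_RT(call)                                                   \
    do {                                                                 \
        cudaError_t e_ = (call);                                         \
        if (e_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s failed: %s\n", #call,               \
                         cudaGetErrorString(e_));                        \
            return 1;                                                    \
        }                                                                \
    } while (0)

// Thread 0 of each block stores the ID of the SM the block runs on.
__global__ void recordSmId(unsigned int* smIds) {
    if (threadIdx.x == 0) {
        unsigned int id;
        asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
        smIds[blockIdx.x] = id;
    }
}

// Host helper (hypothetical): number of distinct SM IDs observed.
static size_t countDistinct(const std::vector<unsigned int>& ids) {
    return std::set<unsigned int>(ids.begin(), ids.end()).size();
}

int main() {
    // Initialize the primary context before creating the green context.
    CHECK_RT(cudaSetDevice(0));
    CHECK_RT(cudaFree(nullptr));
    CHECK_CU(cuInit(0));
    CUdevice dev;
    CHECK_CU(cuDeviceGet(&dev, 0));

    // Query the device's SM resource and split off one group.
    CUdevResource all, group, rest;
    CHECK_CU(cuDeviceGetDevResource(dev, &all, CU_DEV_RESOURCE_TYPE_SM));
    unsigned int nbGroups = 1;
    const unsigned int requestedSms = 2;  // arbitrary value for illustration
    CHECK_CU(cuDevSmResourceSplitByCount(&group, &nbGroups, &all, &rest,
                                         0, requestedSms));
    std::printf("device SMs: %u, requested: %u, granted: %u\n",
                all.sm.smCount, requestedSms, group.sm.smCount);

    // Build the green context and a stream that belongs to it.
    CUdevResourceDesc desc;
    CHECK_CU(cuDevResourceGenerateDesc(&desc, &group, 1));
    CUgreenCtx gctx;
    CHECK_CU(cuGreenCtxCreate(&gctx, desc, dev, CU_GREEN_CTX_DEFAULT_STREAM));
    CUstream stream;
    CHECK_CU(cuGreenCtxStreamCreate(&stream, gctx, CU_STREAM_NON_BLOCKING, 0));

    // Launch many small blocks into the green-context stream.
    const int numBlocks = 256;
    unsigned int* dIds = nullptr;
    CHECK_RT(cudaMalloc(&dIds, numBlocks * sizeof(unsigned int)));
    recordSmId<<<numBlocks, 32, 0, stream>>>(dIds);
    CHECK_RT(cudaGetLastError());
    CHECK_RT(cudaStreamSynchronize(stream));

    std::vector<unsigned int> hIds(numBlocks);
    CHECK_RT(cudaMemcpy(hIds.data(), dIds, numBlocks * sizeof(unsigned int),
                        cudaMemcpyDeviceToHost));
    std::printf("distinct SMs used by the kernel: %zu\n", countDistinct(hIds));

    CHECK_RT(cudaFree(dIds));
    CHECK_CU(cuStreamDestroy(stream));
    CHECK_CU(cuGreenCtxDestroy(gctx));
    return 0;
}
```

If the number of distinct SMs is larger than the granted count, the partition is probably not being applied to that launch. If it matches and the runtime still does not scale with the SM count, the kernel may be limited by something else, such as a grid too small to fill the SMs or memory bandwidth.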

juamaros · May 26, 2025, 1:42pm

Thank you very much for your time, Curefab — I really appreciate it.

I agree with you that the issue might come from somewhere else. However, what really surprises me is that if, in theory, the GC kernel is using 7 out of 8 SMs during execution, I would expect the performance of the other concurrent kernels to be impacted — but that’s not what I’m observing.

I’ve also tried different SM assignments, and in every case, only the GC kernel’s execution time changed. The non-GC kernels stayed nearly the same, regardless of how many SMs were allocated to the GC context.

That’s why I’m asking if anyone has a practical example or benchmark where the benefits of Green Contexts are clearly visible — either in terms of performance isolation, prioritization, or resource control. Something that shows how assigning more SMs to a GC process improves execution time, or affects other workloads, compared to just running everything in parallel without isolation.

Any guidance or reference would be greatly appreciated.
