I’ve been working with CUDA Green Contexts for some time now, and after running several tests I have some questions regarding the execution time isolation and guarantees this technology offers.
My main goal is to ensure consistent performance for concurrent GPU workloads. As previously advised, I’m combining Green Contexts with MPS (Multi-Process Service) to achieve this.
I started with a control experiment where I launched 4 identical kernels in parallel using only MPS, without Green Contexts. The execution times were:
Kernel 1: 4939.87 ms
Kernel 2: 8519.13 ms
Kernel 3: 8951.64 ms
Kernel 4: 8949.66 ms
Then I launched a test where one of the kernels used a Green Context assigned 7 out of 8 SMs, and the remaining three were regular CUDA kernels. My expectation was that the kernel with GC would benefit from the isolation and dedicated resources, but the results were:
Kernel 1: 4486.31 ms
Kernel 2: 8962.64 ms
Kernel 3: 8953.19 ms
Kernel 4 with GC (7 SMs): 8948.92 ms
As you can see, the execution times of the non-GC kernels remained nearly the same. I repeated this experiment with different SM allocations for the GC process and noticed that only the GC kernel was affected: its runtime improved or worsened depending on the number of SMs assigned, while the others did not change at all.
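For anyone trying to reproduce this, here is a minimal sketch of how a Green Context with a requested SM count can be created through the CUDA driver API. It is an illustration under assumptions, not the exact code from these tests: it assumes CUDA 12.4 or newer, device 0, and a default request of 7 SMs, and the `CHECK` macro and variable names are made up for the example. It prints the SM count the driver actually grants, since the driver may round a request up to an architecture-dependent granularity and the granted count can differ from the requested one.

```cuda
// Build (assumed command line): nvcc green_ctx_sketch.cu -lcuda
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort with a readable message on any driver API error.
#define CHECK(call)                                                   \
    do {                                                              \
        CUresult res_ = (call);                                       \
        if (res_ != CUDA_SUCCESS) {                                   \
            const char* msg_ = nullptr;                               \
            cuGetErrorString(res_, &msg_);                            \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         msg_ ? msg_ : "unknown error");              \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main(int argc, char** argv) {
    // Requested SM count; 7 mirrors the experiment above (assumed default).
    unsigned int requestedSMs =
        (argc > 1) ? static_cast<unsigned int>(std::strtoul(argv[1], nullptr, 10)) : 7u;

    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));

    // Query the full SM resource of the device.
    CUdevResource all;
    CHECK(cuDeviceGetDevResource(dev, &all, CU_DEV_RESOURCE_TYPE_SM));
    std::printf("device SMs:    %u\n", all.sm.smCount);

    // Ask for one group with at least requestedSMs SMs.
    CUdevResource part, remaining;
    unsigned int nbGroups = 1;
    CHECK(cuDevSmResourceSplitByCount(&part, &nbGroups, &all, &remaining,
                                      0, requestedSMs));
    if (nbGroups == 0) {
        std::fprintf(stderr, "no group could be formed for %u SMs\n", requestedSMs);
        return EXIT_FAILURE;
    }
    std::printf("requested SMs: %u\n", requestedSMs);
    std::printf("granted SMs:   %u\n", part.sm.smCount);
    if (remaining.type == CU_DEV_RESOURCE_TYPE_SM) {
        std::printf("remaining SMs: %u\n", remaining.sm.smCount);
    }

    // Build a descriptor from the partition and create the Green Context.
    CUdevResourceDesc desc;
    CHECK(cuDevResourceGenerateDesc(&desc, &part, 1));
    CUgreenCtx greenCtx;
    CHECK(cuGreenCtxCreate(&greenCtx, desc, dev, CU_GREEN_CTX_DEFAULT_STREAM));

    // Convert to a regular context handle and make it current, so that
    // streams created from here on belong to the Green Context.
    CUcontext ctx;
    CHECK(cuCtxFromGreenCtx(&ctx, greenCtx));
    CHECK(cuCtxSetCurrent(ctx));
    CUstream stream;
    CHECK(cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING));

    // ... launch the kernel under test on `stream` and time it here ...

    CHECK(cuStreamSynchronize(stream));
    CHECK(cuStreamDestroy(stream));
    CHECK(cuCtxSetCurrent(nullptr));
    CHECK(cuGreenCtxDestroy(greenCtx));
    return 0;
}
```

Running it once per SM count used in the tests and comparing "requested" against "granted" would show whether each kernel really received the partition size it was meant to have.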
So my question is:
What level of resource isolation and execution control do Green Contexts actually provide?
Shouldn't they at least enable relative performance prioritization or isolation between GC and non-GC processes?
Additionally, I ran another test where each of the 4 kernels was launched in its own Green Context, thinking the issue might be related to task scheduling, possibly because GC tasks have different requirements or handling.
I assigned the resources as follows:
Kernel 1 → 1 SM
Kernel 2 → 5 SMs
Kernel 3 → 1 SM
Kernel 4 → 1 SM
The execution times were:
Kernel 1 (1 SM): 8959.69 ms
Kernel 2 (5 SMs): 11948.8 ms
Kernel 3 (1 SM): 11938.1 ms
Kernel 4 (1 SM): 17892.7 ms
To my surprise, all four kernels ran slower than in the runs without Green Contexts.
Is there any example or practical guide available that explains how to properly use this technology and better understand its actual behavior?
Thank you very much for your time, Curefab — I really appreciate it.
I agree with you that the issue might come from somewhere else. However, what really surprises me is that if, in theory, the GC kernel is using 7 out of 8 SMs during execution, I would expect the performance of the other concurrent kernels to be impacted — but that’s not what I’m observing.
I’ve also tried different SM assignments, and in every case, only the GC kernel’s execution time changed. The non-GC kernels stayed nearly the same, regardless of how many SMs were allocated to the GC context.
That’s why I’m asking if anyone has a practical example or benchmark where the benefits of Green Contexts are clearly visible — either in terms of performance isolation, prioritization, or resource control. Something that shows how assigning more SMs to a GC process improves execution time, or affects other workloads, compared to just running everything in parallel without isolation.
Any guidance or reference would be greatly appreciated.