I’ve been working with CUDA Green Contexts for some time now, and after running several tests I have some questions regarding the execution time isolation and guarantees this technology offers.
My main goal is to ensure consistent performance for concurrent GPU workloads. As previously advised, I’m combining Green Contexts with MPS (Multi-Process Service) to achieve this.
I started with a control experiment where I launched 4 identical kernels in parallel using only MPS, without Green Contexts. The execution times were:
Kernel 1: 4939.87 ms
Kernel 2: 8519.13 ms
Kernel 3: 8951.64 ms
Kernel 4: 8949.66 ms
Then I launched a test where one of the kernels used a Green Context assigned 7 out of 8 SMs, and the remaining three were regular CUDA kernels. My expectation was that the kernel with GC would benefit from the isolation and dedicated resources, but the results were:
Kernel 1: 4486.31 ms
Kernel 2: 8962.64 ms
Kernel 3: 8953.19 ms
Kernel 4 with GC (7 SMs): 8948.92 ms
As you can see, the execution times of the non-GC kernels remained nearly the same. I repeated this experiment using different SM allocations for the GC process, and noticed that only the GC kernel was affected—its runtime improved or worsened depending on the assigned SMs, but the others didn’t change at all.
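To put numbers on that, here is a small sketch that computes the per-kernel change between the two runs. It pairs the kernels by position, which assumes the unlabeled first line of the second run is Kernel 1; the helper name `percent_change` is only for illustration.

```python
# Timings in ms, copied from the two runs above.
mps_only = [4939.87, 8519.13, 8951.64, 8949.66]
# Assumption: entries are paired by position; Kernel 4 is the Green Context kernel.
with_gc = [4486.31, 8962.64, 8953.19, 8948.92]


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = [percent_change(b, a) for b, a in zip(mps_only, with_gc)]
for idx, pct in enumerate(changes, start=1):
    print(f"Kernel {idx}: {pct:+.2f} %")
```

With these figures, Kernel 1 gets roughly 9 % faster and Kernel 2 roughly 5 % slower, while Kernel 3 and the Green Context kernel both move by well under 0.1 %.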
So my question is:
What level of resource isolation and execution control do Green Contexts actually provide?
Shouldn’t it at least enable relative performance prioritization or isolation between GC and non-GC processes?
Additionally, I ran another test where each of the 4 kernels was launched in its own Green Context, thinking the issue might be related to task scheduling, possibly because GC tasks have different requirements or handling.
I assigned the resources as follows:
Kernel 1 → 1 SM
Kernel 2 → 5 SMs
Kernel 3 → 1 SM
Kernel 4 → 1 SM
The execution times were:
Kernel 1 (1 SM): 8959.69 ms
Kernel 2 (5 SMs): 11948.8 ms
Kernel 3 (1 SM): 11938.1 ms
Kernel 4 (1 SM): 17892.7 ms
To my surprise, all processes ran slower compared to when not using Green Contexts.
Is there any example or practical guide available that explains how to properly use this technology and better understand its actual behavior?
Thank you very much for your time, Curefab — I really appreciate it.
I agree with you that the issue might come from somewhere else. However, what really surprises me is that if, in theory, the GC kernel is using 7 out of 8 SMs during execution, I would expect the performance of the other concurrent kernels to be impacted — but that’s not what I’m observing.
I’ve also tried different SM assignments, and in every case, only the GC kernel’s execution time changed. The non-GC kernels stayed nearly the same, regardless of how many SMs were allocated to the GC context.
That’s why I’m asking if anyone has a practical example or benchmark where the benefits of Green Contexts are clearly visible — either in terms of performance isolation, prioritization, or resource control. Something that shows how assigning more SMs to a GC process improves execution time, or affects other workloads, compared to just running everything in parallel without isolation.
Any guidance or reference would be greatly appreciated.
Hi juamaros, I'm also new to green contexts, but I don't quite understand why you mention all “processes”. If you want to try different kernels assigned to different SMs, you should try it in the same process, right?
Hi Ryan,
Nice to see someone else working with this!
What I was asking in this post was related to whether I could use Green Contexts as an alternative to MIG, which is a GPU resource partitioning technology that unfortunately isn’t compatible with Jetson.
I also discovered that part of my issue comes from my GPU architecture. As mentioned in NVIDIA’s documentation on the topic (I’ll attach the link), on my GPU architecture (compute architecture 8.x in this case) I can only create GC groups of at least 4 SMs, and the SM count must be a multiple of two.
“On Compute Architecture 8.X: The minimum count is 4 SMs and must be a multiple of 2.”
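As a rough illustration of what that rule would mean for the SM counts I tried earlier in this thread, here is a small sketch. It assumes a request is rounded up to the nearest valid size; the sentence quoted above only states the minimum and the multiple, so the rounding direction and the helper name `effective_sm_count` are my assumptions, not documented driver behavior.

```python
def effective_sm_count(requested, minimum=4, multiple=2):
    """Smallest valid group size >= requested under the quoted 8.X rule.

    Assumption: requests are rounded up; the driver may behave differently.
    """
    size = max(requested, minimum)
    # Round up to the next multiple (no-op if already aligned).
    return -(-size // multiple) * multiple


requests = [1, 5, 1, 1]  # SM counts from the four-context test above
effective = [effective_sm_count(r) for r in requests]
print(effective, "total:", sum(effective))
print("7 SMs ->", effective_sm_count(7))
```

Under that assumption the four requests would need 18 SMs in total, far more than the 8 available on my device, so the partitions I asked for could not all have been created as specified.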
Not sure if that helps clarify things a bit.
Have a great day!
Thanks for your clarification! I have exactly the same question as you: “can green contexts be a software alternative to MIG?” From your experiment, it unfortunately seems they can’t.
BTW, do you know of any other documents, tutorial slides, or personal blogs besides that CUDA API doc? I’m trying to figure out how green contexts are implemented.
Have a great day!
You’re welcome, we’re here to help each other. Unfortunately, at least in my search, I haven’t found anything. Right now, I’m working on an article about this. I can provide you with this list in case it helps. However, there’s little additional information I can direct you to. Sorry. Have a nice day.