I’ve been working with CUDA’s Green Context feature for a while, and a few questions have come up that I haven’t been able to resolve through the available documentation.
Most of the examples I’ve seen use Green Contexts within a single process — typically creating multiple Green Contexts and launching kernels on them concurrently in that same process. [link here]
However, I wonder if this technology is also compatible across different processes. Specifically:
Can I launch two separate processes in parallel, each creating and using its own Green Context independently?
Furthermore, is it possible to run processes using Green Context in parallel with others that launch traditional CUDA kernels (i.e., not using Green Context)?
I don’t think there should be any issue with having two separate processes both using Green Context, and/or one process using Green Context and one not. CUDA GPUs have basic process isolation as well as context switching between processes. To a first-order approximation, what process A does should not materially impact what process B can do, with a couple of obvious exceptions: the GPU memory one process consumes is unavailable to the other, and each process’s activity can affect the other’s performance/throughput.
The issue I’m encountering when trying to launch processes in parallel — one using Green Context and the other not — is that they don’t seem to execute concurrently on the GPU. Theoretically, this shouldn’t happen, right? Or is this what you’re referring to as the effect on performance/throughput?
I’m attaching an image from an NVIDIA Nsight Systems report. In this case, both processes are launching the same kernel, with the only difference being that one uses Green Context (with 90% of the GPU resources assigned) and the other doesn’t.
What I don’t understand is why — for every full execution of the kernel in the Green Context — the kernel without reserved resources gets scheduled and executed multiple times in between, even though they’re running the exact same workload.
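For reference, here is roughly how the green-context process sets things up. This is a simplified sketch of the CUDA driver API calls (CUDA 12.4+), with error handling omitted; the 90% figure matches my test, but kernel names and the launch step are placeholders:

```c
#include <cuda.h>

int main(void)
{
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);

    /* Query the device's full SM resource. */
    CUdevResource all_sms;
    cuDeviceGetDevResource(dev, &all_sms, CU_DEV_RESOURCE_TYPE_SM);

    /* Request a partition holding ~90% of the SMs. Note the split is
       rounded to the hardware's allowed SM-group granularity, so the
       actual count can differ from the request. */
    unsigned int min_sms = (unsigned int)(all_sms.sm.smCount * 0.9);
    CUdevResource partition, remainder;
    unsigned int nb_groups = 1;
    cuDevSmResourceSplitByCount(&partition, &nb_groups, &all_sms,
                                &remainder, 0, min_sms);

    /* Turn the partition into a green context, and create a stream
       bound to that context. */
    CUdevResourceDesc desc;
    cuDevResourceGenerateDesc(&desc, &partition, 1);
    CUgreenCtx gctx;
    cuGreenCtxCreate(&gctx, desc, dev, CU_GREEN_CTX_DEFAULT_STREAM);
    CUstream stream;
    cuGreenCtxStreamCreate(&stream, gctx, CU_STREAM_NON_BLOCKING, 0);

    /* ... launch heavyKernel into `stream` here ... */

    cuStreamDestroy(stream);
    cuGreenCtxDestroy(gctx);
    return 0;
}
```

The non-green-context process is identical except that it launches the same kernel into an ordinary stream in the default context.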
Generally, separate processes don’t execute concurrently on the GPU. That is true whether we are talking about green contexts or not.
When multiple processes are launched on a single GPU without MPS, the GPU will context-switch between the processes. That means at any given instant, when a kernel belonging to one process is executing, kernels belonging to other processes cannot/will not be executing.
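If you actually want kernels from two processes to overlap on the GPU, MPS is the usual route on platforms that support it. A rough sketch, assuming a Linux system with the MPS binaries on the PATH and `./proc_green` / `./proc_plain` standing in for your two programs (note that MPS availability on Jetson depends on the platform and JetPack version):

```shell
# Start the MPS control daemon (requires appropriate permissions).
export CUDA_VISIBLE_DEVICES=0
nvidia-cuda-mps-control -d

# Launch both processes as MPS clients; their kernels can then
# overlap on the GPU instead of being time-sliced between contexts.
./proc_green & ./proc_plain &
wait

# Shut the daemon down.
echo quit | nvidia-cuda-mps-control
```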
There is a lot of unpublished detail here. The exact context-switching behavior can vary in terms of when the switches happen, but the statement above still holds: at any given instant, only kernel(s) from one process can execute. Inter-process context switching on modern GPUs typically follows a time-sliced pattern in my experience, though observations may vary, and I don’t know for sure whether there is any context-switching nuance in the Jetson case. But given your relatively long kernel durations here (tens of seconds, it seems), it certainly looks like inter-process context switching is happening on a time-slicing basis, which gives the “illusion” that both kernels are executing “simultaneously”.
Looking at your profiler output, it seems evident that the heavyKernel duration in the green-context case (perhaps around 22 seconds) is substantially longer than in the non-green-context case (perhaps around 11 seconds). That should be the proximal explanation for the difference in throughput. Perhaps you haven’t actually assigned 90% of resources as intended, or perhaps the green context is slowing the kernel down for some other reason. If I were studying this, I would first measure the kernel duration for each process independently (i.e., running only one process at a time). If the durations are still in a 2:1 ratio, then there is no reason to assume multi-process behavior has anything to do with this.
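For what it’s worth, one way to do that per-process comparison (assuming Nsight Systems is installed, with `./proc_green` / `./proc_plain` again standing in for your two programs):

```shell
# Profile each process on its own, so no inter-process
# context switching can affect the measurement.
nsys profile -o green --trace=cuda ./proc_green
nsys profile -o plain --trace=cuda ./proc_plain

# Summarize GPU kernel durations from each report and compare
# the average/total time for heavyKernel between the two runs.
nsys stats --report cuda_gpu_kern_sum green.nsys-rep
nsys stats --report cuda_gpu_kern_sum plain.nsys-rep
```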