GPU Context switch of multiple processes

kzhang28 · August 8, 2019, 6:41pm

I am looking into the performance interference problem among co-running processes on a single GPU, so I need to learn more about GPU context switching. This question targets the following scenario: two processes (e.g., two TensorFlow object detection applications) running on a single GPU (NVIDIA Tesla P100, CUDA version 10.1, driver version 418.40.04, on Ubuntu 16). I am NOT using the NVIDIA Multi-Process Service (MPS; see the GPU Deployment and Management Documentation).
My question is about how the CUDA contexts of different processes are scheduled to execute on a single GPU (by time slicing, as far as I know). Specifically:

(1) What is the scheduling policy?

I read some papers and documents. They say the scheduling policy is FIFO: CUDA and the driver maintain a single queue holding all pending kernel execution requests, and whenever the kernel at the front of the queue belongs to a different CUDA context than the one currently running, a GPU context switch is invoked. Is this right?
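To make the hypothesis concrete, here is a minimal toy model in Python of the FIFO policy as described above. It is only a sketch of the claim being asked about, not of documented driver behavior: the function name `simulate_fifo`, the context labels `A` and `B`, and the kernel names are all hypothetical, and no real GPU or CUDA API is involved.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple


def simulate_fifo(requests: Iterable[Tuple[str, str]]) -> Tuple[List[str], int]:
    """Toy model of the hypothesized policy (an assumption, not documented behavior).

    Each request is (context_id, kernel_name). Requests run strictly in
    arrival order, and a context switch is counted whenever the request at
    the front of the queue belongs to a different context than the one
    that ran last.
    """
    queue = deque(requests)
    current: Optional[str] = None  # context that ran most recently
    switches = 0
    order: List[str] = []
    while queue:
        ctx, kernel = queue.popleft()
        if current is not None and ctx != current:
            switches += 1  # front of queue belongs to another context
        current = ctx
        order.append(f"{ctx}:{kernel}")
    return order, switches


# Two hypothetical processes (contexts A and B) with interleaved launches.
requests = [("A", "k0"), ("A", "k1"), ("B", "k0"), ("A", "k2"), ("B", "k1")]
order, switches = simulate_fifo(requests)
print(order)
print("context switches:", switches)
```

Under this model the number of switches depends only on how the launches from the two processes interleave in the queue, which is why the question of whether the real policy is FIFO or fairness-based matters for interference.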

(2) What are the “scheduling resources” (mentioned in section 2.1.3 of this NVIDIA document: GPU Deployment and Management Documentation) that need to be swapped off and onto the GPU during a context switch? In other words, where does the GPU context switch overhead come from?

(3) Will the GPU memory allocated to a process survive a context switch (i.e., it won’t be swapped off the GPU)? I guess the GPU memory allocated to a process always resides on the GPU as long as the process is running.

I know the context switch mechanism for multiple processes on a GPU is involved. I just want to know the principles, e.g., does the scheduling policy follow FIFO rules or some “fairness” rules?

Robert_Crovella · August 8, 2019, 7:34pm

cross-posting:

https://stackoverflow.com/questions/57416664/nvidia-cuda-gpu-context-switch-among-multiple-processes

kzhang28 · August 8, 2019, 8:27pm

Hi Robert, I have also posted this question on Stack Overflow. I just want to reach more people in the community. This is my first post on the NVIDIA forum. If cross-posting is forbidden in this forum, I won’t do it again. Sorry…
As for GPU process-level context switching, there is little relevant documentation available. Could you shed light on this topic, or point me to some useful links or materials?

Robert_Crovella · August 9, 2019, 3:14am

No problem with cross posting. I point it out occasionally because folks who are reading your question may also be interested in responses posted elsewhere.

Indeed there is little documentation available.

razrotenberg · April 1, 2020, 5:45am

Hi,
I’m also interested in knowing the answers to these questions, but the link to SO seems to be broken (maybe the SO article was removed).
Can you please re-answer these questions, or find a valid link to any other article answering them?

Thanks in advance.

xiazhiyi99 · February 19, 2021, 8:27am

Hi Robert!

The question asked by @kzhang28 is precisely what I want to know, but the link seems broken. Sorry to bother you, but could you explain a little more about these questions? I really wonder whether the scheduling policy is FIFO for kernels from different streams (regardless of stream priority).

Thanks,
xzy

Robert_Crovella · February 19, 2021, 1:25pm

The (now deleted) contents of the SO link are as follows:

Closed. This question needs to be more focused. It is not currently accepting answers.

Update the question so it focuses on one problem only. This will help others answer the question. You can edit the question.

Closed 1 year ago by talonmies, Vogel612, David Jaw Hpan.


This question targets the following scenario: two processes (e.g., two TensorFlow object detection applications) running on a single GPU (NVIDIA Tesla P100, CUDA version 10.1, driver version 418.40.04, on Ubuntu 16). I am NOT using the NVIDIA Multi-Process Service. My question is about how the CUDA contexts of different processes are scheduled to execute on a single GPU (by time slicing, as far as I know). Specifically:

What is the scheduling policy?

I read some papers and documents. They say the scheduling policy is FIFO: CUDA and the driver maintain a single queue holding all pending kernel execution requests, and whenever the kernel at the front of the queue belongs to a different CUDA context than the one currently running, a GPU context switch is invoked. Is this right?

Tags: cuda, gpu, nvidia, gpgpu


edited Aug 9 '19 at 17:55

asked Aug 8 '19 at 16:02

kz28



None of what you are asking about is documented, and it is very likely both operating system and hardware dependent. – talonmies Aug 8 '19 at 16:14

@talonmies I am working on an NVIDIA Tesla P100 with CUDA version 10.1 and driver version 418.40.04. Even though the context switch mechanism is hardware dependent, are there any general principles about GPU context switching? For example, does the scheduling policy follow FIFO or some fairness rules? – kz28 Aug 8 '19 at 17:10

Again, not documented means not documented – talonmies Aug 8 '19 at 17:33

“Is this right?” – no, not necessarily. Very early CUDA driver implementations did work like that. Some platforms probably still do. But there is empirical evidence that some don’t. And for the third time – none of this is documented. It is all proprietary implementation details inside closed source drivers – talonmies Aug 9 '19 at 6:52

@talonmies, thank you. I see. – kz28 Aug 9 '19 at 13:30

Robert_Crovella · February 19, 2021, 1:30pm

When material is undocumented but discoverable through some means, such as code experimentation or code disassembly, then in some cases I may be able to provide answers based on such discoverable attributes.

As far as I know, this information is not documented.

I don’t know of experiments or methods to discover this information. There may be such experiments, but I don’t know what they are.

In such situations, I am not at liberty to release non-public or non-discoverable or non-documented information about how CUDA behaves, of my own accord, with very limited exceptions.

So at this time I am not able to answer these questions. Further requests for me to answer these questions will not be responded to.

If you feel that CUDA documentation is lacking, one recourse you have is to file a bug using the information provided in a sticky post at the top of this sub forum, and request documentation to cover whatever topics you are interested in. There may be reasons why the CUDA developers choose not to document certain aspects of behavior.

xiazhiyi99 · February 24, 2021, 4:56am

Got it. Sorry for asking for unreleased information. Thanks for replying!
