GPU Context switch of multiple processes

I am looking into the performance interference problem among co-running processes on a single GPU, so I need to learn more about GPU context switching. This question targets the following scenario: two processes (e.g., two TensorFlow object detection applications running on a single GPU: NVIDIA Tesla P100, CUDA version 10.1, driver version 418.40.04, on Ubuntu 16). I am NOT using the NVIDIA Multi-Process Service: https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf
My question is about how the cudaContexts of different processes are scheduled to execute on a single GPU (by leveraging time slicing as far as I know). Specifically:

(1) What is the scheduling policy?

I read some papers/documents. They mention that the scheduling policy is FIFO: CUDA and the driver maintain a single queue holding all pending kernel execution requests, and whenever the kernel at the front of the queue belongs to a different cudaContext than the currently running one, a GPU context switch is invoked. Is this right?
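To make the model I am asking about concrete, here is a minimal Python sketch of the FIFO policy as those papers describe it. This is only a toy simulation of the hypothesis, not of what the driver actually does; the function name `count_context_switches` and the `(context, kernel)` tuples are made up for illustration.

```python
from collections import deque

def count_context_switches(requests):
    """Replay (context_id, kernel_name) requests in strict FIFO order.

    Hypothetical model only: a context switch is counted whenever the
    request at the front of the queue belongs to a different context
    than the one that ran last.
    """
    queue = deque(requests)
    current_ctx = None
    switches = 0
    order = []
    while queue:
        ctx, kernel = queue.popleft()
        # The very first kernel does not count as a switch.
        if current_ctx is not None and ctx != current_ctx:
            switches += 1
        current_ctx = ctx
        order.append((ctx, kernel))
    return order, switches

# Two processes (contexts "A" and "B") submitting two kernels each.
interleaved = [("A", "k0"), ("B", "k0"), ("A", "k1"), ("B", "k1")]
batched = [("A", "k0"), ("A", "k1"), ("B", "k0"), ("B", "k1")]

print(count_context_switches(interleaved)[1])  # 3
print(count_context_switches(batched)[1])      # 1
```

Under this assumed model, the number of switches depends only on how the requests from the two processes interleave in the queue, which is why I would like to know whether the real policy is FIFO or something else.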

(2) What are the “scheduling resources” (mentioned in section 2.1.3 of this NVIDIA document: https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf) that need to be swapped off and onto the GPU during a context switch? In other words, where does the GPU context switch overhead come from?

(3) Will the GPU memory allocated to a process survive a context switch (i.e., will it stay on the GPU rather than being swapped out)? My guess is that the GPU memory allocated to a process always resides on the GPU as long as the process is running.

I know the context switch mechanism for multiple processes on a GPU is complicated. I just want to know the principles, e.g., does the scheduling policy follow FIFO rules or some “fairness” rules?


cross-posting:

https://stackoverflow.com/questions/57416664/nvidia-cuda-gpu-context-switch-among-multiple-processes

Hi Robert, I have also posted this question on Stack Overflow. I just wanted to reach more people in the community. This is my first post on the NVIDIA forum. If cross-posting is forbidden in this forum, I won’t do it again. Sorry…
As for GPU process-level context switching, there is little relevant documentation available. Could you shed some light on this topic, or point me to some useful links/materials?

No problem with cross posting. I point it out occasionally because folks who are reading your question may also be interested in responses posted elsewhere.

Indeed there is little documentation available.

Hi,
I’m also interested in knowing the answers to these questions, but the link to SO seems to be broken (maybe the SO article was removed).
Can you please re-answer these questions, or find a valid link to any other article answering them?

Thanks in advance.

Hi Robert!

The question asked by @kzhang28 is precisely what I want to know, but the link seems to be broken. Sorry to bother you, but could you explain a little more about these questions? I really wonder whether the scheduling policy is FIFO for kernels from different streams (regardless of stream priority).

Thanks,
xzy

The (now deleted) contents of the SO link are as follows:

Closed. This question needs to be more focused. It is not currently accepting answers.

Update the question so it focuses on one problem only. This will help others answer the question. You can edit the question.

Closed 1 year ago by talonmies, Vogel612, David Jaw Hpan.


This question targets the following scenario: two processes (e.g., two TensorFlow object detection applications running on a single GPU: NVIDIA Tesla P100, CUDA version 10.1, driver version 418.40.04, on Ubuntu 16). I am NOT using the NVIDIA Multi-Process Service. My question is about how the cudaContexts of different processes are scheduled to execute on a single GPU (by leveraging time slicing, as far as I know). Specifically:

What is the scheduling policy?

I read some papers/documents. They mention that the scheduling policy is FIFO: CUDA and the driver maintain a single queue holding all pending kernel execution requests, and whenever the kernel at the front of the queue belongs to a different cudaContext than the currently running one, a GPU context switch is invoked. Is this right?

Tags: cuda, gpu, nvidia, gpgpu

edited Aug 9 '19 at 17:55

asked Aug 8 '19 at 16:02



kz28


  • None of what you are asking about is documented, and it is very likely both operating system and hardware dependent. – talonmies Aug 8 '19 at 16:14

  • @talonmies I am working on an NVIDIA Tesla P100 with CUDA version 10.1 and driver version 418.40.04. Even though the context switch mechanism is hardware dependent, are there any general principles about GPU context switching? For example, does the scheduling policy follow FIFO or some fairness rules? – kz28 Aug 8 '19 at 17:10

  • Again, not documented means not documented – talonmies Aug 8 '19 at 17:33

  • “Is this right?” – no, not necessarily. Very early CUDA driver implementations did work like that. Some platforms probably still do. But there is empirical evidence that some don’t. And for the third time – none of this is documented. It is all proprietary implementation details inside closed source drivers – talonmies Aug 9 '19 at 6:52

  • @talonmies, thank you. I see. – kz28 Aug 9 '19 at 13:30

When material is undocumented but discoverable through some means, such as code experimentation or code disassembly, then in some cases I may be able to provide answers based on such discoverable attributes.

As far as I know, this information is not documented.

I don’t know of experiments or methods to discover this information. There may be such experiments, but I don’t know what they are.

In such situations, I am not at liberty to release non-public or non-discoverable or non-documented information about how CUDA behaves, of my own accord, with very limited exceptions.

So at this time I am not able to answer these questions. Further requests for me to answer these questions will not be responded to.

If you feel that CUDA documentation is lacking, one recourse you have is to file a bug using the information provided in a sticky post at the top of this sub forum, and request documentation to cover whatever topics you are interested in. There may be reasons why the CUDA developers choose not to document certain aspects of behavior.

Got it. Sorry for asking for unreleased information. Thanks for replying!