GPU context switching among multiple processes

I am looking into the performance interference problem among co-running processes on a single GPU, so I need to learn more about GPU context switching. This question targets the following scenario: two processes (e.g., two TensorFlow object detection applications) running on a single GPU (NVIDIA Tesla P100, CUDA version 10.1, driver version 418.40.04, on Ubuntu 16). I am NOT using the NVIDIA Multi-Process Service (MPS): https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf
My question is about how the cudaContexts of different processes are scheduled to execute on a single GPU (via time slicing, as far as I know). Specifically:

(1) What is the scheduling policy?

Some papers and documents I have read say the scheduling policy is FIFO: CUDA and the driver maintain a single queue holding all pending kernel execution requests, and whenever the kernel at the front of the queue belongs to a different cudaContext than the one currently running, a GPU context switch is invoked. Is this right?
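To make the FIFO model described above concrete, here is a toy Python simulation of it as I understand it. It is only a sketch of my assumption, not of the actual driver behavior. The function name `count_context_switches`, the context labels "A" and "B", and the kernel names are all hypothetical, and I am assuming that loading the very first context does not count as a switch.

```python
from collections import deque

def count_context_switches(requests):
    """Replay kernel requests under the ASSUMED single-queue FIFO rule.

    requests: iterable of (context_id, kernel_name) in arrival order.
    Returns (execution_order, number_of_context_switches).
    """
    queue = deque(requests)
    current = None   # context currently resident on the GPU (none at start)
    switches = 0
    order = []
    while queue:
        ctx, kernel = queue.popleft()          # strictly first-in, first-out
        if current is not None and ctx != current:
            switches += 1                      # front kernel belongs to another context
        current = ctx
        order.append((ctx, kernel))
    return order, switches

# Two processes whose kernel launches arrive interleaved vs. back to back.
interleaved = [("A", "k0"), ("B", "k0"), ("A", "k1"), ("B", "k1")]
batched = [("A", "k0"), ("A", "k1"), ("B", "k0"), ("B", "k1")]

print(count_context_switches(interleaved)[1])  # 3 switches
print(count_context_switches(batched)[1])      # 1 switch
```

If this model were accurate, the number of context switches would depend entirely on how the two processes' launches interleave in the queue, which is why I would like to know whether the real policy is FIFO or time-slice based.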

(2) What are the “scheduling resources” (mentioned in Section 2.1.3 of this NVIDIA document: https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf) that need to be swapped off and onto the GPU during a context switch? In other words, where does the GPU context switch overhead come from?

(3) Will the GPU memory allocated to a process survive a context switch (i.e., it won’t be swapped off the GPU)? My guess is that the GPU memory allocated to a process stays resident on the GPU for as long as the process is running.

I know the context switch mechanism for multiple processes on a GPU is involved. I just want to know the principles, e.g., whether the scheduling policy follows FIFO rules or some “fairness” rules.

cross-posting:

https://stackoverflow.com/questions/57416664/nvidia-cuda-gpu-context-switch-among-multiple-processes

Hi Robert, I have also posted this question on Stack Overflow. I just wanted to reach more people in the community. This is my first post on the NVIDIA forum. If cross-posting is forbidden here, I won’t do it again. Sorry…
As for process-level GPU context switching, there is little relevant documentation available. Could you shed some light on this topic, or point me to some useful links or materials?

No problem with cross posting. I point it out occasionally because folks who are reading your question may also be interested in responses posted elsewhere.

Indeed there is little documentation available.

Hi,
I’m also interested in knowing the answers to these questions, but the link to SO seems to be broken (maybe the SO question was removed).
Could you please answer these questions again here, or point to a valid link to any other article that answers them?

Thanks in advance.