Multi-process context switching


I am wondering what information exists on running multiple processes on a single GPU (neural nets, CUDA, computer vision, etc.)?

The goal is to learn about the context switching mechanics and how they impact performance.

In an ideal world the GPU would be able to handle an unlimited number of processes as long as there is sufficient memory and compute resources. It is my understanding that this is not the case, so I’d like to learn about the mechanisms in play so that I can make informed design decisions.


Detailed context switching mechanics and parameters are not published or specified by NVIDIA to my knowledge. There are a variety of questions about it; here is one example, and you can find others that delve into various topics, often from an experimental viewpoint.

That’s correct; at least one reason is that GPUs don’t have infinite memory or compute resources. In my experience, GPUs can generally handle as many processes as the finite memory (and perhaps other) resources will allow. There may be contexts-per-GPU limits (more or less like processes-per-GPU limits), but in my experience the memory cost per context will often be the limiting factor before any hard limit appears. I don’t know that any hard limits are published; discovering them may require experimentation.
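One way to run such an experiment is to keep launching processes until one fails to acquire a context, then take the count of successful launches as the practical limit. The sketch below illustrates that probing loop with a simulated fixed memory budget standing in for GPU memory; `TOTAL_MEM_MB` and `PER_CONTEXT_MB` are made-up numbers, and on real hardware each worker would instead create an actual CUDA context (e.g., by making any CUDA runtime call) and detect an out-of-memory failure.

```python
# Sketch of an empirical probe for a per-process (per-context) limit.
# The "GPU" here is simulated by a shared integer budget; on a real GPU,
# each worker process would create a CUDA context and the launch loop
# would stop at the first context-creation or allocation failure.
import multiprocessing as mp

TOTAL_MEM_MB = 1000    # stand-in for total GPU memory (assumed value)
PER_CONTEXT_MB = 300   # stand-in for per-context memory overhead (assumed)

def worker(budget, lock, ok):
    # "Create a context": atomically claim PER_CONTEXT_MB from the budget.
    with lock:
        if budget.value >= PER_CONTEXT_MB:
            budget.value -= PER_CONTEXT_MB
            ok.value = 1
        else:
            ok.value = 0  # out of memory: this process gets no context

def probe_limit():
    budget = mp.Value('i', TOTAL_MEM_MB)
    lock = mp.Lock()
    count = 0
    while True:
        ok = mp.Value('i', 0)
        p = mp.Process(target=worker, args=(budget, lock, ok))
        p.start()
        p.join()
        if not ok.value:
            break  # first failure marks the practical process limit
        count += 1
    return count

if __name__ == "__main__":
    print(probe_limit())  # with the numbers above: 1000 // 300 = 3
```

The same loop structure applies on real hardware: the discovered limit reflects whichever resource runs out first, which (per the answer above) is usually memory rather than a published hard cap.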