Multi-process context switching

Detailed context switching mechanics and parametrics are not published or specified by NVIDIA, to my knowledge. There are a variety of questions about it; here is one example, and you can find others that delve into various topics, often from an experimental viewpoint.

That’s correct; at least one reason is that GPUs don’t have infinite memory or compute resources. In my experience, GPUs can generally handle as many processes as their finite memory (and perhaps other) resources will allow. There may be contexts-per-GPU limits (more or less equivalent to processes-per-GPU limits), but in my experience the memory cost per context is often the limiting factor before any hard limit appears. I don’t know of any published hard limits; discovering them may require experimentation.
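
As a rough illustration of that kind of experiment, here is a minimal Python sketch that reads GPU memory usage from nvidia-smi and estimates how many more contexts would fit by memory alone. The function names and the measurement procedure are my own assumptions for illustration, not anything NVIDIA specifies. The estimate ignores each application's own allocations and any hard context limit, so treat it as an upper bound at best.

```python
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_line(line):
    """Parse one 'used, total' CSV line into a (used_mib, total_mib) tuple."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def read_gpu_memory(gpu_index=0):
    """Return (used_mib, total_mib) for one GPU as reported by nvidia-smi."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_line(out.strip().splitlines()[gpu_index])


def estimate_context_headroom(used_before, used_after, total):
    """Rough count of additional contexts that fit, judged by memory alone.

    used_before and used_after are readings (MiB) taken before and after one
    extra process has created its context (a hypothetical procedure).
    """
    per_context = used_after - used_before
    if per_context <= 0:
        raise ValueError("no measurable per-context memory cost")
    return (total - used_after) // per_context
```

To use it, call read_gpu_memory() once, start a process that initializes CUDA and stays alive without allocating anything else, call read_gpu_memory() again, and pass the two "used" readings plus the total to estimate_context_headroom(). Launching processes until one fails to create a context is still the only way I know of to find where the real limit sits.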