When is it necessary to create multiple CUcontexts per device within a single process?

I came across this note in the official NVIDIA documentation:

From CUDA Runtime API :: CUDA Toolkit Documentation

Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used.

So my question is:
When is it necessary to create multiple CUcontexts per device within a single process?
