When is it necessary to create multiple CUcontexts per device within a single process?

I came across this note in the official NVIDIA documentation:

From https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html

Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used.

So my question is:
When is it necessary to create multiple CUcontexts per device within a single process?