I see this in the official NVIDIA documentation:
From CUDA Runtime API :: CUDA Toolkit Documentation
Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used.
So my question is:
When is it necessary to create multiple CUcontexts per device within a single process?