I can’t quantify the performance degradation for you, but the reason it is advised against is that CUDA activity in separate contexts cannot run on the device simultaneously. The device must context-switch between the activity from each context, and that switching incurs overhead that is avoided when all threads of a process share the same context.
The multiple-contexts-per-process scenario basically puts you in the same performance boat as running multiple processes on a single GPU (and without any benefit from CUDA MPS).
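For reference, here is a minimal sketch of the shared-context pattern: several host threads that use only the runtime API, so they all implicitly use the device's primary context and nobody calls `cuCtxCreate`. The names `scale`, `worker`, and `run_workers` are made up for illustration, device 0 is assumed, and one stream per thread is just one reasonable way to let the work overlap. Treat it as a starting point, not a tuned implementation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a runtime API call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: multiply every element by a constant factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Work done by one host thread. Only runtime API calls are used, so the
// thread operates in the primary context that the process shares for device 0.
void worker(int id, int n, float *result) {
    CHECK(cudaSetDevice(0));  // assumption: a single GPU, device 0

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));  // one stream per host thread

    std::vector<float> host(n, 1.0f);
    float *dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    CHECK(cudaMalloc(&dev, bytes));

    CHECK(cudaMemcpyAsync(dev, host.data(), bytes,
                          cudaMemcpyHostToDevice, stream));
    scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, n, float(id + 2));
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(host.data(), dev, bytes,
                          cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    *result = host[0];  // every element was 1.0f, so this equals the factor

    CHECK(cudaFree(dev));
    CHECK(cudaStreamDestroy(stream));
}

// Launch num_threads host threads and collect one value from each.
std::vector<float> run_workers(int num_threads) {
    std::vector<float> results(num_threads, 0.0f);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back(worker, t, 1 << 20, &results[t]);
    for (auto &th : threads) th.join();
    return results;
}
```

Calling `run_workers(2)` from `main` returns one value per thread (2.0 and 3.0 here). Because both threads live in the same context, their kernels are at least eligible to overlap on the device; whether they actually do depends on resource availability. If each thread instead created its own context through the driver API, the work from the two threads would be time-sliced as described above.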