CUDA functions create new threads and kernel-mode processes

We are running a large project on a computer with a Tesla T4. The project places heavy demands on the performance of one specific CPU core. Calling cudaEventCreateWithFlags(m_EncEvent, cudaEventBlockingSync) from a thread on that core makes the program run very slowly. If we make the same call from a thread bound to a different CPU core, there is no problem. Can I choose which core is used? Does CUDA have any core-binding requirements?
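For reference, this is roughly how we bind the calling thread to another core on Linux. It is a minimal sketch, not our production code: pin_current_thread_to_core and run_on_core are hypothetical helper names, and the CUDA call is shown only in a comment so the snippet builds without the toolkit. On Linux, a thread created with pthread_create inherits its creator's affinity mask, so any helper threads spawned from the pinned thread would start with the same mask unless something changes it afterwards.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for pthread_setaffinity_np and CPU_SET on glibc
#endif
#include <pthread.h>
#include <sched.h>
#include <functional>
#include <thread>

// Hypothetical helper: pin the calling thread to a single CPU core (Linux only).
// Returns true if the affinity mask was applied.
bool pin_current_thread_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Hypothetical helper: run `work` on a new thread pinned to `core` and wait
// for it to finish. `work` is only run if pinning succeeded.
bool run_on_core(int core, const std::function<void()>& work) {
    bool pinned = false;
    std::thread worker([&] {
        pinned = pin_current_thread_to_core(core);
        if (pinned) {
            work();
        }
    });
    worker.join();
    return pinned;
}

// Intended use (assumes a core other than the busy one, here core 3):
//   run_on_core(3, [&] {
//       cudaEventCreateWithFlags(m_EncEvent, cudaEventBlockingSync);
//   });
```

This only shows how the application picks the core for its own thread. It does not tell us whether CUDA itself expects any particular binding, which is the open question.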

In addition, using the command ps -ef | grep nvidia, we found that 9 kernel-mode nvidia processes had been added:

Sometimes, when no program is running, the output looks like this:

We also found that calling cudaGetDeviceCount adds one thread, while calling cudaStreamCreateWithFlags adds two user-mode threads. Why?
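The thread counts above can be checked with something like the following sketch, which counts the entries in /proc/self/task before and after a call. The names count_threads and threads_added_by are hypothetical, and the CUDA calls appear only in a comment. It assumes Linux with /proc mounted.

```cpp
#include <dirent.h>
#include <functional>

// Count the threads of the current process by listing /proc/self/task
// (one numeric entry per thread). Returns -1 if the directory cannot be opened.
long count_threads() {
    DIR* dir = opendir("/proc/self/task");
    if (dir == nullptr) {
        return -1;
    }
    long n = 0;
    while (const dirent* entry = readdir(dir)) {
        if (entry->d_name[0] != '.') {  // skip "." and ".."
            ++n;
        }
    }
    closedir(dir);
    return n;
}

// Return how many threads exist after `call` minus how many existed before.
// The result is only meaningful if /proc/self/task was readable both times.
long threads_added_by(const std::function<void()>& call) {
    const long before = count_threads();
    call();
    const long after = count_threads();
    return after - before;
}

// Intended use:
//   int n = 0;
//   long added = threads_added_by([&] { cudaGetDeviceCount(&n); });
```

The count is a snapshot, so a thread that starts and exits inside the call is not seen, and threads created by other parts of the program at the same time would be attributed to the call.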

When are these kernel processes created? Why are so many needed, and what is their purpose?
How much CPU do these kernel processes use?
Do the new threads and the related kernel processes need to run on the same CPU core?
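To put a number on the CPU usage question, one option is to read the accumulated CPU time of each PID listed by ps from /proc/<pid>/stat. The sketch below is an assumption about how to measure it, not something taken from NVIDIA documentation. cpu_seconds_of is a hypothetical name. It reads the utime and stime fields (fields 14 and 15 in proc(5)) and converts clock ticks to seconds.

```cpp
#include <unistd.h>
#include <fstream>
#include <sstream>
#include <string>

// Return user + system CPU time in seconds consumed so far by `pid`,
// or -1.0 if /proc/<pid>/stat cannot be read or parsed.
double cpu_seconds_of(long pid) {
    std::ifstream in("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    if (!std::getline(in, line)) {
        return -1.0;
    }
    // The command name (field 2) is wrapped in parentheses and may contain
    // spaces, so parsing starts after the last ')'.
    const std::string::size_type close = line.rfind(')');
    if (close == std::string::npos) {
        return -1.0;
    }
    std::istringstream rest(line.substr(close + 1));
    std::string token;
    unsigned long long utime = 0;
    unsigned long long stime = 0;
    int index = 0;  // index 0 is field 3 (state)
    while (index <= 12 && (rest >> token)) {
        if (index == 11) utime = std::stoull(token);  // field 14
        if (index == 12) stime = std::stoull(token);  // field 15
        ++index;
    }
    if (index <= 12) {
        return -1.0;  // line was shorter than expected
    }
    const long ticks_per_second = sysconf(_SC_CLK_TCK);
    if (ticks_per_second <= 0) {
        return -1.0;
    }
    return static_cast<double>(utime + stime) / static_cast<double>(ticks_per_second);
}
```

Sampling this twice for the same PID and dividing the difference by the elapsed wall-clock time gives an approximate utilization for that process over the interval.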