Windows 10 CUDA program: low CPU usage

I am developing a CUDA program that uses both the CPU and the GPU. The task consists of two steps, STEP1 and STEP2: the GPU performs STEP1 and the CPU performs STEP2, which needs the STEP1 results. My server has two E5-2620 processors and four NVIDIA GTX 1080 cards, so I create four STEP1 threads, each bound to one GTX 1080 card and using that GPU to perform STEP1. To maximize the use of CPU resources, I also allocate 32 STEP2 threads that use the CPU to perform STEP2.
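Roughly, the thread layout looks like the sketch below. It is a simplified illustration, not the real code: `Step1Result`, `ResultQueue`, and `run_pipeline` are hypothetical names, the GPU work is replaced by a placeholder loop, and the hand-off between STEP1 and STEP2 is assumed to be a mutex-protected queue.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical result type: produced by STEP1 (GPU), consumed by STEP2 (CPU).
struct Step1Result {
    int gpu_id;
    int item;
};

// Assumed hand-off structure: a thread-safe queue between the two steps.
class ResultQueue {
public:
    void push(Step1Result r) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(r);
        }
        cv_.notify_one();
    }

    // Called once after every STEP1 thread has finished.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available; returns false once closed and drained.
    bool pop(Step1Result& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Step1Result> q_;
    bool closed_ = false;
};

// Runs the two-step pipeline and returns how many items STEP2 processed.
std::size_t run_pipeline(int gpu_threads, int cpu_threads, int items_per_gpu) {
    assert(gpu_threads > 0 && cpu_threads > 0 && items_per_gpu >= 0);
    ResultQueue queue;
    std::atomic<std::size_t> processed{0};

    // STEP2 workers: CPU threads that consume STEP1 results.
    std::vector<std::thread> consumers;
    for (int i = 0; i < cpu_threads; ++i) {
        consumers.emplace_back([&queue, &processed] {
            Step1Result r;
            while (queue.pop(r)) {
                // Placeholder for the real STEP2 CPU work.
                processed.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    // STEP1 workers: one thread per GPU.
    std::vector<std::thread> producers;
    for (int g = 0; g < gpu_threads; ++g) {
        producers.emplace_back([&queue, g, items_per_gpu] {
            // In the real program this thread would call cudaSetDevice(g)
            // and launch the STEP1 kernels; here it only emits dummy results.
            for (int i = 0; i < items_per_gpu; ++i) queue.push({g, i});
        });
    }

    for (auto& t : producers) t.join();
    queue.close();
    for (auto& t : consumers) t.join();
    return processed.load();
}
```

Calling `run_pipeline(4, 32, 10)` mirrors the configuration described above: four STEP1 threads, 32 STEP2 threads, and an arbitrary ten items per GPU.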
My program runs normally on Windows 7 and Windows Server 2008, where the task completes in about 5 minutes. But after the operating system is upgraded to Windows 10 or Windows Server 2012, the speed of the CPU STEP2 work drops to about 1/10 of what it was, and the task takes about 40 minutes. The bottleneck is that STEP2 on the CPU is too slow, yet CPU utilization is only 20-30%. I wonder why my program runs normally on Windows 7 but shows such a large difference in CPU utilization on Windows 10. I have tried many ways to improve CPU utilization, but none have been successful.