Zero-copy and unified addressing suddenly became slow

I implemented an FFT operation with cuFFT in three variants: regular pinned memory, zero-copy memory, and unified addressing.
Everything worked fine until today. Today the zero-copy and unified addressing variants suddenly became about 3 times slower. There are four GPU devices in my server and I have been using device 0. If I set the GPU device to 2 or 3, the speed becomes normal again.
I checked nvidia-smi, and the clock speeds for all four devices are exactly the same.

What could be the cause?

“Everything worked fine until today.”

Seemingly, the system did not change from a hardware perspective.

Did you (or the system) update or change the software in any way?