How does NVML's clock-setting interface affect NCCL communication?

During model training, I make frequent DVFS adjustments using pynvml.nvmlDeviceSetGpuLockedClocks. In my experiments, calling nvmlDeviceSetGpuLockedClocks frequently (approximately every 50 ms) caused significant delays in certain NCCL communication operators. The delay does not seem to be related to the frequency value itself: if I lock the clock at a very low frequency and avoid frequent adjustments, the delay does not occur. Does anyone know why this happens?
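To make the setup easier to reproduce, here is a minimal sketch of how the latency of each clock-setting call could be measured, to check whether the call itself blocks for a noticeable time. The helper names (time_calls, summarize, make_nvml_setter, main) and the 1200/1500 MHz values are hypothetical placeholders, not taken from my training code. Locking clocks typically requires root privileges.

```python
import statistics
import time


def time_calls(fn, n_calls=20, interval_s=0.05):
    """Call fn() n_calls times, about interval_s apart; return latencies in ms."""
    latencies_ms = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        latencies_ms.append(elapsed * 1e3)
        # Sleep for the remainder of the interval (50 ms by default).
        time.sleep(max(0.0, interval_s - elapsed))
    return latencies_ms


def summarize(latencies_ms):
    """Median and worst-case latency of the measured calls."""
    return {
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


def make_nvml_setter(gpu_index=0, clocks_mhz=(1200, 1500)):
    """Return (set_clock, reset). The clock values are assumed placeholders."""
    import pynvml  # imported lazily so the timing helpers work without a GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    state = {"i": 0}

    def set_clock():
        # Alternate between the given clocks so every call changes the setting.
        mhz = clocks_mhz[state["i"] % len(clocks_mhz)]
        state["i"] += 1
        pynvml.nvmlDeviceSetGpuLockedClocks(handle, mhz, mhz)

    def reset():
        pynvml.nvmlDeviceResetGpuLockedClocks(handle)
        pynvml.nvmlShutdown()

    return set_clock, reset


def main():
    set_clock, reset = make_nvml_setter()
    try:
        print(summarize(time_calls(set_clock)))
    finally:
        reset()  # restore default clock behavior
```

Calling main() on a GPU machine prints the median and maximum call latency. Running it once while training is idle and once while NCCL collectives are in flight would show whether the call latency itself grows under load.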

Additionally, when eight processes on a server set frequencies for eight GPUs simultaneously, are the NVML operations executed in parallel or sequentially?
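One way this could be probed empirically is sketched below: each process waits for a shared wall-clock start time, issues a burst of calls, and the per-call latency is compared against a run with a single process. The names timed_burst and slowdown are hypothetical helpers, and fn stands for any zero-argument callable that wraps the NVML call for that process's GPU.

```python
import statistics
import time


def timed_burst(fn, start_at, n_calls=50):
    """Sleep until wall-clock time start_at, then call fn() back to back.

    Returns the per-call latencies in milliseconds.
    """
    delay = start_at - time.time()
    if delay > 0:
        time.sleep(delay)
    latencies_ms = []
    for _ in range(n_calls):
        t0 = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - t0) * 1e3)
    return latencies_ms


def slowdown(solo_ms, concurrent_ms):
    """Ratio of median concurrent latency to median solo latency."""
    return statistics.median(concurrent_ms) / statistics.median(solo_ms)


# Assumed usage in each of the eight processes (one GPU per process):
#   fn = lambda: pynvml.nvmlDeviceSetGpuLockedClocks(handle, mhz, mhz)
#   concurrent_ms = timed_burst(fn, start_at=agreed_epoch_seconds)
# where agreed_epoch_seconds is a start time shared by all processes.
```

A slowdown close to 1 would be consistent with the calls running in parallel, while a value close to the number of processes would suggest they are serialized somewhere in the driver. This is only a measurement, so it would not by itself identify where the serialization happens.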