nvmlDeviceGetPowerUsage results fluctuate severely during periodic executions of fixed workloads

We wrote a simple program that launches cuBLAS GEMM kernels taking about 50 ms in total (measured in advance) at the start of every 100 ms period, and a separate monitor program that polls the GPU's current power usage by calling nvmlDeviceGetPowerUsage in a loop. We then ran both programs concurrently to measure the power usage of the simple program.
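For illustration only, the polling side of such a monitor could be sketched as below. This is not our actual monitor code: PowerSample, collectSamples and the readPowerMw callback are hypothetical names, and the polling interval is an assumption. In a real monitor the callback would wrap nvmlDeviceGetPowerUsage, which reports power in milliwatts; here it is left as a callable so the sketch compiles without NVML.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// One power reading together with the host time at which it was taken.
struct PowerSample {
    std::chrono::steady_clock::time_point when;
    unsigned int milliwatts;  // nvmlDeviceGetPowerUsage reports milliwatts
};

// Hypothetical helper: calls readPowerMw `count` times, one call per `interval`.
// In a real monitor, readPowerMw would wrap nvmlDeviceGetPowerUsage(device, &mw)
// and check the returned nvmlReturn_t.
std::vector<PowerSample> collectSamples(
    const std::function<unsigned int()>& readPowerMw,
    std::size_t count,
    std::chrono::microseconds interval)
{
    std::vector<PowerSample> samples;
    samples.reserve(count);
    auto next = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        // Record the timestamp with each reading so samples can later be
        // aligned with the 100 ms workload period.
        samples.push_back({std::chrono::steady_clock::now(), readPowerMw()});
        next += interval;
        std::this_thread::sleep_until(next);
    }
    return samples;
}
```

Keeping a timestamp with every reading makes it possible to tell afterwards whether a given sample fell into the busy or the idle part of a period.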

We expected the measured power to be constant, or to fluctuate within a relatively small range, because the workload of our simple program is fixed: every period launches exactly the same number of cuBLAS GEMM kernels with exactly the same arguments. However, the power reported by nvmlDeviceGetPowerUsage fluctuated severely, from a minimum of 64 W to a maximum of 271 W, so we could not determine the GPU power usage of our program.

Is this a bug? If not, how should one accurately measure GPU power over a period of time?

The main loop in our simple program is as follows:

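// Start time of the first 100 ms period.
auto ts = std::chrono::steady_clock::now();
for (;;)
{
    // Launch the fixed batch of GEMM kernels (about 50 ms of GPU work in total).
    for (int i = 0; i < nKernel; i++)
    {
        checkErrors(cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, matrix_size.uiWB, matrix_size.uiHA, matrix_size.uiWA, &alpha, d_B, matrix_size.uiWB, d_A, matrix_size.uiWA, &beta, d_C, matrix_size.uiWB));
    }

    // Wait until all launched kernels have finished.
    checkErrors(cudaDeviceSynchronize());

    // Sleep until the start of the next 100 ms period.
    ts += std::chrono::milliseconds(100);
    std::this_thread::sleep_until(ts);
}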

We measured the execution time of the nKernel GEMM kernels in advance to confirm that it is around 50 ms.