Regarding flop efficiency reported by nvprof

I see a difference between the flop efficiency reported by nvprof and the value I calculate by hand.

On a 1080 Ti, nvprof reports a flop efficiency of 16.2% for one kernel.

Looking at the device’s spec and the formula, the peak GFLOP/s is calculated by

SM_COUNT x CUDA_CORE_PER_SM x CLOCK x 2
= 28 x 128 x 1.683 (GHz) x 2
= 12,064 GFLOP/s
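
For reference, here is the same peak calculation as a small Python sketch. The function name `peak_gflops` is just illustrative, and the factor of 2 assumes each CUDA core retires one fused multiply-add (2 FLOPs) per cycle at the 1.683 GHz clock I used above.

```python
# Theoretical single-precision peak in GFLOP/s.
# Assumption: one FMA (counted as 2 FLOPs) per CUDA core per clock cycle.
def peak_gflops(sm_count, cores_per_sm, clock_ghz, flops_per_cycle=2):
    return sm_count * cores_per_sm * clock_ghz * flops_per_cycle

# Values taken from the calculation above (1080 Ti, 1.683 GHz).
peak = peak_gflops(sm_count=28, cores_per_sm=128, clock_ghz=1.683)
print(f"{peak:.1f} GFLOP/s")  # prints 12063.7 GFLOP/s
```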

Now, flop_count_sp for this kernel is 1,796,739,259 and the kernel runtime is 0.8 ms.
So, on paper, the achieved FLOP rate is

1,796,739,259 / (0.8 * 10^-3) = 2,245,924,073,750 (FLOP/s) = 2,245 (GFLOP/s)
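
The same division as a Python sketch, in case someone wants to plug in their own numbers. `achieved_gflops` is a hypothetical helper name, and the runtime is the rounded 0.8 ms value, so the result is only as precise as that figure.

```python
# Achieved throughput in GFLOP/s from a FLOP count and a runtime in seconds.
def achieved_gflops(flop_count, runtime_s):
    return flop_count / runtime_s / 1e9

# flop_count_sp and the (rounded) kernel runtime from the profile.
achieved = achieved_gflops(flop_count=1_796_739_259, runtime_s=0.8e-3)
print(f"{achieved:.1f} GFLOP/s")  # prints 2245.9 GFLOP/s
```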

Now, the efficiency on paper is:
2,245 / 12,064 = 0.186
which means 18.6%.

So nvprof says the flop efficiency is 16.2%, while my calculation shows it is 18.6%.
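
To put a size on the gap, here is a rough Python sketch that recomputes the ratio from the unrounded inputs and then asks what runtime or clock would be needed for my formula to land on 16.2%. This is only arithmetic on my own numbers; I am not assuming anything about which runtime or clock nvprof actually uses internally.

```python
# Inputs from the post: 1080 Ti peak at 1.683 GHz, flop_count_sp, 0.8 ms runtime.
flop_count = 1_796_739_259
peak_gflops = 28 * 128 * 1.683 * 2
achieved_gflops = flop_count / 0.8e-3 / 1e9

# Hand-computed efficiency (achieved / peak).
efficiency = achieved_gflops / peak_gflops
print(f"hand-computed efficiency: {efficiency:.1%}")

# Value reported by nvprof for the same kernel.
reported = 0.162

# Runtime (ms) that would give 16.2% with the same peak.
implied_runtime_ms = flop_count / (reported * peak_gflops * 1e9) * 1e3

# Clock (GHz) that would give 16.2% with the same 0.8 ms runtime.
implied_clock_ghz = achieved_gflops / (reported * 28 * 128 * 2)

print(f"implied runtime: {implied_runtime_ms:.3f} ms")
print(f"implied clock:   {implied_clock_ghz:.3f} GHz")
```

With these inputs the gap would close at roughly 0.92 ms instead of 0.8 ms, or at a clock of roughly 1.93 GHz instead of 1.683 GHz, so either the rounded runtime or the clock I assumed could be part of the difference.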
Is a difference of this size just expected measurement error, or is something missing from my calculation?
Any ideas?