I have a simple CUDA kernel (adding two vectors of size N), pretty similar to the one in this CUDA blog post [1]. I only changed a few things, e.g. repeating the measurement over multiple samples: each transfer is measured, say, 1000 times, and the measurements are written to a txt file afterwards. If I then plot the measured times for transferring a vector to the device, I get the following:
https://i.stack.imgur.com/adCSI.png
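For context, the measurement loop looks roughly like the sketch below. This is not the exact code from the blog post; the sample count `N_SAMPLES` and the output file name `h2d_times.txt` are my own placeholders:

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    int main() {
        const int N = 1 << 20;        // vector length (one of several sizes tested)
        const int N_SAMPLES = 1000;   // repetitions per size
        float *h = (float*)malloc(N * sizeof(float));
        float *d;
        cudaMalloc(&d, N * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        FILE *out = fopen("h2d_times.txt", "w");
        for (int i = 0; i < N_SAMPLES; ++i) {
            cudaEventRecord(start);
            cudaMemcpy(d, h, N * sizeof(float), cudaMemcpyHostToDevice);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);
            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            fprintf(out, "%f\n", ms);  // one host-to-device transfer time per line
        }
        fclose(out);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(d);
        free(h);
        return 0;
    }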
Now, looking at the standard deviation drawn as vertical error bars, it becomes clear that, for some reason, the fluctuations of the data movement scale with the vector size: the error bars are roughly constant on a log-log plot, which means the relative error is constant and the absolute fluctuations grow proportionally with the size. This can be confirmed by plotting only the standard deviation:
https://i.stack.imgur.com/Rr7CN.png
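The error bars are computed per vector size from the 1000 samples in the usual way; a minimal sketch of the post-processing (my own helper, not part of the blog code):

    #include <math.h>

    // Sample mean and sample standard deviation of n recorded transfer times.
    void mean_stddev(const float *t, int n, float *mean, float *stddev) {
        float sum = 0.0f, sq = 0.0f;
        for (int i = 0; i < n; ++i) sum += t[i];
        *mean = sum / n;
        for (int i = 0; i < n; ++i) sq += (t[i] - *mean) * (t[i] - *mean);
        *stddev = sqrtf(sq / (n - 1));  // Bessel-corrected sample stddev
    }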
If I run the unmodified program from the CUDA blog post [1], I also see bandwidth fluctuations on roughly every 10th run. Where do these come from? I observed the same behaviour on two different GPUs, a V100 and an RTX 2080.