So I decided to try CUDA programming on my GTX 950. I've read that the profiler is a really useful tool, so I wrote a little program to try it out, and I got some really weird results. All the program does is transfer a matrix from the host to the device, invert it, and transfer it back, repeated 1000 times. The puzzling bit is that about half of the transfers run at around 10 GB/s and the other half average around 2 GB/s. This happens regardless of which PCIe generation (Gen) I select in my BIOS. Has anyone else experienced inconsistent memory transfer speeds?
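For anyone who wants to reproduce this kind of measurement without the profiler, here is a minimal sketch that times each copy individually with CUDA events. It is not the original program: the matrix size `N`, the trial count, the `CHECK` macro, and the `bandwidth_gbps` helper are all assumptions made for illustration, the host buffer is assumed to be ordinary `malloc` memory, and the inversion step is left as a placeholder comment.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: bytes moved in `ms` milliseconds, expressed in GB/s (1 GB = 1e9 bytes).
static double bandwidth_gbps(size_t bytes, float ms) {
    return static_cast<double>(bytes) / (static_cast<double>(ms) * 1.0e6);
}

// Hypothetical error-checking macro: abort with a message on any CUDA error.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error: %s (line %d)\n",               \
                    cudaGetErrorString(err_), __LINE__);                \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

int main() {
    const int N = 1024;        // assumed matrix dimension (N x N floats)
    const int trials = 1000;   // number of round trips, as in the post
    const size_t bytes = static_cast<size_t>(N) * N * sizeof(float);

    // Assumption: pageable host memory from malloc.
    float* h = static_cast<float*>(malloc(bytes));
    if (h == nullptr) return EXIT_FAILURE;
    for (size_t i = 0; i < static_cast<size_t>(N) * N; ++i) h[i] = 1.0f;

    float* d = nullptr;
    CHECK(cudaMalloc(&d, bytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    for (int t = 0; t < trials; ++t) {
        float ms_h2d = 0.0f, ms_d2h = 0.0f;

        // Time the host-to-device copy on its own.
        CHECK(cudaEventRecord(start));
        CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
        CHECK(cudaEventRecord(stop));
        CHECK(cudaEventSynchronize(stop));
        CHECK(cudaEventElapsedTime(&ms_h2d, start, stop));

        // The matrix inversion kernel would be launched here.

        // Time the device-to-host copy on its own.
        CHECK(cudaEventRecord(start));
        CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));
        CHECK(cudaEventRecord(stop));
        CHECK(cudaEventSynchronize(stop));
        CHECK(cudaEventElapsedTime(&ms_d2h, start, stop));

        printf("trial %4d  H2D %6.2f GB/s  D2H %6.2f GB/s\n",
               t, bandwidth_gbps(bytes, ms_h2d), bandwidth_gbps(bytes, ms_d2h));
    }

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d));
    free(h);
    return 0;
}
```

Printing the two directions separately makes it easy to see whether the fast and slow groups line up with transfer direction or are spread across both.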