I’m new here, so bear with me. For a school project we’re doing n-body simulations, in this case 16k particles with an O(N^2) algorithm. I’m running a GTX 970, and one of my project partners runs a GTX 1060. The weird thing is that my code takes ~1600 ms to complete one iteration, whilst his takes ~16 ms. I’ve checked, and it works. We run the exact same code. But I cannot fathom how there can be a 100x difference between two different but nearly equivalent GPUs, when practically every forum says that anything more than a 5x speedup is suspicious.
What is going on here? Even if my computer were somehow secretly running the code on the CPU (which it shouldn’t, thanks to the device-declared code), I wouldn’t expect a difference of 100x…?
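
For what it’s worth, here is a rough back-of-the-envelope check of the two timings. It assumes that 16k means N = 16384 and that each pairwise interaction costs about 20 floating-point operations. Both are my assumptions rather than measured values, so treat the result as an order-of-magnitude estimate only.

```latex
\begin{align*}
N^2 &= 16384^2 = 268{,}435{,}456 \approx 2.7 \times 10^{8} \text{ interactions per iteration} \\
\text{at } 16\,\text{ms:} \quad
  & \frac{2.7 \times 10^{8}}{0.016\,\text{s}} \approx 1.7 \times 10^{10} \text{ interactions/s}
    \;\Rightarrow\; \approx 3.4 \times 10^{11}\ \text{FLOP/s} \\
\text{at } 1600\,\text{ms:} \quad
  & \frac{2.7 \times 10^{8}}{1.6\,\text{s}} \approx 1.7 \times 10^{8} \text{ interactions/s}
    \;\Rightarrow\; \approx 3.4 \times 10^{9}\ \text{FLOP/s}
\end{align*}
```

If that estimate is anywhere near right, the 16 ms figure looks plausible for a card rated at a few TFLOP/s in single precision, and my 1600 ms is the number that looks off.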