100x speedup between a GTX 970 and a GTX 1060

I’m new here, so bear with me. For a school project we’re doing n-body simulations, in this case 16k particles with an O(N^2) brute-force algorithm. I’m running a GTX 970 and one of my project partners runs a GTX 1060. The weird thing is that my code takes ~1600 ms per iteration, while his takes ~16 ms. I’ve checked: it works, and we run the exact same code. But I cannot fathom how we could see a 100x speedup between two fairly similar GPUs, when practically every forum says anything more than a 5x speedup is suspicious.

What is going on here? Even if my computer were somehow secretly running the code on the CPU (which it shouldn’t be, since the code is declared to run on the device), that still shouldn’t produce a 100x difference…?
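For context, the kernel is essentially the standard brute-force all-pairs loop, roughly like the sketch below (variable names and data layout are illustrative, not our exact code):

```
// Sketch of a brute-force all-pairs force kernel (illustrative, not our exact code).
// Each thread accumulates the acceleration on one particle from all N others,
// so the total work is O(N^2) per iteration.
__global__ void computeForces(const float4* pos, float3* acc, int n, float softening2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 pi = pos[i];
    float3 a = make_float3(0.0f, 0.0f, 0.0f);

    for (int j = 0; j < n; ++j) {
        float4 pj = pos[j];
        float dx = pj.x - pi.x;
        float dy = pj.y - pi.y;
        float dz = pj.z - pi.z;
        float dist2 = dx * dx + dy * dy + dz * dz + softening2;
        float invDist  = rsqrtf(dist2);
        float invDist3 = invDist * invDist * invDist;
        float s = pj.w * invDist3;   // pj.w holds the particle mass
        a.x += dx * s;
        a.y += dy * s;
        a.z += dz * s;
    }
    acc[i] = a;
}
```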

Debug build on one computer, release build on the other. I’ve seen plenty of examples of this.
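If this is CUDA, it’s worth comparing how each of you compiles and times it: building with `nvcc -G` (device debug, optimizations off) instead of a release build with `-O3` can easily cost an order of magnitude or more on a kernel like this. Timing with CUDA events on both machines also rules out differences in how the time is measured. A minimal sketch, assuming a kernel like the one above and device buffers `d_pos`/`d_acc` that you've already allocated:

```
// Sketch: time one iteration with CUDA events so both machines measure the same thing.
// Assumes computeForces, d_pos, d_acc, n and softening2 already exist in your code.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

int block = 256;
int grid  = (n + block - 1) / block;

cudaEventRecord(start);
computeForces<<<grid, block>>>(d_pos, d_acc, n, softening2);
cudaEventRecord(stop);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
printf("one iteration: %.2f ms\n", ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);
```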

I’m sure there are plenty of other possibilities as well. Maybe one computer has different source code, a different parameter (e.g. the number of particles), or some other difference.
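A quick sanity check is to have the program print its parameters and the GPU it runs on at startup, so you can diff the output from both machines. A sketch (note that `NDEBUG` only reflects the host build type; the device debug flag `-G` is separate):

```
// Sketch: print the device and the parameters that matter, so both runs can be compared.
#include <cstdio>
#include <cuda_runtime.h>

void printRunConfig(int numParticles)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("GPU: %s\n", prop.name);
    printf("particles: %d\n", numParticles);
#ifdef NDEBUG
    printf("host build: release (NDEBUG defined)\n");
#else
    printf("host build: debug (NDEBUG not defined)\n");
#endif
}
```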

No, you’re right - that was the issue. Thanks :)
For a second there I thought I was having a stroke or something.