I have two cards, a GK110 and a GM200, and I keep getting a 60% difference in performance between them for the same job…
The job uses only 1 GB of GPU memory to run and complete.
I have noticed a few minor differences that could be contributing, but nothing that would account for a 60% difference.
With this little information, one can only speculate wildly. What specific GPU models are we talking about? What is the execution time on each card? Does the code in question use any double-precision computation? What were the exact nvcc compiler switches used to compile the code?
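To make those questions concrete, here is a minimal sketch of a program that could be run on both cards to collect the requested details: it prints each device's name and compute capability, then times the same arithmetic loop in single and double precision using CUDA events. The kernel `fma_chain`, the helper `time_kernel`, and the launch sizes are hypothetical stand-ins, not the original job, so the numbers only hint at how the two chips differ in double-precision throughput.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: a dependent chain of multiply-adds in type T.
template <typename T>
__global__ void fma_chain(T *out, int iters)
{
    T x = static_cast<T>(threadIdx.x) * static_cast<T>(1e-3);
    for (int i = 0; i < iters; ++i)
        x = x * static_cast<T>(0.999) + static_cast<T>(0.001);
    // Store the result so the compiler cannot discard the loop.
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

// Time one launch of fma_chain<T> with CUDA events; returns milliseconds.
template <typename T>
float time_kernel(int blocks, int threads, int iters)
{
    T *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(T) * blocks * threads);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization is not included in the timing.
    fma_chain<T><<<blocks, threads>>>(d_out, iters);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    fma_chain<T><<<blocks, threads>>>(d_out, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return ms;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        cudaSetDevice(dev);

        printf("Device %d: %s, compute capability %d.%d, %d SMs, %.1f GB\n",
               dev, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));

        // Arbitrary launch sizes chosen for illustration only.
        float ms_float  = time_kernel<float>(1024, 256, 10000);
        float ms_double = time_kernel<double>(1024, 256, 10000);
        printf("  float: %.3f ms, double: %.3f ms\n", ms_float, ms_double);
    }
    return 0;
}
```

Assuming a CUDA toolkit that still supports both architectures and a hypothetical file name `probe.cu`, it could be built for both chips with something like `nvcc -O3 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_52,code=sm_52 -o probe probe.cu` (GK110 is compute capability 3.5, GM200 is 5.2). Posting the output together with the exact compile line used for the real job would answer most of the questions above.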