Tesla K40c performs slower than GTX 1060 with OpenACC

I have a project that compares the pixel values of two PGM images using chi-square.
I use an MSI GE72VR laptop with a GTX 1060 (6 GB version), while the lab PC has a Tesla K40c GPU. Both run Ubuntu and PGI Developer 16.10, but the laptop has CUDA 8.0 while the lab PC has CUDA 7.5.

The accelerated part, where the GPU compares each pair of pixels using an equation, takes 0.04x seconds on the GTX 1060. With the same code and optimization, the Tesla K40c takes 0.05x seconds.
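
For readers following along, here is a minimal sketch of the kind of loop described above. The exact equation is not given in this post, so the common chi-square distance, the sum of (a - b)^2 / (a + b) over all pixels, is assumed. The function name chi_square_distance and the 8-bit pixel type are hypothetical as well.

```c
/* Hypothetical helper: chi-square distance between two 8-bit grayscale
   images of n pixels each. The formula is an assumption; the post does
   not state the equation actually used. */
double chi_square_distance(const unsigned char *a, const unsigned char *b, int n)
{
    double sum = 0.0;

    /* Offload the per-pixel comparison and reduce the partial sums.
       Compilers without OpenACC support ignore the pragma, so the same
       loop then runs sequentially on the CPU. */
    #pragma acc parallel loop reduction(+:sum) copyin(a[0:n], b[0:n])
    for (int i = 0; i < n; ++i) {
        double diff = (double)a[i] - (double)b[i];
        double total = (double)a[i] + (double)b[i];
        /* Skip pixel pairs that are both zero to avoid dividing by zero. */
        if (total > 0.0) {
            sum += diff * diff / total;
        }
    }
    return sum;
}
```

Calling chi_square_distance(img1, img2, width * height) on two identical images returns 0. In this sketch the timed region includes the copyin transfers, so it measures both data movement and the kernel.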

Is there a particular reason why the Tesla K40c runs slower than the GTX 1060? I tried changing the occupancy, but that gave no advantage for the Tesla.

Hi ibm218,

The GTX 1060 is based on the Pascal architecture (compute capability 6.1), while the K40 is a Kepler (compute capability 3.5), which is two generations older than Pascal. So this difference isn’t unexpected.

  • Mat

Thank you, Mat.