Windows 7 vs Linux

I'm using a Tesla K20c, and performance on Linux (Ubuntu) is consistently 2X to 5X faster than on Windows 7 for the exact same kernel code. I'm timing basic math functions, reads and writes to GPU global memory, atomic adds, etc., compiled for sm_35 with coalesced memory accesses. Since the testing is done on the same machine, just booting into different OSes, I'm not sure what the issue is. I'm using Visual Studio 2012 on Windows 7 and have the latest drivers installed for each OS. I would have expected some difference in speed, but not this much. I also tried another machine with a TITAN in it, using the same kernels, and got the same results: Windows 7 was always 2X to 5X slower than Ubuntu.

Is this normal?

(1) How are you measuring the kernel execution times?
(2) Is the K20c running with the TCC driver?
(3) Is the code correctly compiled for the sm_35 platform, with the exact same compiler switches on both platforms? If you are using the MSVS IDE, I would suggest double-checking the generated command lines.
(4) If you have multiple GPUs in this system, I would suggest double-checking that the kernels are actually running on the K20c.
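For points (2) and (4), a short device query can confirm which GPU the process uses by default and whether it runs under TCC. This is a minimal sketch assuming the CUDA runtime API; `isSm35` is a hypothetical helper, and `cudaDeviceProp::tccDriver` is 1 only for a device using the TCC driver, so it reads 0 under WDDM.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if a compute capability matches the sm_35 target.
bool isSm35(int major, int minor) { return major == 3 && minor == 5; }

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA devices found\n");
        return 1;
    }

    int current = -1;
    cudaGetDevice(&current);  // device that kernel launches in this thread will use

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // '*' marks the current device; tccDriver is nonzero only under TCC.
        std::printf("%c device %d: %s, compute %d.%d%s, %s driver model\n",
                    dev == current ? '*' : ' ', dev, prop.name,
                    prop.major, prop.minor,
                    isSm35(prop.major, prop.minor) ? " (sm_35)" : "",
                    prop.tccDriver ? "TCC" : "non-TCC");
    }
    return 0;
}
```

The device marked with `*` is the one kernels go to unless the code calls `cudaSetDevice`.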

Hi, thanks for the reply.

  1. Measuring the times with cudaEventCreate.
  2. Not using TCC.
  3. Same nvcc switches, which were minimal other than compute_35 and sm_35,
    but I will double-check the command-line options in VS.
  4. Definitely running on the Tesla. I got the same results on the GTX Titan box, which
    has only one GPU (it was a little faster than the Tesla K20c, but showed the same gap).
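Event-based timing, as in answer 1, records GPU-side timestamps around the launches. A minimal sketch, assuming a hypothetical `atomicKernel` in place of the real code; the warm-up launch keeps one-time setup out of the measurement, and averaging over `reps` launches smooths out noise.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: every thread does one atomic add in global memory.
__global__ void atomicKernel(int* counter, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(counter, 1);
}

// Average milliseconds per launch from a total event-measured time.
float msPerLaunch(float totalMs, int reps) { return reps > 0 ? totalMs / reps : 0.0f; }

int main() {
    const int n = 1 << 20, reps = 100, threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int* d_counter = nullptr;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));

    // Warm-up launch so one-time setup cost is not part of the measurement.
    atomicKernel<<<blocks, threads>>>(d_counter, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        atomicKernel<<<blocks, threads>>>(d_counter, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block the host until the stop event has completed

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float totalMs = 0.0f;
    cudaEventElapsedTime(&totalMs, start, stop);
    std::printf("average kernel time: %.3f ms\n", msPerLaunch(totalMs, reps));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_counter);
    return 0;
}
```

Building this file with the same nvcc switches on both OSes gives a like-for-like comparison.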

Found the issue:

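VS2012 was inserting -G (device debug) in the middle of the nvcc command line. -G generates debug information for device code and turns off most device-code optimizations. After removing it, the timings are now the same on Linux and Windows 7. I should have thought of that first.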
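A stray -G is easy to miss in an IDE-generated command line, so a startup check can flag unoptimized builds before any timing is trusted. This sketch assumes that your nvcc version defines the `__CUDACC_DEBUG__` macro when -G is given (recent toolkits document it; verify against your version's nvcc documentation); `builtWithDeviceDebug` is a hypothetical helper.

```cuda
#include <cstdio>

// Hypothetical helper: reports whether this file was compiled with device
// debug (-G). ASSUMPTION: nvcc defines __CUDACC_DEBUG__ when -G is used.
bool builtWithDeviceDebug() {
#ifdef __CUDACC_DEBUG__
    return true;
#else
    return false;
#endif
}

int main() {
    if (builtWithDeviceDebug()) {
        std::printf("warning: built with -G; device code is unoptimized, "
                    "so kernel timings are not representative\n");
    }
    return 0;
}
```

Comparing the full nvcc command line that VS generates with the Linux one remains the most direct check.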

Thanks, njuffa, for pointing me to it with suggestion (3).

Good to hear you tracked it down. You may want to consider using the TCC driver with the Tesla K20c; it has significantly less driver overhead than the default Windows WDDM driver. This is more likely to be visible in app-level performance than in kernel execution time, which is independent of the driver.
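To see the app-level overhead the reply refers to, time something with almost no kernel execution time. The sketch below uses a hypothetical `emptyKernel` and measures host wall-clock time per launch with `std::chrono`; running it once under WDDM and once under TCC (on Windows the driver model can be switched with `nvidia-smi --driver-model`, `-dm`) compares launch overhead rather than kernel speed.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that does nothing, so measured time is mostly launch overhead.
__global__ void emptyKernel() {}

// Microseconds per launch from a host-measured wall-clock interval.
double usPerLaunch(double totalUs, int launches) {
    return launches > 0 ? totalUs / launches : 0.0;
}

int main() {
    const int launches = 10000;

    emptyKernel<<<1, 1>>>();  // warm-up: context creation and module load
    cudaDeviceSynchronize();

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < launches; ++i)
        emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait for every queued launch to finish
    auto t1 = std::chrono::steady_clock::now();

    double totalUs = std::chrono::duration<double, std::micro>(t1 - t0).count();
    std::printf("host-side time per empty launch: %.2f us\n",
                usPerLaunch(totalUs, launches));
    return 0;
}
```

Because the kernel body is empty, any WDDM/TCC difference here reflects driver and launch cost, not kernel execution time.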