Double-precision performance of the GTX 1080

I started playing around with OpenACC and tested some samples on my GTX 1080 and on our GPU server with an NVIDIA P100. In this post I will focus on the OpenACC sample jsolvec.cpp from the OpenACC tutorial.

Running in double precision (./jsolvec.exe 5000 1000000):

GTX 1080: 23.8 sec

P100: 9.88 sec

Running in single precision, float (./jsolvec.exe 5000 1000000):

GTX 1080: 13.9 sec

P100: 6.3 sec

For the P100 this is what I would expect, given its peak of 9519 GFLOPS in single and 4759 GFLOPS in double precision. But since the GTX 1080 has 9912 GFLOPS in single and only 310 GFLOPS in double precision, I would expect a much worse result in double.

Why is the double-precision performance of the GTX 1080 so much better than expected in this case?

Hi andrew28349,

Without analysis I can't be sure, but most likely the code isn't compute-bound, and other factors, such as a reduction or memory access, are limiting performance.

This would be a good opportunity for you to try NVPROF or PGPROF, paying particular attention to the device metrics, to get a better understanding of where the time is being spent.

See: Profiler Guide :: PGI version 19.3 Documentation for x86 and NVIDIA Processors