I started playing around with OpenACC and tested some samples on my GTX 1080 and on our GPU server with an NVIDIA P100. In this post I will focus on the OpenACC sample jsolvec.cpp from the OpenACC tutorial.
Running for double: (./jsolvec.exe 5000 1000000)
GTX 1080: 23.8 sec
P100: 9.88 sec
Running for float: (./jsolvec.exe 5000 1000000)
GTX 1080: 13.9 sec
P100: 6.3 sec
For the P100 this is what I would expect, given its peak of 9519 GFLOPS (single) and 4759 GFLOPS (double). But since the GTX 1080 has a peak of 9912 GFLOPS (single) and only 310 GFLOPS (double), I would expect a much worse result for double precision on that card.
Why is the double-precision performance on the GTX 1080 so much better in this case than the peak numbers suggest?