I am doing GPU programming in one of my courses at uni. We're doing a simple simulation task to compare performance between GPUs and CPUs. I have a GTX 1080 Ti. Initially I managed to make my GPU program run 20x faster than my CPU, but my friend with an RTX 2070 recorded speedups of 400x.
At first I thought maybe I had programmed something wrong, but when he ran my code on his GPU it was also 400x faster.
What could be wrong with my GPU? Or is the GTX 1080 Ti simply 20x slower than the RTX 2070? It feels a bit weird that my i5 CPU comes that close to my GPU, though…
Any help shedding light on this would be super appreciated.
When I run your code on a GTX 960 on Linux, I get:
$ nvcc -O3 -o e3 exercise_3.cu
$ ./e3 1048576 8 256
CPU 46941
CPU 9777
GPU - CPU mean squared error: 3.96764e-12
$
Your printout lists CPU twice, but the second line actually corresponds to the GPU timing. So according to my testing the GPU is faster (a lower number means faster).
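Just to illustrate what I mean, here is a minimal, self-contained timing sketch with the two lines labeled distinctly. This is not your exercise_3.cu; the kernel, the problem size, and the structure are placeholders I made up for the example:

// timing_sketch.cu -- minimal labeling sketch, not the actual exercise_3.cu
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;                        // placeholder problem size
    float *h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // time the CPU reference loop with the host clock
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < n; ++i) h[i] *= 2.0f;
    auto t1 = std::chrono::high_resolution_clock::now();
    long long cpu_us = (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();

    // time the kernel with CUDA events
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    scale<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);

    // label the two measurements distinctly
    printf("CPU %lld\n", cpu_us);
    printf("GPU %lld\n", (long long)(gpu_ms * 1000.0f));

    cudaFree(d);
    delete[] h;
    return 0;
}

Note that cudaEventElapsedTime only captures device-side time between the two recorded events; if you want the host-to-device copy included, record the start event before the cudaMemcpy.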
On a V100 I get results like this:
$ nvcc -O3 -o e3 exercise_3.cu -arch=sm_70
$ ./e3 1048576 8 256
CPU 62441
CPU 1150
GPU - CPU mean squared error: 3.96764e-12
$
GPU seems faster.
On an RTX 2070 on Windows I get:
C:\Users\Robert Crovella\source\repos\test22\x64\Release>test22 1048576 8 256
CPU 25783
CPU 1769
GPU - CPU mean squared error: 6.68266e-12
GPU seems faster.
With respect to the GPU-to-GPU ratio, I don't see anything close to a 20x ratio among those three. I'm not using clang, and am mostly unfamiliar with it. If you are using clang merely as the host compiler, I don't think that should have much effect on GPU performance; if you are using clang to generate the GPU device code as well, that may be a factor. (If you were using Windows I would normally say be careful not to time a debug build.)
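For concreteness, and assuming a Linux setup like the one in my first output above, the two clang scenarios look roughly like this (the sm_61 architecture for a GTX 1080 Ti and the CUDA library path are assumptions on my part):

# clang used only as the host compiler; nvcc still generates the device code
$ nvcc -O3 -ccbin clang++ -o e3 exercise_3.cu
# clang compiling the .cu file directly; clang generates the device code as well
$ clang++ -O3 --cuda-gpu-arch=sm_61 -o e3 exercise_3.cu -L/usr/local/cuda/lib64 -lcudart

For timing purposes, in either case avoid device debug builds: nvcc -G (and the default VS debug project settings on Windows) disables device-code optimization.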
Hey, thanks a lot for your response. I updated my NVIDIA drivers and CUDA, and that gave me a significant speedup, comparable to what you got. I don't know why it made such a big difference, but it did, so I am very happy :)
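In case it helps anyone else hitting the same thing, this is how I checked what I had installed before and after the update:

$ nvidia-smi        (reports the driver version and, on recent drivers, the highest CUDA version it supports)
$ nvcc --version    (reports the installed CUDA toolkit version)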