[Help] GTX 1080 Ti 20x slower than RTX 2070?

I am doing GPU programming in one of my courses at uni.

We’re doing a simple simulation task to compare performance between GPUs and CPUs. I have a GTX 1080 Ti. Initially I managed to make my GPU program run 20x faster than my CPU version, but my friend with an RTX 2070 recorded speedups of 400x.

At first I thought maybe I had programmed something wrong, but when he ran my code on his GPU it was also 400x faster.

Here’s my code https://github.com/DD2360-Assignments-Jonas-Valfridsson/Assignments/blob/main/Assignment_2/ex_3/exercise_3.cu

I tried adding optimization flags to my compilation but that only made my CPU code faster… so now my GPU is not even faster than my CPU.

The compilation command I am using is

function ccuda {
  clang++ $1 -o $2 -O3 -std=c++11 -ffast-math -fcuda-flush-denormals-to-zero -ffp-contract=fast --cuda-gpu-arch=sm_61 -L/usr/local/cuda-10.0/lib64 -lcudart_static -ldl -lrt -pthread
}

What could be wrong with my GPU? Or is the GTX 1080 Ti simply 20x slower than the RTX 2070? It feels a bit weird that my i5 CPU is as fast as my GPU, though…

Any help to shed light on this would be super appreciated.

When I run your code on a GTX 960 on Linux, I get:

$ nvcc -O3 -o e3 exercise_3.cu
$ ./e3 1048576 8 256
CPU 46941
CPU 9777
GPU - CPU mean squared error: 3.96764e-12
$

Your printout lists CPU twice, but actually the second one corresponds to GPU timing. So according to my testing the GPU seems faster (lower is faster).
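To make the two numbers easier to tell apart, something like the following could be used. This is only a minimal sketch, not the code from the repository: the `step` kernel, the `grid_size` helper, and the choice of microseconds as the unit are all assumptions made for illustration. It times the CPU loop with a wall clock and the GPU loop with CUDA events, after one untimed warm-up launch, and prints each result under its own label.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the simulation update; not the kernel from the repository.
__global__ void step(float *x, float dt, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] += dt;
}

// Number of blocks needed to cover n elements (rounds up).
int grid_size(int n, int block) { return (n + block - 1) / block; }

int main() {
  const int n = 1 << 20, iters = 8, block = 256;
  const float dt = 0.5f;

  // CPU reference, timed with a wall clock.
  std::vector<float> h(n, 0.0f);
  auto t0 = std::chrono::steady_clock::now();
  for (int it = 0; it < iters; ++it)
    for (int i = 0; i < n; ++i) h[i] += dt;
  auto t1 = std::chrono::steady_clock::now();
  long long cpu_us =
      std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();

  float *d = nullptr;
  if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
    std::printf("No usable CUDA device\n");
    return 1;
  }
  step<<<grid_size(n, block), block>>>(d, dt, n);  // warm-up launch, not timed
  cudaMemset(d, 0, n * sizeof(float));             // reset state before timing

  // GPU run, timed with CUDA events.
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start);
  for (int it = 0; it < iters; ++it)
    step<<<grid_size(n, block), block>>>(d, dt, n);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);  // wait until the timed kernels have finished
  float gpu_ms = 0.0f;
  cudaEventElapsedTime(&gpu_ms, start, stop);

  // Copy one element back so the two results can be compared by eye.
  float first = 0.0f;
  cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost);

  std::printf("CPU %lld us\n", cpu_us);
  std::printf("GPU %.0f us\n", gpu_ms * 1000.0f);
  std::printf("x[0]: CPU %f, GPU %f\n", h[0], first);

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  cudaFree(d);
  return 0;
}
```

The warm-up launch is there because the first kernel launch in a process typically also pays for context creation, which can distort a short measurement.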

On a V100 I get results like this:

$ nvcc -O3 -o e3 exercise_3.cu -arch=sm_70
$ ./e3 1048576 8 256
CPU 62441
CPU 1150
GPU - CPU mean squared error: 3.96764e-12
$

GPU seems faster.

On an RTX 2070 on Windows I get:

C:\Users\Robert Crovella\source\repos\test22\x64\Release>test22 1048576 8 256
CPU 25783
CPU 1769
GPU - CPU mean squared error: 6.68266e-12

GPU seems faster.

With respect to the GPU-to-GPU ratio, I don’t see a 20x ratio among any of those 3. I’m not using clang, and am mostly unfamiliar with clang. If you are using clang merely as the host compiler, I don’t think that should have much effect on GPU performance. If you are using clang to generate GPU device code also, that may be a factor. (If you were using Windows I would normally say be careful not to compile a debug project.)

Hey, thanks a lot for your response. I updated my NVIDIA drivers and CUDA, and that gave me a significant speedup, comparable to what you got. I don’t know why that made such a big difference, but it did, so I am very happy :)

Thanks again for your response.
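
For anyone who lands here with a similar symptom: since updating the driver and CUDA is what resolved it in this thread, it may be worth checking which versions are actually in use. The sketch below is only one possible way to do that, using the runtime API calls `cudaDriverGetVersion`, `cudaRuntimeGetVersion`, and `cudaGetDeviceProperties`. The `format_version` helper is a made-up name for this example, and the thread does not establish why the old versions were slow, so treat the output as a starting point rather than a diagnosis.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 10020 means 10.2.
// Hypothetical helper for this example only.
std::string format_version(int v) {
  return std::to_string(v / 1000) + "." + std::to_string((v % 1000) / 10);
}

int main() {
  int driver = 0, runtime = 0;
  cudaDriverGetVersion(&driver);    // latest CUDA version the driver supports; 0 if no driver
  cudaRuntimeGetVersion(&runtime);  // version of the CUDA runtime this binary was built against

  std::printf("Driver supports CUDA %s\n", format_version(driver).c_str());
  std::printf("Runtime is CUDA %s\n", format_version(runtime).c_str());
  if (driver < runtime)
    std::printf("Driver is older than the runtime; consider updating the driver.\n");

  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess) count = 0;
  for (int dev = 0; dev < count; ++dev) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
    // The compute capability should match the architecture passed to the compiler
    // (for example sm_61 corresponds to compute capability 6.1).
    std::printf("Device %d: %s, compute capability %d.%d\n",
                dev, prop.name, prop.major, prop.minor);
  }
  return 0;
}
```

It can be built the same way as the other examples in this thread, for example with `nvcc -O3 -o versions versions.cu` (the file name is just a placeholder).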