unexpected slow performance

yagigami · February 29, 2020, 12:43am

Hello,
I wrote a CUDA program (repo: https://github.com/Yagigami/cuda_learning) to learn how to use my graphics card for heavily parallel computing. I have a GeForce GTX 1050 Ti, which can theoretically reach around 66 GFLOP/s in double precision and 2 TFLOP/s in single precision.
With the program I wrote, I barely reach around 5 MFLOP/s, and in fact my CPU is faster for now (1.4 s for the same input).
So my question is: what am I doing wrong that makes my program run so slowly?
I tried a sample (“matrixMul”) and it was much faster (295 GFLOP/s in single precision), even though I did not notice any major difference from my program, aside from a #pragma unroll, which I tried to use and fine-tune without success.
I also tuned the block size and grid size to get the most performance, but the program is still very slow.
When testing, I also tried shutting down every other program, but that did not help either.
With some testing, I discovered that the slowdown happens in the loop inside partial_sum, which should only use registers, so I do not see how bad memory usage could cause it.
I hope you can help with that!
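
For reference, here is roughly how the peak figures quoted above can be derived. This is only a sketch: it assumes the published specifications for the GTX 1050 Ti (768 CUDA cores, a 1392 MHz boost clock, and a 1:32 FP64-to-FP32 rate) rather than anything measured on my card, and the constant and function names are made up for illustration.

```cpp
#include <cstdio>

// Assumed published specs for a GeForce GTX 1050 Ti (not measured on the card).
constexpr double kCudaCores     = 768.0;       // number of FP32 cores
constexpr double kBoostClockGHz = 1.392;       // boost clock in GHz
constexpr double kFp64Ratio     = 1.0 / 32.0;  // FP64 rate relative to FP32

// Peak FP32 throughput in GFLOP/s, counting one fused multiply-add
// (2 floating-point operations) per core per clock cycle.
constexpr double peak_fp32_gflops(double cores, double clock_ghz) {
    return cores * clock_ghz * 2.0;
}

// Peak FP64 throughput in GFLOP/s, scaled down by the FP64:FP32 ratio.
constexpr double peak_fp64_gflops(double cores, double clock_ghz, double ratio) {
    return peak_fp32_gflops(cores, clock_ghz) * ratio;
}

// Hypothetical helper: prints both theoretical peaks for the assumed specs.
inline void print_peaks() {
    std::printf("FP32 peak: %.1f GFLOP/s\n",
                peak_fp32_gflops(kCudaCores, kBoostClockGHz));
    std::printf("FP64 peak: %.1f GFLOP/s\n",
                peak_fp64_gflops(kCudaCores, kBoostClockGHz, kFp64Ratio));
}
```

With these assumed numbers, the single-precision peak comes out at about 2138 GFLOP/s and the double-precision peak at about 66.8 GFLOP/s, so the 5 MFLOP/s I measure is several orders of magnitude below either one.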
