GPU speedup query

Logan15 · July 22, 2010, 9:16am

I ran a modified version of the vector addition example from the CUDA SDK, in which I add two vectors of size 100 to get a third one. I ran it on a Tesla 1060 GPU and got a speedup of around 10x. Shouldn't it be 100x? I was using a threadsperblock size of 30. Please find the code attached. Thanks in advance.

vectorAdd.cu (4.61 KB)
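The attached vectorAdd.cu is not reproduced here. As a rough sketch, assuming the modified sample keeps the usual 1-D launch pattern (a ceiling-division grid size plus an `if (i < n)` bounds check), the plain C++ below emulates on the CPU what the launch does with N = 100 and 30 threads per block. `blocksFor` and `emulateVectorAdd` are hypothetical helper names, not SDK functions.

```cpp
#include <cassert>
#include <vector>

// Number of blocks needed to cover n elements (ceiling division),
// the usual way a 1-D CUDA grid size is computed.
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU emulation of a 1-D vector-add launch. The outer loop stands in for
// blockIdx.x, the inner loop for threadIdx.x, and i mirrors
// blockIdx.x * blockDim.x + threadIdx.x in the kernel.
// Returns how many launched "threads" fell past the end and did no work.
int emulateVectorAdd(const std::vector<float>& a, const std::vector<float>& b,
                     std::vector<float>& c, int threadsPerBlock) {
    assert(b.size() >= a.size() && c.size() >= a.size());
    const int n = static_cast<int>(a.size());
    const int blocks = blocksFor(n, threadsPerBlock);
    int idleThreads = 0;
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threadsPerBlock; ++thread) {
            const int i = block * threadsPerBlock + thread;
            if (i < n) {            // bounds guard: the last block is only partly used
                c[i] = a[i] + b[i];
            } else {
                ++idleThreads;
            }
        }
    }
    return idleThreads;
}
```

With N = 100 and 30 threads per block this gives 4 blocks (120 threads), 20 of which do nothing. The kernel itself is just one addition per thread, so there is very little work per launch to amortize any overhead.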

tera · July 22, 2010, 10:25am

Vector addition is limited by memory bandwidth, and the memory bandwidth of a GPU is typically around 10x that of a CPU.
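A back-of-the-envelope sketch of why a bandwidth-bound kernel tops out near the bandwidth ratio: each element of `c[i] = a[i] + b[i]` moves 12 bytes with `float` data (two loads, one store), so the best-case time is bytes divided by bandwidth. The bandwidth numbers used with it are assumptions: roughly 102 GB/s is the commonly quoted peak for a Tesla C1060, and the CPU figure is only a placeholder. PCIe host-device copies and launch overhead are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>

// Bytes moved per element of c[i] = a[i] + b[i] with float data:
// two loads and one store.
constexpr double kBytesPerElement = 3.0 * sizeof(float);

// Lower bound on run time if the loop is limited purely by memory bandwidth.
double bandwidthBoundSeconds(std::size_t n, double bytesPerSecond) {
    assert(bytesPerSecond > 0.0);
    return static_cast<double>(n) * kBytesPerElement / bytesPerSecond;
}

// For a bandwidth-bound kernel the best-case speedup is the ratio of the
// two memory bandwidths; the element count n cancels out.
double bandwidthBoundSpeedup(std::size_t n, double gpuBytesPerSecond,
                             double cpuBytesPerSecond) {
    return bandwidthBoundSeconds(n, cpuBytesPerSecond) /
           bandwidthBoundSeconds(n, gpuBytesPerSecond);
}
```

Because n cancels out, the vector length (100) has no reason to show up as a 100x speedup. With the assumed 102.4 GB/s versus 10 GB/s the ratio is about 10.2x. Note also that 100 elements are only 1200 bytes, about 12 ns at 100 GB/s, which is far below kernel-launch and timer overhead (on the order of microseconds), so a measurement at this size probably reflects overhead more than bandwidth.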

Also, GPUs want massively parallel tasks: they can run thousands of threads in parallel. So if you really reach a 10x speedup with a vector of only 100 elements, that's a pretty good result.
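To put "thousands of threads" in numbers, here is a small sketch that compares a launch with what the GPU can keep resident. The hardware figures are assumptions taken from the GT200 / compute capability 1.3 generation (30 multiprocessors, up to 1024 resident threads each), and the model ignores the other per-multiprocessor limits (resident blocks, registers, shared memory). `describeLaunch` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>

// Assumed Tesla C1060 (GT200, compute capability 1.3) figures.
constexpr int kMultiprocessors = 30;
constexpr int kMaxThreadsPerSM = 1024;

struct LaunchUse {
    int blocks;             // blocks in the grid
    int maxBusySMs;         // a block runs on one SM, so at most one SM per block
    double threadFraction;  // launched threads / resident-thread capacity (>1 means waves)
};

LaunchUse describeLaunch(int n, int threadsPerBlock) {
    assert(n > 0 && threadsPerBlock > 0);
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    const int maxBusySMs = std::min(blocks, kMultiprocessors);
    const double capacity = static_cast<double>(kMultiprocessors) * kMaxThreadsPerSM;
    const double fraction = static_cast<double>(blocks) * threadsPerBlock / capacity;
    return {blocks, maxBusySMs, fraction};
}
```

For N = 100 with 30 threads per block this reports 4 blocks, so at most 4 of the 30 multiprocessors have anything to do and the launch fills well under 1% of the resident-thread capacity. A vector of about a million elements with 256 threads per block, by contrast, keeps all 30 busy.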

Finally, threadsperblock should be a multiple of 32 (or better yet, 64) to avoid wasting resources through partially occupied warps.
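The warp arithmetic behind this advice can be sketched as follows: threads are issued in warps of 32, so a block whose size is not a multiple of 32 still occupies whole warps, and the leftover lanes sit idle. The helper names are hypothetical.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;

// Warps needed for one block: a partial warp still occupies a full warp slot.
int warpsPerBlock(int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Lanes in the block's last warp that have no thread assigned.
int idleLanesPerBlock(int threadsPerBlock) {
    return warpsPerBlock(threadsPerBlock) * kWarpSize - threadsPerBlock;
}

// Round a requested block size up to the next multiple of `multiple`
// (32 for a full warp, or 64 as recommended above).
int roundUpTo(int threadsPerBlock, int multiple) {
    assert(multiple > 0);
    return (threadsPerBlock + multiple - 1) / multiple * multiple;
}
```

With 30 threads per block, every block is one warp with 2 of its 32 lanes idle. Rounding up to 32 or 64 removes that waste; the kernel's bounds check then takes care of threads beyond the end of the vector.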
