VectorAdd example from CUDACast #2

Hello, I am new to CUDA programming. I work in Visual Studio 2012 Pro with the CUDA Toolkit v6.0 installed, and I am trying to recreate the example shown in CUDACast #2 on YouTube. It is my first CUDA program.

Although everything works as shown in the video, when I check execution times I find that with CUDA I get a slower program, not a faster one.

I measure the times using a timer.h header similar to the one used in CUDACast #3: the CPU version takes about 0.000012 seconds, while the CUDA version takes about 0.000063 seconds. Please help!

I am using a GeForce GTX 750.

Your first CUDA program is intended to teach you the basics, not necessarily be fast. Not every CUDA program will be faster than some corresponding CPU program. Depending on how you do the timing, Vector Add will not necessarily be faster (by itself) on the GPU, due perhaps to the time cost of transferring the data to and from the GPU.
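To see where the time actually goes, you can time the kernel separately from the transfers using CUDA events. This is only a sketch (not the cudacast code); it assumes device pointers d_a, d_b, d_c, host arrays h_a, h_b, h_c of SIZE ints, and the VectorAdd kernel from the example:

```cuda
// Sketch: compare the cost of the kernel alone vs. the whole sequence
// (copies in, kernel, copy out) using cudaEvent timers.
cudaEvent_t start, stop;
float kernel_ms = 0.0f, total_ms = 0.0f;
cudaEventCreate(&start);
cudaEventCreate(&stop);

// Time the full sequence, including host<->device transfers
cudaEventRecord(start);
cudaMemcpy(d_a, h_a, SIZE * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(d_b, h_b, SIZE * sizeof(int), cudaMemcpyHostToDevice);
VectorAdd<<<1, SIZE>>>(d_a, d_b, d_c, SIZE);
cudaMemcpy(h_c, d_c, SIZE * sizeof(int), cudaMemcpyDeviceToHost);
cudaEventRecord(stop);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&total_ms, start, stop);

// Time the kernel by itself
cudaEventRecord(start);
VectorAdd<<<1, SIZE>>>(d_a, d_b, d_c, SIZE);
cudaEventRecord(stop);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&kernel_ms, start, stop);
```

If total_ms is much larger than kernel_ms, your measurement is dominated by the data transfers, not the computation.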

Thank you very much for the quick response. So I tried increasing the size of the vectors to produce more computing workload. The CPU execution time goes up, but the CUDA code can no longer add the vectors properly; the result is all zeros. Why is this happening?

Increasing the size of the vectors won’t help from a performance measurement point of view, if your timing is including the data transfers, since increasing the size of the vectors also increases the size of the data to be transferred. This particular educational kernel is also using only one threadblock, which is not a high-performance approach to CUDA programming.

The code you are working with has been stripped down to the bare essentials to highlight the concepts being presented, and that makes it easy to break. Taking what is basically the most introductory CUDACast and then using Q&A here to advance your CUDA knowledge is not very efficient. Take advantage of the further learning materials that are available:

https://developer.nvidia.com/gpu-computing-webinars

When posting questions here, you’re more likely to get useful help if you show the code you’re working with. Yes, in this case, I can go hunt for it on the internet, but you’ve made changes, right? So show the exact code. It’s not hard to do.

If you want to do CUDA programming, you’ll be well advised to learn more about error handling. First of all, google “proper cuda error checking” and read the first link, from Stack Overflow. For educational purposes, that type of error checking is frequently left out, so as not to obscure the concepts being introduced.

So any time you are having trouble with a CUDA code, your first steps should be:

  1. make sure you are doing proper CUDA error checking on all CUDA kernels and API calls
  2. run your code with cuda-memcheck (a very useful debugging utility).
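For reference, the error-checking pattern that search turns up boils down to something like the following. This is a sketch, not production code, and the macro name cudaCheck is my own:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime API call; print the error string and bail on failure.
#define cudaCheck(call)                                               \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Kernel launches return no status, so check them in two steps:
//   VectorAdd<<<blocks, threads>>>(d_a, d_b, d_c, SIZE);
//   cudaCheck(cudaGetLastError());        // catches an invalid launch config
//   cudaCheck(cudaDeviceSynchronize());   // catches errors during execution
```

And cuda-memcheck needs no code changes at all: just run `cuda-memcheck ./my_app` and it will report out-of-bounds accesses and launch failures.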

To answer your question, I assume that you modified this line (only):

#define SIZE 1024

to some higher value.

That line, among other things, determines the number of threads per block. No CUDA GPU currently available can run more than 1024 threads per block. When you change SIZE to a higher number, you are increasing the vector lengths, but you are also modifying the config parameters of the kernel launch:

VectorAdd<<< 1, SIZE >>>(d_a, d_b, d_c, SIZE);
                ^^^

That kernel won’t launch when SIZE is greater than 1024. The programming guide covers many topics, including limits like this:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications

With proper CUDA error checking on this kernel launch, you would discover that one of the config parameters was invalid, and you’d be well on your way to understanding and solving the problem.
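One common way to handle larger vectors (a sketch, not the cudacast code) is to fix the threads-per-block and compute the number of blocks from SIZE, then guard the kernel so the last, partially-full block doesn’t run off the end of the arrays:

```cuda
#define SIZE (1 << 20)          // vector length, now independent of block size
#define THREADS_PER_BLOCK 256   // must be <= 1024 on current GPUs

__global__ void VectorAdd(const int *a, const int *b, int *c, int n)
{
    // Global index of this thread across all blocks
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard: blocks * threads may exceed n
        c[i] = a[i] + b[i];
}

// Launch: round the block count up so every one of the SIZE elements is covered.
int blocks = (SIZE + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
VectorAdd<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, SIZE);
```

With SIZE = 1048576 and 256 threads per block, that launch uses 4096 blocks, which also gives the GPU enough parallel work to be worth measuring.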