An Easy Introduction to CUDA C and C++

anon95180265 · September 3, 2015, 5:43am

I suspect you are not setting a GPU breakpoint. If you are stepping on the host, it will not automatically step into GPU code because different threads are running it, so you need to set a breakpoint in the GPU function. Please check the debugger docs / tutorials.

anon95180265 · September 3, 2015, 5:43am

Not sure what you are asking. That's the only header needed by this example.

anon28772862 · September 3, 2015, 5:57am

Yeah, you are right. I've only been studying this for two days. Thanks for your reply!!

anon69391146 · September 26, 2015, 11:35pm

I know this is old, but in your kernel call, why do you pass "2.0f"? Is that because there are two floating point operations taking place in your kernel?

anon13901515 · September 27, 2015, 10:07pm

It's just an arbitrary number for 'A' in the 'A*X + Y' expression.
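To make that concrete, here is a minimal sketch of a SAXPY program built around the kernel call quoted in this thread. The helper name runSaxpy and the initial values x = 1, y = 2 are assumptions chosen for illustration; the point is only that 2.0f arrives in the kernel as the scalar parameter a and has nothing to do with the number of floating-point operations.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// SAXPY: y = a*x + y, one element per thread.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];   // guard threads past the end of the arrays
}

// Hypothetical helper: runs SAXPY with x = 1 and y = 2 everywhere and
// returns the largest deviation from the expected value a*1 + 2.
float runSaxpy(int n, float a)
{
    const size_t bytes = n * sizeof(float);

    float *x = (float *)malloc(bytes);
    float *y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    // The second argument is the scalar 'a'; any float would do.
    saxpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);

    // This copy blocks until the kernel has finished.
    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);

    float maxError = 0.0f;
    for (int i = 0; i < n; ++i)
        maxError = fmaxf(maxError, fabsf(y[i] - (a * 1.0f + 2.0f)));

    // Release device and host allocations.
    cudaFree(d_x);
    cudaFree(d_y);
    free(x);
    free(y);
    return maxError;
}

int main()
{
    printf("Max error: %f\n", runSaxpy(1 << 20, 2.0f));
    return 0;
}
```

Replacing 2.0f with another value, say 3.5f, only changes the scale factor applied to x; the launch configuration and the kernel stay the same.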

anon69391146 · September 27, 2015, 11:18pm

That's what I figured after looking at it a bit closer: it's just a generic constant.

Thanks!

anon5108241 · June 8, 2016, 2:41am

There's a missing > in a < code > HTML tag, just search for "coden" and it should be the only instance on this page (OK, other than mine!)

anon95180265 · June 8, 2016, 3:17am

Fixed -- thanks!

anon15625940 · July 6, 2016, 4:42pm

Just a small edit, there's a missing '\' before the 'n' in the printf statement. This post has been a very useful, simple and concise starting point for getting into CUDA. Thanks!

anon85106120 · October 16, 2016, 10:22am

Thank you for the post. I am new to CUDA and would like to ask about some errors I came across. Running 'nvcc -o saxpy saxpy.cu' at my command prompt gives me "Cannot find compiler 'cl.exe' in PATH". I also get the following errors on the sample code I plan to run, as seen in the screenshot. Does this indicate a mistake in the compiler installation?
I am using Visual Studio 2015 and have an NVIDIA GeForce 820M.
https://uploads.disquscdn.c...
Thank you
Shrikanth Yadav

anon95180265 · October 26, 2016, 7:49pm

Do you have CUDA 8 installed? Previous versions did not support Visual Studio 2015. I can't see the errors in your screenshot.

anon85106120 · October 27, 2016, 7:30am

Hi
I was able to solve the problem after reinstalling both VS15 and CUDA. The compiling issue is also solved. Thank you

anon55248409 · October 30, 2016, 6:34pm

Hi, I'm new to this and I have a final project in which I need to use CUDA to handle big data.
My question: how can I read a file in CUDA C++?

anon95180265 · October 31, 2016, 2:12pm

The host code (which does the file loading) is just regular C/C++. So load files just like you normally would.
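As a rough sketch of what that can look like: the file is read with ordinary C++ stream I/O on the host, and the resulting buffer is then copied to the GPU. The helper name readFloats, the file name input.bin, and the raw-binary-floats layout are all assumptions made for this example; a real data set would need its own parsing.

```cuda
#include <cstdio>
#include <fstream>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical host-side helper: read a raw binary file of floats.
// Returns an empty vector if the file cannot be opened.
std::vector<float> readFloats(const char *path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return std::vector<float>();

    const std::streamsize bytes = in.tellg();   // opened at end, so this is the size
    if (bytes <= 0) return std::vector<float>();
    in.seekg(0, std::ios::beg);

    std::vector<float> data(static_cast<size_t>(bytes) / sizeof(float));
    in.read(reinterpret_cast<char *>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(float)));
    return data;
}

int main()
{
    const char *path = "input.bin";   // hypothetical file name

    // Write a tiny sample file so the sketch runs on its own.
    {
        const float sample[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        std::ofstream out(path, std::ios::binary);
        out.write(reinterpret_cast<const char *>(sample), sizeof(sample));
    }

    // Plain C++ file I/O on the host.
    std::vector<float> h_x = readFloats(path);
    const size_t bytes = h_x.size() * sizeof(float);

    // Copy the loaded data to the device.
    float *d_x = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);

    // ... launch kernels that work on d_x here ...

    cudaFree(d_x);
    printf("Loaded %zu floats from %s\n", h_x.size(), path);
    return 0;
}
```

For files larger than GPU memory, the same pattern can be applied chunk by chunk: read a block on the host, copy it over, process it, and repeat.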

anon4005837 · December 10, 2016, 12:06am

Hi, I've run both the .cu code and the .cuf code for this example. The .cu code runs as is and gives the proper result; however, the .cuf code returns 2.00000. I'm new to CUDA and am wondering if you have any idea why the results are different? The machine I'm using does have 2 GPUs installed.

anon4005837 · December 13, 2016, 1:50am

From the PGI forums: for Pascal GPUs, binaries need to be generated explicitly with cuda-8.0. See the link for the solution, if interested.
https://www.pgroup.com/user...

anon30260711 · January 2, 2017, 2:55pm

What about freeing allocated memory?
I'm a beginner in CUDA C, but I think that, formally, you should free the memory you allocated.

anon95180265 · January 5, 2017, 4:11am

Hi Jacek, great point. I corrected this omission in the post.

anon92999331 · January 8, 2017, 5:31pm

I am a CUDA beginner... Thanks for the great tutorial, helped me a lot in getting started!

My question concerns the execution configuration: Out of curiosity, I am also outputting the blockIdx, blockDim and threadIdx for every thread of the saxpy kernel (added one line to void saxpy: printf("Block idx.x, dim.x, threadIdx.x: %i %i %i\n", blockIdx.x, blockDim.x,threadIdx.x);)

Now I created this output 3 times with different execution configurations:
1) above original: I get a list of 4096 rows, the second column for all rows is 256. The sum is 1,048,576 (== N, as expected).
2) configuration: saxpy<<<(N+511)/512, 512>>>(N, 2.0f, d_x, d_y); I still get 4096 rows, but this time the second column is always 512. The total number is 2,097,152.
3) configuration: saxpy<<<(N+127)/128, 128>>>(N, 2.0f, d_x, d_y); I still get 4096 rows, but this time the second column is always 128. The total number is 524,288.

I don't understand... Why is the total number of threads always 4096, while the product of dimension*threads does not preserve N in all three cases?

Also, in the follow-up post (on measuring), the integer division of the execution configuration is changed from /256 to /512, but the comment line still reads "SAXPY on 1M elements". What am I missing?

anon95180265 · January 9, 2017, 5:09am

I think if you look at your code carefully you'll discover that each example is actually trying to print N (= 1M) lines. But you are running up against the printf FIFO default size of 1 MB and getting many fewer than that printed. If you call cudaDeviceSetLimit(cudaLimitPrintfFifoSize, X); for some large value of X you'll get more, but you may instead want to limit it to only print the first thread of every block instead ("if (threadIdx.x == 0)").
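Both suggestions can be sketched as follows. The kernel name reportBlocks, the helper blocksFor, and the 8 MB buffer size are assumptions for illustration only. The host loop also shows that the rounded-up block count times the block size covers N = 1M for each of the three configurations discussed above, so switching from /256 to /512 changes the number of blocks but not the number of elements processed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of blocks needed to cover n elements (integer division, rounded up).
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Print one line per block instead of one line per thread.
__global__ void reportBlocks(int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && threadIdx.x == 0)
        printf("blockIdx.x=%u blockDim.x=%u first index=%d\n",
               blockIdx.x, blockDim.x, i);
}

int main()
{
    const int N = 1 << 20;   // 1M elements, as in the post

    // Host-side check of the three execution configurations.
    const int sizes[3] = {128, 256, 512};
    for (int k = 0; k < 3; ++k) {
        const int blocks = blocksFor(N, sizes[k]);
        printf("%d threads/block -> %d blocks -> %d threads total\n",
               sizes[k], blocks, blocks * sizes[k]);
    }

    // Optional: enlarge the device printf buffer before launching any kernel
    // that prints. The 8 MB value is an arbitrary choice for this sketch.
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8 * 1024 * 1024);

    reportBlocks<<<blocksFor(N, 512), 512>>>(N);
    cudaDeviceSynchronize();   // wait for the kernel so its output is flushed
    return 0;
}
```

With 1M elements, the block counts work out to 8192, 4096 and 2048 for block sizes of 128, 256 and 512, and the product is 1,048,576 in every case.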
