My first program doesn't behave as expected

Hi everyone, I’m new here, and I’m sure this is gonna be the first of many days for me as a CUDA “programmer” :D

I’ve got Vista 64 and Visual Studio 2008. I somehow managed to compile my first program, but it doesn’t do what I expected.

Here is the code:

#include <stdio.h>

__global__ void VecAdd(int* A, int* B, int* C){
	int i = threadIdx.x;
	C[i] = A[i] + B[i];
}

int main(){
	const int N=5;
	int A[N], B[N], C[N];

	for(int i=0;i<N;i++){
		A[i]=1;
		B[i]=2;
	}

	// Kernel invocation
	VecAdd<<<1, N>>>(A, B, C);

	for(int i=0;i<N;i++){
		printf("%d ", C[i]);
	}
	scanf("%d", A[0]);
}

It is supposed to sum the vectors A and B, and print all the elements of C. Shouldn’t it print “3 3 3 3 3”?

This is the weird output that I got instead:

-1 -2 -2147362916 1 0

How is it possible?

Thanks!

Simone

The GPU can only operate on data that has been copied to the device memory. You need to allocate a device version of your A, B, and C arrays and copy data to/from them as needed. (cudaMalloc and cudaMemcpy are the important functions here)
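As a rough sketch of what that looks like for the program above (not the only way to do it): the d_A, d_B, and d_C names are just a naming convention I'm assuming here, and error checking is left out to keep it short.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void VecAdd(const int* A, const int* B, int* C) {
    int i = threadIdx.x;          // one thread per element
    C[i] = A[i] + B[i];
}

int main() {
    const int N = 5;
    const size_t bytes = N * sizeof(int);
    int A[N], B[N], C[N];

    for (int i = 0; i < N; i++) {
        A[i] = 1;
        B[i] = 2;
    }

    // Device-side copies of the three arrays (names are illustrative).
    int *d_A, *d_B, *d_C;
    cudaMalloc((void**)&d_A, bytes);
    cudaMalloc((void**)&d_B, bytes);
    cudaMalloc((void**)&d_C, bytes);

    // Copy the inputs from host memory to device memory.
    cudaMemcpy(d_A, A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, B, bytes, cudaMemcpyHostToDevice);

    // Launch with device pointers, not the host arrays.
    VecAdd<<<1, N>>>(d_A, d_B, d_C);

    // Copy the result back; this blocking copy waits for the kernel to finish.
    cudaMemcpy(C, d_C, bytes, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++) {
        printf("%d ", C[i]);
    }
    printf("\n");

    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    return 0;
}
```

If every call succeeds, this should print the "3 3 3 3 3" you were expecting.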

Your kernel call actually failed when the GPU tried to access host pointers, but since you never check error messages in this code, that would not have been apparent.
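Here is a minimal sketch of how the failure could have been made visible. CHECK is a hypothetical helper macro (not part of CUDA), and I'm assuming a reasonably recent toolkit where cudaDeviceSynchronize is available; older toolkits used cudaThreadSynchronize for the same purpose.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error string and exit main on failure.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err = (call);                                \
        if (err != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                    cudaGetErrorString(err));                    \
            return 1;                                            \
        }                                                        \
    } while (0)

__global__ void VecAdd(int* A, int* B, int* C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main() {
    const int N = 5;
    int A[N], B[N], C[N];

    for (int i = 0; i < N; i++) {
        A[i] = 1;
        B[i] = 2;
    }

    // Same launch as in the original post: A, B, C are host pointers.
    VecAdd<<<1, N>>>(A, B, C);

    CHECK(cudaGetLastError());       // errors detected at launch time
    CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    printf("kernel finished without a reported error\n");
    return 0;
}
```

On most setups the second check is the one that would typically report the problem, since the bad memory access only happens once the kernel is actually running.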

To see a simple example which includes cudaMalloc and cudaMemcpy, see this article from Dr. Dobbs:

CUDA, Supercomputing for the Masses: Part 2 | Dr Dobb's: http://www.ddj.com/cpp/207402986

(The entire article series is very good.)

Thank you very much, seibert. Actually, a friend of mine who is pretty good at CUDA told me about that a couple of hours after I made the post. Eventually we decided that the best thing for me was to continue reading the programming manual that comes with CUDA (and in fact, it clarified those aspects as I went on reading).

The problem is that I stopped at the first example in the manual:

// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
	int i = threadIdx.x;
	C[i] = A[i] + B[i];
}

int main()
{
	// Kernel invocation
	VecAdd<<<1, N>>>(A, B, C);
}

and tried to modify and run it without knowing much about CUDA. My fault :">

Anyway thanks for the link!