Help with my CUDA code


I'm new to CUDA programming and I'm trying to write a CUDA program that adds two vectors, but I've run into a problem. I'd appreciate any help. Here is the code:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdlib.h>
#include <stdio.h>

__global__ void vecAdd(int *a, int *b, int *c, int n)
{
	// Get our global thread ID
	int id = threadIdx.x;

	// Make sure we do not go out of bounds
	if (id < n)
		c[id] = a[id] + b[id];
}


#define N 8

void random_ints(int* a, int h)


int main(void){

int *a, *b, *c;
int *d_a, *d_b, *d_c;
int size = N*sizeof(int);

// allocate space for the device (GPU) copies of a, b, c
cudaMalloc((void**)& d_a, size);
cudaMalloc((void**)& d_b, size);
cudaMalloc((void**)& d_c, size);

// allocate space for the host (CPU) copies of a, b, c and initialize the values
a = (int *)malloc(size); random_ints(a, N);
b = (int *)malloc(size); random_ints(b, N);
c = (int *)malloc(size);

for (int i = 0; i < size; ++i) {
	a[i] = i;
	b[i] = i;
	c[i] = 0;
}

vecAdd(a, b, c,N);

// copy the inputs to the device (GPU)
cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_a, b, size, cudaMemcpyHostToDevice);

// launch the add() kernel on the GPU with N threads

vecAdd <<< 1, N >>> (d_a, d_b, d_c);    // problem here

// copy the result back from the GPU
cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

free(a); free(b); free(c);
cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);

printf("\nPress any key to exit...");
char w;
scanf("%w", &w);

return 0;
}

What's the problem? :)

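To get the code to compile at all, you should remove the line vecAdd(a, b, c, N); from main. The only function with that name is a kernel (__global__), but that line calls it as if it were an ordinary host function. A kernel has to be launched with the <<< >>> syntax, and it works on device pointers (d_a, d_b, d_c), not host pointers.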
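If the intention of that host call was to compute the sum on the CPU as well (for example, to check the GPU result against it), that needs a separate, ordinary host function. A minimal sketch; the name vecAddHost is hypothetical and not part of the original code:

```cuda
#include <stdio.h>

// Hypothetical host-side reference: a plain C function that runs on the CPU.
// Unlike the __global__ kernel vecAdd, it can be called directly from main().
void vecAddHost(const int *a, const int *b, int *c, int n)
{
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

int main(void)
{
    int a[4] = {1, 2, 3, 4};
    int b[4] = {10, 20, 30, 40};
    int c[4] = {0};

    vecAddHost(a, b, c, 4);          // ordinary call, no <<< >>> needed
    for (int i = 0; i < 4; ++i)
        printf("%d ", c[i]);
    printf("\n");
    return 0;
}
```

Comparing its output element by element with the array copied back from d_c is a simple way to confirm the kernel is correct.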

Your kernel __global__ void vecAdd(int *a, int *b, int *c, int n) takes 4 arguments, but the launch vecAdd <<< 1, N >>> (d_a, d_b, d_c); passes only 3. As you already noticed, you need to pass the last argument, the number of elements N. The launch should look like this:
vecAdd <<< 1, N >>> (d_a, d_b, d_c, N);
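The launch vecAdd <<< 1, N >>> uses a single block of N threads, which is fine for N = 8 but cannot exceed the per-block thread limit (1024 threads on current GPUs). Below is a sketch of how the same addition could be spread over several blocks, computing the global index from blockIdx.x, blockDim.x and threadIdx.x. The names vecAddMultiBlock and gpuVecAdd and the block size of 256 are illustrative assumptions, not part of the original code:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Same addition as vecAdd, but each thread computes a global index,
// so n is not limited by the number of threads in one block.
__global__ void vecAddMultiBlock(const int *a, const int *b, int *c, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;  // global thread ID
    if (id < n)            // the last block may contain threads past the end
        c[id] = a[id] + b[id];
}

// Copies the inputs to the GPU, launches enough blocks to cover n elements,
// copies the result back, and returns the first CUDA error encountered.
cudaError_t gpuVecAdd(const int *a, const int *b, int *c, int n)
{
    size_t size = (size_t)n * sizeof(int);
    int *d_a = NULL, *d_b = NULL, *d_c = NULL;

    cudaError_t err = cudaMalloc((void **)&d_a, size);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_b, size);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_c, size);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threadsPerBlock = 256;
        const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
        vecAddMultiBlock<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
        err = cudaGetLastError();          // reports invalid launch configurations
    }
    if (err == cudaSuccess) err = cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);  // freeing NULL is a no-op
    return err;
}

int main(void)
{
    const int n = 5000;                    // more elements than one block can hold
    int *a = (int *)malloc(n * sizeof(int));
    int *b = (int *)malloc(n * sizeof(int));
    int *c = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) {
        a[i] = i;
        b[i] = 2 * i;
    }

    cudaError_t err = gpuVecAdd(a, b, c, n);
    int ok = (err == cudaSuccess);
    if (!ok)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    for (int i = 0; ok && i < n; ++i)
        if (c[i] != 3 * i)
            ok = 0;
    printf("%s\n", ok ? "all results correct" : "failed");

    free(a); free(b); free(c);
    return ok ? 0 : 1;
}
```

Rounding the block count up means the last block can have idle threads, which is why the id < n check in the kernel is still needed.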

The code should now compile.

Some other errors in the code:

When you iterate over all elements of a host array in a for loop starting from 0, the index i must stay below the number of elements (N in your code). Your loop condition checks against size instead, which is the size of each array in bytes (N*sizeof(int)), so the loop runs past the end of the arrays. The loop should look like this:
for (int i = 0; i < N; ++i)
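With i < size and N = 8, the loop would run 32 times when int is 4 bytes, writing past the end of each 8-element array. A small host-only sketch of the initialization that keeps the element count and the byte count apart; initArrays is a hypothetical helper name:

```cuda
#include <stdio.h>
#include <stdlib.h>

#define N 8

// Hypothetical helper: fills n elements, indexing by element count.
void initArrays(int *a, int *b, int *c, int n)
{
    for (int i = 0; i < n; ++i) {    // i < n elements, not i < bytes
        a[i] = i;
        b[i] = i;
        c[i] = 0;
    }
}

int main(void)
{
    int size = N * sizeof(int);      // number of BYTES in each array
    int *a = (int *)malloc(size);    // malloc and cudaMemcpy take bytes
    int *b = (int *)malloc(size);
    int *c = (int *)malloc(size);

    initArrays(a, b, c, N);          // loops take the element count
    printf("N = %d elements, size = %d bytes\n", N, size);

    free(a); free(b); free(c);
    return 0;
}
```

A rule of thumb that follows from this: byte counts go to allocation and copy calls, element counts go to loops and to the kernel's n parameter.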

The second cudaMemcpy copies the host array b into the device array d_a again, overwriting the copy of a, while d_b is never filled. You probably wanted to copy the host array b into the device array d_b like this:
cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);
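Putting these fixes together, a corrected version of the program might look like the following. It is a sketch: it drops random_ints (whose values the loop overwrites anyway) and the final scanf, prints the result, and adds a hypothetical check() helper that reports failing CUDA runtime calls. None of these changes were part of the original code.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 8

// One thread per element; with a single block, threadIdx.x is the element index.
__global__ void vecAdd(int *a, int *b, int *c, int n)
{
    int id = threadIdx.x;
    if (id < n)
        c[id] = a[id] + b[id];
}

// Hypothetical helper: print the error and exit if a CUDA call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    int *a, *b, *c;
    int *d_a, *d_b, *d_c;
    int size = N * sizeof(int);

    // Host buffers, initialized element by element (i < N, not i < size).
    a = (int *)malloc(size);
    b = (int *)malloc(size);
    c = (int *)malloc(size);
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = i;
        c[i] = 0;
    }

    // Device buffers.
    check(cudaMalloc((void **)&d_a, size), "cudaMalloc d_a");
    check(cudaMalloc((void **)&d_b, size), "cudaMalloc d_b");
    check(cudaMalloc((void **)&d_c, size), "cudaMalloc d_c");

    // Copy each host input into its own device buffer.
    check(cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice), "copy b");

    // Launch 1 block of N threads, passing all four kernel arguments.
    vecAdd<<<1, N>>>(d_a, d_b, d_c, N);
    check(cudaGetLastError(), "vecAdd launch");

    // This cudaMemcpy waits for the kernel to finish before copying back.
    check(cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost), "copy c");

    for (int i = 0; i < N; ++i)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);

    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Checking the return value of each runtime call is optional for a toy program, but it turns silent failures (such as an invalid launch) into a readable message instead of a wrong result.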

Hope that helps.