help for my cuda code

medmab · October 22, 2014, 3:47pm

Hello.

I’m new to CUDA programming and I’m trying to write code that adds two vectors, but I ran into a problem. Could you help me? This is the code:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include
#include <stdlib.h>
#include <stdio.h>
#include <stdio.h>
#include <stdio.h>

__global__ void vecAdd(int *a, int *b, int *c, int n)
{
// Get our global thread ID
int id = threadIdx.x;

// Make sure we do not go out of bounds
if (id < n)
	c[id] = a[id] + b[id];

}

#define N 8

void random_ints(int* a, int h)
{

}

int main(void){

int *a, *b, *c;
int *d_a, *d_b, *d_c;
int size = N*sizeof(int);

// allocate space for the device (GPU) copies of a, b and c
cudaMalloc((void**)& d_a, size);
cudaMalloc((void**)& d_b, size);
cudaMalloc((void**)& d_c, size);

// allocate space for the host (CPU) copies of a, b and c and initialize the variables
a = (int *)malloc(size);random_ints(a, N);
b = (int*)malloc(size); random_ints(b, N);
c = (int*)malloc(size);

for (int i = 0; i < size; ++i)
{
	a[i] = i;
	b[i] = i;
	c[i] = 0;
}

vecAdd(a, b, c,N);

// copy the inputs to the device (GPU)
cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_a, b, size, cudaMemcpyHostToDevice);

// run the add() kernel on the GPU with N threads

vecAdd <<< 1, N >>> (d_a, d_b, d_c);    // problem here


// copy the result back from the GPU
cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

free(a); free(b); free(c);
cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);



printf("\nPress any key to exit...");
char w;
scanf("%c", &w);

return 0;

}

Skybuck · March 31, 2015, 1:39am

What’s the problem? :)

mrjokero · March 31, 2015, 7:24pm

To at least compile, you should remove the call vecAdd(a, b, c, N); because vecAdd is defined only as a kernel (__global__) function, and on that line you try to call it like an ordinary host function.

Your kernel function __global__ void vecAdd(int *a, int *b, int *c, int n) accepts 4 arguments, but you launched it with only 3 in vecAdd <<< 1, N >>> (d_a, d_b, d_c);. As you already noticed, you should pass the element count N as the last argument. The call will look like this:
vecAdd <<< 1, N >>> (d_a, d_b, d_c, N);

The code should now compile.

Some other errors in the code:

When you iterate over all elements of a host array in a for loop starting from 0, the index i must stay below the number of elements (N in your code). Your loop condition compares against size, which is the size of the arrays in bytes, so the loop writes past the end of the arrays. The for loop should look like this:
for (int i = 0; i < N; ++i)
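
To make the difference between the two quantities concrete, here is a small host-only sketch. The helper byteCount is a hypothetical name that is not in the original code, and the printed byte count depends on sizeof(int) on your platform (commonly 4, which would make size 32 for N = 8, so a loop bounded by size would touch indices 8 to 31 as well).

```cuda
#include <cstdio>
#include <cstdlib>

#define N 8

// Hypothetical helper (not in the original post): bytes needed for n ints.
static size_t byteCount(int n)
{
    return (size_t)n * sizeof(int);
}

int main(void)
{
    // size is a byte count: the right unit for malloc, cudaMalloc and cudaMemcpy.
    size_t size = byteCount(N);

    int *a = (int *)malloc(size);
    if (a == NULL)
        return 1;

    // N is an element count: the right bound for indexing a[i].
    for (int i = 0; i < N; ++i)
        a[i] = i;

    printf("elements: %d, bytes: %zu, last valid index: %d\n", N, size, N - 1);

    free(a);
    return 0;
}
```

In short: pass size to the allocation and copy functions, and use N wherever you index into the arrays.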

The second cudaMemcpy copies the host array b into the device array d_a again, overwriting a and leaving d_b uninitialized. You probably wanted to copy the host array b to the device array d_b, like this:
cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);
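
Putting the fixes from this post together, the whole program could look roughly like the sketch below. Treat it as one possible version under a few assumptions, not as the only correct one: the helpers check and addOnGpu are hypothetical names I introduced, the error checking is my addition and was not part of the original code, and the results are printed before the buffers are freed. Like the original, it uses a single block, so it only works while n fits in one block.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 8

// Kernel from the original post: one thread per element, single block.
__global__ void vecAdd(int *a, int *b, int *c, int n)
{
    int id = threadIdx.x;
    if (id < n)
        c[id] = a[id] + b[id];
}

// Hypothetical helper: stop with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: computes c = a + b on the GPU for n elements.
static void addOnGpu(const int *a, const int *b, int *c, int n)
{
    size_t size = (size_t)n * sizeof(int);   // byte count for alloc/copy
    int *d_a, *d_b, *d_c;

    check(cudaMalloc((void **)&d_a, size), "cudaMalloc d_a");
    check(cudaMalloc((void **)&d_b, size), "cudaMalloc d_b");
    check(cudaMalloc((void **)&d_c, size), "cudaMalloc d_c");

    // Each host input goes to its own device buffer.
    check(cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice), "copy b");

    // Four arguments, matching the kernel signature.
    vecAdd<<<1, n>>>(d_a, d_b, d_c, n);
    check(cudaGetLastError(), "vecAdd launch");

    // This copy waits for the kernel to finish before reading d_c.
    check(cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost), "copy c");

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
}

int main(void)
{
    int a[N], b[N], c[N];

    // Loop over elements (N), not bytes.
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = i;
        c[i] = 0;
    }

    addOnGpu(a, b, c, N);

    for (int i = 0; i < N; ++i)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);

    return 0;
}
```

It should build with something like nvcc vecadd.cu -o vecadd, where the file name is just an example.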

Hope that helps.
