Array addition: addition of all the elements of an array

I want to add all the elements of an array to understand how threads work, but I think I am making a mistake in my code that I cannot find.

I am trying to launch just one thread and have it loop on the GPU to add all the elements of array A.

Is it possible?

If not, then why?

I would be glad if someone could take a look at my program and try to help me out with it. Thanks in advance. Here’s my program.

[codebox]#include<stdio.h>
#include<stdlib.h>

__global__ void add(double *A, double *C, int N){
    int i;
    i=threadIdx.x;
    double add=0;
    for(i=0;i<N;i++){
        add=A[i]+add;
    }
    C[0]=add;
}

main(){
    int N,i;
    N=10;
    double A[N],C[0];
    for(i=0;i<N;i++){
        A[i]=1.0;
    }
    double *d_A,*d_C;
    size_t size=N*sizeof(double);
    cudaMalloc((void**)&d_A,size);
    //cudaMalloc((void**)&d_B,size);
    size_t sizeC=1*sizeof(double);
    cudaMalloc((void**)&d_C,sizeC);
    cudaMemcpy(d_A, A, size,cudaMemcpyHostToDevice);
    add<<<1,1>>>(d_A,d_C,N);
    cudaMemcpy(C,d_C,sizeC,cudaMemcpyDeviceToHost);
    printf("\n%f ",C[0]);
}
[/codebox]

How are you compiling your program? If you have a 9000-series or older card, you cannot use doubles; use floats instead. If you have a 200-series card, you need to compile with a flag that enables double precision: “-arch=sm_13”. Without it, nvcc demotes doubles to floats in device code.
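If you are not sure which case applies to your card, a small query program can tell you. This is only a sketch: it assumes you want to inspect device 0, and `supports_double` is a helper name made up for this example. It relies on the fact that double precision first appeared with compute capability 1.3.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if compute capability major.minor is at
// least 1.3, the first level with double-precision hardware support.
static int supports_double(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 3);
}

int main(void) {
    int dev = 0;  // assumption: only the first device is inspected
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s: compute capability %d.%d, doubles %s\n",
           prop.name, prop.major, prop.minor,
           supports_double(prop.major, prop.minor) ? "supported" : "not supported");
    return 0;
}
```

If it reports 1.3 or higher, build the original program with something like `nvcc -arch=sm_13 file.cu` (the file name is a placeholder).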

I wonder how you were able to compile your code.

You should get two compilation errors:

  1. A[N]

    N is not a constant expression

  2. C[0]

    you cannot define an array of size 0

Modify your code as follows:

[codebox]#define N 10

int main()
{
    int i;

    double A[N];
    double C[1];
…[/codebox]

Then the program works.

My platform: WinXP Pro 64, VC2005, driver 190.38, CUDA 2.3, GTX295
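For reference, here is a sketch of what the whole program could look like with both fixes applied. It is not the exact code from either post. The `CHECK` macro and the `sum_on_gpu` wrapper are names added for this example, and the error checking is an addition so that a failed allocation or launch is reported instead of silently producing a wrong number.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 10

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// A single thread walks the whole array serially and writes the sum to C[0].
__global__ void add(const double *A, double *C, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += A[i];
    }
    C[0] = sum;
}

// Copies A to the device, runs the one-thread kernel, and returns the sum.
static double sum_on_gpu(const double *A, int n) {
    double *d_A, *d_C, result = 0.0;
    size_t size = n * sizeof(double);
    CHECK(cudaMalloc((void **)&d_A, size));
    CHECK(cudaMalloc((void **)&d_C, sizeof(double)));
    CHECK(cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice));
    add<<<1, 1>>>(d_A, d_C, n);   // one block, one thread
    CHECK(cudaGetLastError());    // catches launch failures
    CHECK(cudaMemcpy(&result, d_C, sizeof(double), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_A));
    CHECK(cudaFree(d_C));
    return result;
}

int main(void) {
    double A[N];
    for (int i = 0; i < N; i++) A[i] = 1.0;
    printf("%f\n", sum_on_gpu(A, N));
    return 0;
}
```

So yes, a single thread looping over the array is possible; it just uses none of the GPU's parallelism. On a card with double support, and compiled with the matching architecture flag, this should print 10.000000.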

With a GNU compiler, for example.

Thanks for the replies, guys. Yes, I found the problem with C[0] just after posting this code. I was able to get my code working in single precision, but double precision gives me the wrong answer. I am guessing the problem is with the library path or something similar, since I am using a 64-bit AMD machine with Fedora 10 (64-bit). If you know how to fix this, please post your comments and help me out. Thanks.

I tested on two machines:

  1. WinXP Pro 64, VC2005, driver 190.38, CUDA 2.3, GTX295

  2. Fedora 10 x64, gcc 4.3.2, driver 185.18, CUDA 2.2, GTX260

The program works on both machines, with both “float” and “double”.

What is your configuration?