I want to add all the elements of an array to understand how threads work, but I think I am making a mistake in my code that I cannot find.
I am trying to launch just one thread and have it loop on the GPU to add all the elements of array A.
Is it possible?
If not, then why?
I would be glad if someone could take a look at my program and help me out with it. Thanks in advance. Here’s my program.
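[codebox]#include <stdio.h>
#include <stdlib.h>

// Kernel: a single thread sums all N elements of A and stores the total in C[0].
__global__ void add(double *A, double *C, int N){
    int i;
    i = threadIdx.x;          // overwritten by the loop below, so it has no effect
    double add = 0;
    for(i = 0; i < N; i++){
        add = A[i] + add;
    }
    C[0] = add;
}[/codebox]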
How are you compiling your program? If you have a GeForce 9 series or older card, you cannot use doubles; use floats instead. If you have a GTX 200 series card, you need to compile with a flag that enables double precision: “-arch=sm_13”. Without it, nvcc demotes doubles to floats.
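To answer the "is it possible" part: yes, a single thread can loop over the whole array, it is just slow. Here is a minimal sketch of how I would write it, not your actual program. The names `addAll`, `sumOnGpu` and the file name `sum.cu` are made up for illustration, and it assumes a card that supports double precision and a build along the lines of `nvcc -arch=sm_13 sum.cu -o sum` (use whichever architecture matches your card).

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// One thread walks the whole array and writes the total to C[0].
__global__ void addAll(const double *A, double *C, int N)
{
    double sum = 0.0;
    for (int i = 0; i < N; i++) {
        sum += A[i];
    }
    C[0] = sum;
}

// Hypothetical host helper: copies h_A to the device, launches a single
// thread, and returns the sum computed on the GPU.
double sumOnGpu(const double *h_A, int N)
{
    double *d_A = NULL;
    double *d_C = NULL;
    double result = 0.0;
    size_t bytes = (size_t)N * sizeof(double);

    cudaMalloc((void **)&d_A, bytes);
    cudaMalloc((void **)&d_C, sizeof(double));
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);

    addAll<<<1, 1>>>(d_A, d_C, N);   // 1 block, 1 thread

    // Report launch problems instead of silently returning garbage.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
    }

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(&result, d_C, sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    cudaFree(d_C);
    return result;
}

int main(void)
{
    const int N = 1000;
    double *h_A = (double *)malloc(N * sizeof(double));
    double cpu = 0.0;

    for (int i = 0; i < N; i++) {
        h_A[i] = 0.5 * i;
        cpu += h_A[i];               // reference sum on the host
    }

    double gpu = sumOnGpu(h_A, N);
    printf("GPU sum = %f, CPU sum = %f\n", gpu, cpu);

    free(h_A);
    return 0;
}
```

Comparing against the host sum is a quick way to see whether doubles are really being used on the device: if the two numbers disagree for larger or less friendly inputs, check the compile flags first.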
Thanks for the reply, guys. Yes, I found the problem with C[0] just after posting this code. I was able to get my code working in single precision, but double precision is giving me the wrong answer. I am guessing the problem is with the library path or something similar, since I am using a 64-bit AMD machine with Fedora 10 (64-bit). If you know how to fix this problem, please post your comments and help me out. Thanks.