Newbie: Super simple first CUDA program, what's wrong?

I’ve compiled a simple CUDA program below and it doesn’t seem to work. I’m using MS Visual Studio 2008 and CUDA 2.3 + Quadro FX 5800.

I put a watch on variable C and it gives C={0,0,0,0,0} as the end result.
If I change cudaMemcpy(C,Cd,5*sizeof(double),cudaMemcpyDeviceToHost); to cudaMemcpy(C,Ad,5*sizeof(double),cudaMemcpyDeviceToHost); then C={1,2,3,4,5}, so I think I have most of it OK. What is wrong with my kernel code vecAdd<<<1,5>>>(Ad,Bd,Cd); ?

Thanks for helping a newbie.

// add two vectors
//

#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
extern "C" void gpuAdd(double* A, double* B, double *C);

__global__ void vecAdd(double* A, double* B, double* C)
{
    int i = threadIdx.x;
    if (i < 5)
        C[i]=A[i]+B[i];
}

void gpuAdd(double* A, double* B, double *C)
{
double *Ad,*Bd,*Cd;

cudaMalloc((void**)&Ad,5*sizeof(double)); 
cudaMalloc((void**)&Bd,5*sizeof(double)); 
cudaMalloc((void**)&Cd,5*sizeof(double));
cudaMemcpy(Ad,A,5*sizeof(double),cudaMemcpyHostToDevice);     
cudaMemcpy(Bd,B,5*sizeof(double),cudaMemcpyHostToDevice); 
vecAdd<<<1,5>>>(Ad,Bd,Cd);
cudaMemcpy(C,Cd,5*sizeof(double),cudaMemcpyDeviceToHost);
cudaFree(Ad);cudaFree(Bd);cudaFree(Cd);

}

// main routine that executes on the host
int main(void)
{
double A[5]={1, 2, 3, 4, 5};
double B[5]={6, 7, 8, 9, 10};
double C[5];
gpuAdd(A,B,C);
}

Your code is OK for both “sm_10” and “sm_13”.

Modify your main function as follows:

[codebox]int main(void)
{
	double A[5]={1, 2, 3, 4, 5};
	double B[5]={6, 7, 8, 9, 10};
	double C[5];
	gpuAdd(A,B,C);

	int N = 5 ;
	int i ;

	for(i = 0 ; i < N ; i++){
		printf("C[%d] = %f\n", i, C[i] );
	}
} [/codebox]

What’s your output?

I get:

C[0] = 0.000000
C[1] = 0.000000
C[2] = 0.000000
C[3] = 0.000000
C[4] = 0.000000

This is very weird. I also put a breakpoint at the end again and watched the variables, with the same result.

Any ideas what would cause this error while the code still compiles and runs fine?
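One way to narrow this down is to stop ignoring the return codes. The sketch below is the same program with every runtime call checked. checkCuda is a hypothetical helper I made up for this, not part of the CUDA API, and I can't promise it would have flagged the problem in this particular case; it just makes silent failures much less likely.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): report a failed runtime call.
// Returns 1 on success, 0 on failure.
static int checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return 0;
    }
    return 1;
}

__global__ void vecAdd(double *A, double *B, double *C)
{
    int i = threadIdx.x;
    if (i < 5)
        C[i] = A[i] + B[i];
}

int main(void)
{
    double A[5] = {1, 2, 3, 4, 5};
    double B[5] = {6, 7, 8, 9, 10};
    double C[5] = {0};
    double *Ad = NULL, *Bd = NULL, *Cd = NULL;
    size_t bytes = 5 * sizeof(double);

    // && short-circuits, so the first failing call stops the chain.
    int ok = checkCuda(cudaMalloc((void**)&Ad, bytes), "cudaMalloc Ad")
          && checkCuda(cudaMalloc((void**)&Bd, bytes), "cudaMalloc Bd")
          && checkCuda(cudaMalloc((void**)&Cd, bytes), "cudaMalloc Cd")
          && checkCuda(cudaMemcpy(Ad, A, bytes, cudaMemcpyHostToDevice), "copy A")
          && checkCuda(cudaMemcpy(Bd, B, bytes, cudaMemcpyHostToDevice), "copy B");

    if (ok) {
        vecAdd<<<1, 5>>>(Ad, Bd, Cd);
        // A kernel launch returns nothing, so ask the runtime whether it was accepted.
        // The blocking cudaMemcpy afterwards reports errors from the kernel's execution.
        ok = checkCuda(cudaGetLastError(), "vecAdd launch")
          && checkCuda(cudaMemcpy(C, Cd, bytes, cudaMemcpyDeviceToHost), "copy C");
    }

    // cudaFree on a NULL pointer does nothing, so this is safe after a partial failure.
    cudaFree(Ad); cudaFree(Bd); cudaFree(Cd);

    for (int i = 0; ok && i < 5; i++)
        printf("C[%d] = %f\n", i, C[i]);
    return ok ? 0 : 1;
}
```

If nothing is printed to stderr and C is still all zeros, the calls themselves succeeded and the problem is more likely in how the kernel was compiled.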

You’re using doubles in the kernel; for that you need to compile with
-arch compute_13 -code sm_13
(perhaps only one of these is required, but I’m not sure which)
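Background, as far as I know: double precision needs a device of compute capability 1.3 or higher, and when nvcc targets anything lower it demotes double to float in device code and prints a warning about it. I also believe -arch=sm_13 on its own is shorthand that covers both flags, e.g. nvcc -arch=sm_13 vecadd.cu (the file name is made up). To confirm the card itself is capable, a small query like the sketch below should do; supportsDouble is a hypothetical helper, not a CUDA function.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: double precision requires compute capability >= 1.3.
static int supportsDouble(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

int main(void)
{
    int dev = 0;
    cudaDeviceProp prop;

    // Query the device the runtime is currently using.
    if (cudaGetDevice(&dev) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
        fprintf(stderr, "could not query the CUDA device\n");
        return 1;
    }

    printf("%s: compute capability %d.%d, double precision %s\n",
           prop.name, prop.major, prop.minor,
           supportsDouble(prop.major, prop.minor) ? "supported" : "not supported");
    return 0;
}
```

Note this only tells you what the hardware can do. The build still has to target sm_13, otherwise the kernel is compiled without double support regardless of the card.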

Thanks for the info!

I changed all the doubles → floats and the code works.

Now I just need to figure out where to put the flags in Visual Studio and also in MATLAB and I am golden. Any hints would be greatly appreciated :)