Newbie: Super simple first CUDA program, what's wrong?

I’ve compiled the simple CUDA program below and it doesn’t seem to work. I’m using MS Visual Studio 2008 and CUDA 2.3 with a Quadro FX 5800.

I put a watch on the variable C and the end result is C={0,0,0,0,0}.
If I change cudaMemcpy(C,Cd,5*sizeof(double),cudaMemcpyDeviceToHost); to cudaMemcpy(C,Ad,5*sizeof(double),cudaMemcpyDeviceToHost);

then C={1,2,3,4,5}, so I think I have most of it OK. What is wrong with my kernel launch vecAdd<<<1,5>>>(Ad,Bd,Cd); ?

Thanks for helping a newbie.

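// add two vectors

#include <stdio.h>
#include <cuda.h>

// Host wrapper that runs the addition on the GPU
extern "C" void gpuAdd(double* A, double* B, double *C);

// Kernel that executes on the CUDA device
__global__ void vecAdd(double* A, double* B, double* C)
{
    int i = threadIdx.x;
    if (i < 5)
        C[i] = A[i] + B[i];
}

void gpuAdd(double* A, double* B, double *C)
{
    double *Ad,*Bd,*Cd;

    // allocate device memory and copy the inputs to the device
    cudaMalloc((void**)&Ad,5*sizeof(double));
    cudaMalloc((void**)&Bd,5*sizeof(double));
    cudaMalloc((void**)&Cd,5*sizeof(double));
    cudaMemcpy(Ad,A,5*sizeof(double),cudaMemcpyHostToDevice);
    cudaMemcpy(Bd,B,5*sizeof(double),cudaMemcpyHostToDevice);

    // one block of 5 threads, then copy the result back to the host
    vecAdd<<<1,5>>>(Ad,Bd,Cd);
    cudaMemcpy(C,Cd,5*sizeof(double),cudaMemcpyDeviceToHost);

    cudaFree(Ad); cudaFree(Bd); cudaFree(Cd);
}

// main routine that executes on the host
int main(void)
{
    double A[5]={1, 2, 3, 4, 5};
    double B[5]={6, 7, 8, 9, 10};
    double C[5];

    gpuAdd(A,B,C);
    return 0;
}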

Your code is OK for both “sm_10” and “sm_13”.

Modify your main function as follows:

[codebox]int main(void)
{
	double A[5]={1, 2, 3, 4, 5};
	double B[5]={6, 7, 8, 9, 10};
	double C[5];

	int N = 5 ;
	int i ;

	gpuAdd(A, B, C);

	// print the result on the host instead of relying on the debugger
	for(i = 0 ; i < N ; i++){
		printf("C[%d] = %f\n", i, C[i] );
	}

	return 0;
} [/codebox]

What’s your output?

I get:

C[0] = 0.000000
C[1] = 0.000000
C[2] = 0.000000
C[3] = 0.000000
C[4] = 0.000000

This is very weird. As before, I also put a breakpoint at the end and watched the variables, with the same result.

Any idea what would cause this error while the program still compiles and runs fine?
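One way to narrow down a "compiles and runs but gives zeros" problem is to check the status of every runtime call. The sketch below is not from the thread: the `check` helper and its messages are made up for illustration, and it relies on the blocking `cudaMemcpy` to wait for the kernel. Only the kernel, the launch configuration and the data come from the post.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): report a failed runtime call.
static int check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return 0;
    }
    return 1;
}

// Same kernel as in the question
__global__ void vecAdd(double *A, double *B, double *C)
{
    int i = threadIdx.x;
    if (i < 5)
        C[i] = A[i] + B[i];
}

int main(void)
{
    double A[5] = {1, 2, 3, 4, 5};
    double B[5] = {6, 7, 8, 9, 10};
    double C[5] = {0};
    double *Ad = NULL, *Bd = NULL, *Cd = NULL;
    size_t size = 5 * sizeof(double);

    if (!check(cudaMalloc((void**)&Ad, size), "cudaMalloc Ad")) return 1;
    if (!check(cudaMalloc((void**)&Bd, size), "cudaMalloc Bd")) return 1;
    if (!check(cudaMalloc((void**)&Cd, size), "cudaMalloc Cd")) return 1;
    if (!check(cudaMemcpy(Ad, A, size, cudaMemcpyHostToDevice), "copy A")) return 1;
    if (!check(cudaMemcpy(Bd, B, size, cudaMemcpyHostToDevice), "copy B")) return 1;

    vecAdd<<<1, 5>>>(Ad, Bd, Cd);
    // A kernel launch returns no status; cudaGetLastError reports launch failures.
    if (!check(cudaGetLastError(), "vecAdd launch")) return 1;

    // The blocking copy waits for the kernel, so execution errors show up here.
    if (!check(cudaMemcpy(C, Cd, size, cudaMemcpyDeviceToHost), "copy C")) return 1;

    for (int i = 0; i < 5; i++)
        printf("C[%d] = %f\n", i, C[i]);

    cudaFree(Ad);
    cudaFree(Bd);
    cudaFree(Cd);
    return 0;
}
```

These checks only report failures at run time. If every call succeeds and the result is still wrong, it is worth reading the nvcc build output for warnings too.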

You’re using doubles in the kernel. For that you need to compile with
-arch compute_13 -code sm_13
(perhaps only one of these is required, but I’m not sure which).
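Double precision on the GPU requires compute capability 1.3 or higher, so besides the compiler flags it can help to confirm what the card reports. The following is a minimal sketch using the runtime's device query; the messages and the variable name `hasDouble` are illustrative and not from the thread. As far as I know, `-arch sm_13` on its own is an nvcc shorthand that covers both flags above, but treat that as something to verify against the nvcc documentation for your toolkit version.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int dev = 0;
    cudaDeviceProp prop;

    // Query the device that the runtime is currently using
    if (cudaGetDevice(&dev) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
        fprintf(stderr, "could not query the CUDA device\n");
        return 1;
    }

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    // Hardware double precision starts at compute capability 1.3
    int hasDouble = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
    printf("double precision kernels: %s\n",
           hasDouble ? "supported by this device" : "not supported by this device");

    return 0;
}
```

A device that supports doubles still needs the code to be built for a matching architecture, which is what the flags above are for.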

Thanks for the info!

I changed all the doubles to floats and the code works.
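The changed code was not posted, so the following is only a sketch of what a float version of the program could look like, assuming the same structure as the original (a `vecAdd` kernel, a `gpuAdd` host wrapper and a `main` that prints the result). The `N` macro and the `const` qualifiers are additions for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 5  // vector length used throughout the thread

// Kernel that executes on the CUDA device, single precision
__global__ void vecAdd(const float *A, const float *B, float *C)
{
    int i = threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

// Host wrapper: copy inputs to the device, launch, copy the result back
extern "C" void gpuAdd(const float *A, const float *B, float *C)
{
    float *Ad, *Bd, *Cd;
    size_t size = N * sizeof(float);

    cudaMalloc((void**)&Ad, size);
    cudaMalloc((void**)&Bd, size);
    cudaMalloc((void**)&Cd, size);
    cudaMemcpy(Ad, A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(Bd, B, size, cudaMemcpyHostToDevice);

    vecAdd<<<1, N>>>(Ad, Bd, Cd);  // one block of N threads
    cudaMemcpy(C, Cd, size, cudaMemcpyDeviceToHost);

    cudaFree(Ad);
    cudaFree(Bd);
    cudaFree(Cd);
}

// main routine that executes on the host
int main(void)
{
    float A[N] = {1, 2, 3, 4, 5};
    float B[N] = {6, 7, 8, 9, 10};
    float C[N];

    gpuAdd(A, B, C);

    for (int i = 0; i < N; i++)
        printf("C[%d] = %f\n", i, C[i]);
    return 0;
}
```

With the inputs from the post, the element-wise sums are 7, 9, 11, 13 and 15. Single precision avoids the architecture requirement entirely, at the cost of reduced precision.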

Now I just need to figure out where to put the flags in Visual Studio and also in MATLAB, and I am golden. Any hints would be greatly appreciated :)