Newbie: Super simple first CUDA program what's wrong?

mike56 · October 2, 2009, 1:46am

I’ve compiled a simple CUDA program below and it doesn’t seem to work. I’m using MS Visual Studio 2008 and CUDA 2.3 + Quadro FX 5800.

I put a watch on variable C and it gives C={0,0,0,0,0} as the end result.
If I change cudaMemcpy(C,Cd,5*sizeof(double),cudaMemcpyDeviceToHost); to cudaMemcpy(C,Ad,5*sizeof(double),cudaMemcpyDeviceToHost); then C={1,2,3,4,5}, so I think I have most of it ok. What is wrong with my kernel code vecAdd<<<1,5>>>(Ad,Bd,Cd); ?

Thanks for helping a newbie.

// add two vectors
//

#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
extern "C" void gpuAdd(double* A, double* B, double *C);

__global__ void vecAdd(double* A, double* B, double* C)
{
    int i = threadIdx.x;
    if (i < 5)
        C[i]=A[i]+B[i];
}

void gpuAdd(double* A, double* B, double *C)
{
    double *Ad,*Bd,*Cd;

    cudaMalloc((void**)&Ad,5*sizeof(double));
    cudaMalloc((void**)&Bd,5*sizeof(double));
    cudaMalloc((void**)&Cd,5*sizeof(double));
    cudaMemcpy(Ad,A,5*sizeof(double),cudaMemcpyHostToDevice);
    cudaMemcpy(Bd,B,5*sizeof(double),cudaMemcpyHostToDevice);
    vecAdd<<<1,5>>>(Ad,Bd,Cd);
    cudaMemcpy(C,Cd,5*sizeof(double),cudaMemcpyDeviceToHost);
    cudaFree(Ad);cudaFree(Bd);cudaFree(Cd);
}

// main routine that executes on the host
int main(void)
{
    double A[5]={1, 2, 3, 4, 5};
    double B[5]={6, 7, 8, 9, 10};
    double C[5];
    gpuAdd(A,B,C);
}

LSChien · October 2, 2009, 2:55am

Your code is OK for both “sm_10” and “sm_13”.

Modify your main function as follows:

[codebox]int main(void)
{
    double A[5]={1, 2, 3, 4, 5};
    double B[5]={6, 7, 8, 9, 10};
    double C[5];
    gpuAdd(A,B,C);

    int N = 5 ;
    int i ;

    for(i = 0 ; i < N ; i++){
        printf("C[%d] = %f\n", i, C[i] );
    }
} [/codebox]

What’s your output?

mike56 · October 2, 2009, 5:33pm

I get:

C[0] = 0.000000
C[1] = 0.000000
C[2] = 0.000000
C[3] = 0.000000
C[4] = 0.000000

This is very weird. I also put a breakpoint at the end again and watched the variables, with the same result.

Any ideas what would cause this wrong output when the code still compiles and runs fine?
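
One way to narrow down a "compiles and runs but gives zeros" problem is to check the status of every CUDA runtime call. The following is only a sketch: cudaOk and gpuAddChecked are hypothetical names not used anywhere in this thread, the kernel takes an extra length argument n, and cudaDeviceSynchronize is the call in current toolkits (toolkits from the CUDA 2.3 era used cudaThreadSynchronize instead). Checks like these report launch and runtime failures; they will not necessarily flag a problem that comes from how the code was compiled.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): print a message if a CUDA
// runtime call failed. Returns 1 on success, 0 on failure.
static int cudaOk(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return 0;
    }
    return 1;
}

__global__ void vecAdd(const double *A, const double *B, double *C, int n)
{
    int i = threadIdx.x;
    if (i < n)
        C[i] = A[i] + B[i];
}

// Same flow as gpuAdd in the first post, with each step checked.
// Assumes n fits in a single thread block. Returns 1 on success, 0 otherwise.
int gpuAddChecked(const double *A, const double *B, double *C, int n)
{
    double *Ad = NULL, *Bd = NULL, *Cd = NULL;
    size_t bytes = (size_t)n * sizeof(double);

    int ok = cudaOk(cudaMalloc((void**)&Ad, bytes), "cudaMalloc(Ad)")
          && cudaOk(cudaMalloc((void**)&Bd, bytes), "cudaMalloc(Bd)")
          && cudaOk(cudaMalloc((void**)&Cd, bytes), "cudaMalloc(Cd)")
          && cudaOk(cudaMemcpy(Ad, A, bytes, cudaMemcpyHostToDevice), "copy A")
          && cudaOk(cudaMemcpy(Bd, B, bytes, cudaMemcpyHostToDevice), "copy B");

    if (ok) {
        vecAdd<<<1, n>>>(Ad, Bd, Cd, n);
        // A kernel launch returns no status itself, so ask the runtime afterwards.
        ok = cudaOk(cudaGetLastError(), "vecAdd launch")
          && cudaOk(cudaDeviceSynchronize(), "vecAdd execution")
          && cudaOk(cudaMemcpy(C, Cd, bytes, cudaMemcpyDeviceToHost), "copy C");
    }

    // cudaFree accepts a null pointer, so this is safe after a partial failure.
    cudaFree(Ad); cudaFree(Bd); cudaFree(Cd);
    return ok;
}
```

From main, the call would be gpuAddChecked(A, B, C, 5); a return value of 0 means one of the steps printed an error to stderr.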

_Big_Mac · October 2, 2009, 5:45pm

You’re using doubles in the kernel; for that you need to compile with
-arch compute_13 -code sm_13
(perhaps only one of these is required, but I’m not sure which)
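
Alongside the compile flags, it can help to confirm at run time that the GPU itself supports double precision, which requires compute capability 1.3 or higher. The sketch below is an illustration only: supportsDouble and deviceSupportsDouble are hypothetical helper names, and device index 0 is an assumption. It checks the hardware, not the build, so the code still has to be compiled for a target that supports doubles.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: double precision needs compute capability 1.3 or higher.
static int supportsDouble(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

// Hypothetical helper: query device `dev` and report whether it can run
// double-precision kernels. Returns 1 if yes, 0 if no, -1 if the query failed.
int deviceSupportsDouble(int dev)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return supportsDouble(prop.major, prop.minor);
}
```

Calling deviceSupportsDouble(0) before launching the kernel gives an early warning if the card cannot run the double version at all, in which case a float version is the fallback.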

mike56 · October 2, 2009, 6:20pm

Thanks for the info!

I changed all the doubles → floats and the code works.

Now I just need to figure out where to put the flags in Visual Studio and also in MATLAB, and I am golden. Any hints would be greatly appreciated :)
