I compiled the simple CUDA program below, but it doesn't produce the expected result. I'm using MS Visual Studio 2008 with CUDA 2.3 on a Quadro FX 5800.
I put a watch on the variable C, and it shows C={0,0,0,0,0} as the end result.
If I change cudaMemcpy(C,Cd,5*sizeof(double),cudaMemcpyDeviceToHost); to cudaMemcpy(C,Ad,5*sizeof(double),cudaMemcpyDeviceToHost);
then C={1,2,3,4,5}, so the allocations and the host-to-device and device-to-host copies appear to work. What is wrong with my kernel code, or with the launch vecAdd<<<1,5>>>(Ad,Bd,Cd); ?
Thanks for helping a newbie.
// add two vectors
//
#include <stdio.h>
#include <cuda.h>
// Host wrapper with C linkage (defined below)
extern "C" void gpuAdd(double* A, double* B, double *C);
// Kernel that executes on the CUDA device
__global__ void vecAdd(double* A, double* B, double* C)
{
    int i = threadIdx.x;
    if (i < 5)
        C[i]=A[i]+B[i];
}
void gpuAdd(double* A, double* B, double *C)
{
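    double *Ad,*Bd,*Cd;
    // Allocate device buffers for the two inputs and the output
    cudaMalloc((void**)&Ad,5*sizeof(double));
    cudaMalloc((void**)&Bd,5*sizeof(double));
    cudaMalloc((void**)&Cd,5*sizeof(double));
    // Copy the inputs to the device
    cudaMemcpy(Ad,A,5*sizeof(double),cudaMemcpyHostToDevice);
    cudaMemcpy(Bd,B,5*sizeof(double),cudaMemcpyHostToDevice);
    // Launch one block of 5 threads, one thread per element
    vecAdd<<<1,5>>>(Ad,Bd,Cd);
    // Copy the result back to the host
    cudaMemcpy(C,Cd,5*sizeof(double),cudaMemcpyDeviceToHost);
    // Release device memory
    cudaFree(Ad);cudaFree(Bd);cudaFree(Cd);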
}
// main routine that executes on the host
int main(void)
{
    double A[5]={1, 2, 3, 4, 5};
    double B[5]={6, 7, 8, 9, 10};
    double C[5];
    gpuAdd(A,B,C);
}
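
Instead of relying on a debugger watch, the result in C could be compared against a CPU reference. The following is only a minimal sketch of that idea, not part of my original program: vecAddHost and countMismatches are hypothetical helper names, and the tolerance is an assumed value. It is plain host code, so it should also compile inside the same .cu file.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper: CPU reference for the element-wise sum c = a + b.
void vecAddHost(const double* a, const double* b, double* c, int n)
{
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Hypothetical helper: prints every element of `got` that differs from
// `want` by more than `tol` and returns how many such elements there are.
int countMismatches(const double* got, const double* want, int n, double tol)
{
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        // The negated comparison also counts NaN values as mismatches.
        if (!(std::fabs(got[i] - want[i]) <= tol)) {
            std::printf("mismatch at %d: got %g, expected %g\n",
                        i, got[i], want[i]);
            ++bad;
        }
    }
    return bad;
}
```

In main, after gpuAdd(A,B,C), one could fill a second array with vecAddHost(A,B,ref,5) and then call countMismatches(C,ref,5,1e-12). A return value of 0 would mean the GPU result matches the CPU sum; with the all-zero C described above, it would report all five elements.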