I am a CUDA beginner who has successfully compiled and run several code samples using CUDA libraries such as CUFFT and CUBLAS. Lately, however, I have been trying to write my own simple kernels and keep getting nonsense values back from them. That is, when I pass a parameter into a kernel, set its value in the kernel, then copy the results back to the host and read the values later, they are bogus. I have tried many different simple tutorial kernels that seem to work for most people online, but I always get nonsensical values. For example:

```
#include <stdio.h>

#define SIZE 10

// Kernel definition, see also section 4.2.3 of Nvidia Cuda Programming Guide
__global__ void vecAdd(float* A, float* B, float* C) {
    // threadIdx.x is a built-in variable provided by CUDA at runtime
    int i = threadIdx.x;
    A[i] = 0;
    B[i] = i;
    C[i] = A[i] + B[i];
}

int main() {
    int N = SIZE;
    float A[SIZE], B[SIZE], C[SIZE];
    float *devPtrA;
    float *devPtrB;
    float *devPtrC;
    int memsize = SIZE * sizeof(float);

    cudaMalloc((void**)&devPtrA, memsize);
    cudaMalloc((void**)&devPtrB, memsize);
    cudaMalloc((void**)&devPtrC, memsize);
    cudaMemcpy(devPtrA, A, memsize, cudaMemcpyHostToDevice);
    cudaMemcpy(devPtrB, B, memsize, cudaMemcpyHostToDevice);
    // __global__ functions are called: Func<<< Dg, Db, Ns >>>(parameter);
    vecAdd<<<1, N>>>(devPtrA, devPtrB, devPtrC);
    cudaMemcpy(C, devPtrC, memsize, cudaMemcpyDeviceToHost);

    for (int i = 0; i < SIZE; i++)
        printf("C[%d]=%f\n", i, C[i]);

    cudaFree(devPtrA);
    cudaFree(devPtrB);
    cudaFree(devPtrC);
}
```

This is a fairly straightforward problem; the results should be:

```
C[0]=0.000000
C[1]=1.000000
C[2]=2.000000
C[3]=3.000000
C[4]=4.000000
C[5]=5.000000
C[6]=6.000000
C[7]=7.000000
C[8]=8.000000
C[9]=9.000000
```

However, my results are always random and generally look more like:

```
C[0]=nan
C[1]=-32813464158208.000000
C[2]=nan
C[3]=-27667211200843743232.000000
C[4]=34559834084263395806523272811251761152.000000
C[5]=9214363188332593152.000000
C[6]=nan
C[7]=-10371202300694685655937271382147072.000000
C[8]=121653576586393934243511643668480.000000
C[9]=-30648783863808.000000
```

So basically, when I pass parameters into a CUDA kernel with the intention of storing results in them to be copied back to the host, I get junk back.
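One way to narrow this down might be to check the status returned by every CUDA runtime call. If an allocation, copy, or kernel launch fails, the host array may never be written and would still hold whatever was in memory. Below is a minimal sketch of the same program with such checks. It assumes a toolkit recent enough to provide `cudaDeviceSynchronize`, and the `CHECK` macro is a hypothetical helper introduced here for illustration, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define SIZE 10

// Hypothetical helper (not part of CUDA): print the error string and exit
// if a runtime call returns anything other than cudaSuccess.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err));                         \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void vecAdd(float* A, float* B, float* C) {
    int i = threadIdx.x;
    A[i] = 0;
    B[i] = i;
    C[i] = A[i] + B[i];
}

int main() {
    float C[SIZE];
    float *devPtrA, *devPtrB, *devPtrC;
    size_t memsize = SIZE * sizeof(float);

    CHECK(cudaMalloc((void**)&devPtrA, memsize));
    CHECK(cudaMalloc((void**)&devPtrB, memsize));
    CHECK(cudaMalloc((void**)&devPtrC, memsize));

    // The kernel overwrites A and B, so no host-to-device copy is needed here.
    vecAdd<<<1, SIZE>>>(devPtrA, devPtrB, devPtrC);
    CHECK(cudaGetLastError());       // reports errors from the launch itself
    CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CHECK(cudaMemcpy(C, devPtrC, memsize, cudaMemcpyDeviceToHost));

    for (int i = 0; i < SIZE; i++)
        printf("C[%d]=%f\n", i, C[i]);

    CHECK(cudaFree(devPtrA));
    CHECK(cudaFree(devPtrB));
    CHECK(cudaFree(devPtrC));
    return 0;
}
```

If any call fails, the message printed by `cudaGetErrorString` should indicate whether the problem lies in the code or in the driver and toolkit setup.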

Any help would be greatly appreciated.

Thanks.