Alright, I don’t even know what to make of this anymore. I’ve simplified this down as much as I can and I still can’t figure it out. It’s a simple element-wise addition of two arrays. The first element returned is the correct sum of the two values, but the second through n-th elements are seemingly random values, and I’m not sure where they’re coming from. I’m sure this is one of those “duh” moments, but forgive me: I’m completely new to CUDA and parallel programming in general. Does anything jump out at anyone?
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

/*****************************************
MATRIX BUILDER
*****************************************/
void buildMatrix(float* matrixElements){
    for(uint i = 0; i < 100; i++){
        // generate a random integer n { 0 <= n <= 9 }
        matrixElements[ i ] = rand() % 10;
    }
}
/*****************************************
CUDA MATRIX ADDER
*****************************************/
__global__ void addKernel(float* A, float* B, float* C) {
    // one thread per element; a single block of 100 threads is launched
    C[threadIdx.x] = A[threadIdx.x] + B[threadIdx.x];
}
/******************************************
MAIN PROGRAM
******************************************/
int main()
{
    srand( time (NULL) );
    // CREATE A
    float elementsA[100];
    float elementsB[100];
    float elementsC[100];
    // FILL WITH RANDOM ELEMENTS
    buildMatrix(elementsA);
    buildMatrix(elementsB);
    // ALLOCATE THE ELEMENTS TO THE DEVICE
    float* deviceElA;
    float* deviceElB;
    float* deviceElC;
    cudaMalloc((void**) &deviceElA, sizeof(elementsA));
    cudaMalloc((void**) &deviceElB, sizeof(elementsB));
    cudaMalloc((void**) &deviceElC, sizeof(elementsC) * sizeof(float));
    cudaMemcpy(deviceElA, elementsA, sizeof(elementsA), cudaMemcpyHostToDevice);
    cudaMemcpy(deviceElB, elementsB, sizeof(elementsB), cudaMemcpyHostToDevice);
    // DISPATCH TO THE KERNEL
    addKernel <<<1, 100>>>(deviceElA, deviceElB, deviceElC);
    // COPY THE VALUES BACK
    cudaMemcpy(elementsC, deviceElC, sizeof(deviceElC), cudaMemcpyDeviceToHost);
    // ITERATE THROUGH THE RESULTS
    for(int i = 0; i < 100; i++){
        cout<<i<<": "<<elementsA[i]<<" + "<<elementsB[i]<<" = "<<elementsC[i]<<endl<<endl;
        cout.flush();
    }
    cudaFree(deviceElA);
    cudaFree(deviceElB);
    cudaFree(deviceElC);
    return 0;
}
The whole thing compiles without complaint and runs without any visible error, but it returns garbage values. Example below:
0: 2 + 5 = 7
1: 1 + 6 = 3.27853e-39
2: 1 + 2 = -1.55064
3: 7 + 8 = 3.47529e-39
4: 7 + 2 = 7.00649e-45
5: 8 + 4 = -2.91408e-05
6: 2 + 3 = 0
7: 9 + 0 = 0
8: 9 + 4 = 1.4013e-45
9: 4 + 7 = -2.91491e-05
.... (rest cut, since it is all more of the same)
Many thanks to anyone who can slap me across the back of the head and point out what I’m doing wrong… because I’m pretty confused and frustrated…
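A possible lead, offered as a sketch rather than a confirmed diagnosis: the copy back uses sizeof(deviceElC), and deviceElC is a float*, so sizeof measures the pointer itself (typically 4 or 8 bytes) rather than the 100-float buffer it points to. That would fit a result in which only the first element is correct and the rest of elementsC is left uninitialized. The host-only snippet below shows the difference without needing a GPU; the helper names arrayBytes, pointerBytes and checkSizes and the constant kCount are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names, for illustration only; kCount mirrors the 100 elements above.
constexpr std::size_t kCount = 100;

// sizeof applied to an array yields the size of the whole array in bytes.
std::size_t arrayBytes() {
    float elementsC[kCount];
    return sizeof(elementsC);   // kCount * sizeof(float)
}

// sizeof applied to a pointer yields the size of the pointer itself,
// no matter how much memory it points to.
std::size_t pointerBytes() {
    float* deviceElC = nullptr;
    return sizeof(deviceElC);   // sizeof(float*), typically 4 or 8
}

// Sanity checks on the two sizes.
void checkSizes() {
    assert(arrayBytes() == kCount * sizeof(float));
    assert(pointerBytes() == sizeof(float*));
    assert(pointerBytes() < arrayBytes());
}
```

If this is indeed the cause, passing sizeof(elementsC) as the byte count in the final cudaMemcpy should copy all 100 results. Separately, sizeof(elementsC) * sizeof(float) in the third cudaMalloc requests more memory than needed, since sizeof(elementsC) is already a byte count; that over-allocation is wasteful but should not corrupt the output.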