Hello guys,
I ran into something strange and was wondering if anyone could offer an explanation. I wrote a simple program that initializes an array in host memory, A = [1,2,3,4,5,6,7,8,9,10], copies it to the device, and calls a kernel that doubles every element, so the result should be A = [2,4,6,8,10,12,14,16,18,20]. If all of the variables, pointers, and arrays are of type double, the code appears to run, but the kernel changes nothing. If they are all of type float, on the other hand, the program works correctly.
Here is the code (the float version is shown):
[codebox]#include <stdio.h>
#include "cuda/cuda.h"
#define BLOCK_SIZE 256
#define SIZE 10
__global__ void ones(float *BxyzG);
int main()
{
float *Bxyz, BxyzA[10];
float *BxyzG;
int i=0;
for(i=0;i<SIZE;i++)
{
BxyzA[i]=(float)(i+1);
}
// Bxyz aliases the stack array BxyzA; no separate heap buffer is needed.
Bxyz = BxyzA;
for(i=0;i<SIZE;i++)
{
printf(" Defrefrencing Host Pointer: Bxyz[%d] = %f \n", i, *(Bxyz+i));
}
cudaSetDevice(0);
cudaMalloc((void **)&BxyzG, sizeof(float)*SIZE);
cudaMemcpy(BxyzG, Bxyz, sizeof(float)*SIZE, cudaMemcpyHostToDevice);
dim3 dimBlock(BLOCK_SIZE);
dim3 dimGrid ( (SIZE/dimBlock.x) + (!(SIZE%dimBlock.x)?0:1) );
printf("dimBlock.x: %d dimGrid.x: %d\n", dimBlock.x, dimGrid.x);
ones<<<dimGrid, dimBlock>>>(BxyzG);
cudaMemcpy(Bxyz, BxyzG, sizeof(float)*SIZE, cudaMemcpyDeviceToHost);
for(i=0;i<SIZE;i++)
{
printf("Bxyz[%d] = %f ", i, Bxyz[i]);
}
return 0;
}
__global__ void ones(float *BxyzG)
{
int i = threadIdx.x;
if(i<SIZE)
{
BxyzG[i] *=2;
}
}[/codebox]
Result when everything is declared as double:
[codebox][brose@ucla-grendel Desktop]$ ./a.out
Defrefrencing Host Pointer: Bxyz[0] = 1.000000
Defrefrencing Host Pointer: Bxyz[1] = 2.000000
Defrefrencing Host Pointer: Bxyz[2] = 3.000000
Defrefrencing Host Pointer: Bxyz[3] = 4.000000
Defrefrencing Host Pointer: Bxyz[4] = 5.000000
Defrefrencing Host Pointer: Bxyz[5] = 6.000000
Defrefrencing Host Pointer: Bxyz[6] = 7.000000
Defrefrencing Host Pointer: Bxyz[7] = 8.000000
Defrefrencing Host Pointer: Bxyz[8] = 9.000000
Defrefrencing Host Pointer: Bxyz[9] = 10.000000
dimBlock.x: 256 dimGrid.x: 1
Bxyz[0] = 1.000000 Bxyz[1] = 2.000000 Bxyz[2] = 3.000000 Bxyz[3] = 4.000000 Bxyz[4] = 5.000000 Bxyz[5] = 6.000000 Bxyz[6] = 7.000000 Bxyz[7] = 8.000000 Bxyz[8] = 9.000000 Bxyz[9] = 10.000000 [/codebox]
This is wrong: the array comes back unchanged. But when I literally use "replace with" to change every double to float, I get:
[codebox] Defrefrencing Host Pointer: Bxyz[0] = 1.000000
Defrefrencing Host Pointer: Bxyz[1] = 2.000000
Defrefrencing Host Pointer: Bxyz[2] = 3.000000
Defrefrencing Host Pointer: Bxyz[3] = 4.000000
Defrefrencing Host Pointer: Bxyz[4] = 5.000000
Defrefrencing Host Pointer: Bxyz[5] = 6.000000
Defrefrencing Host Pointer: Bxyz[6] = 7.000000
Defrefrencing Host Pointer: Bxyz[7] = 8.000000
Defrefrencing Host Pointer: Bxyz[8] = 9.000000
Defrefrencing Host Pointer: Bxyz[9] = 10.000000
dimBlock.x: 256 dimGrid.x: 1
Bxyz[0] = 2.000000 Bxyz[1] = 4.000000 Bxyz[2] = 6.000000 Bxyz[3] = 8.000000 Bxyz[4] = 10.000000 Bxyz[5] = 12.000000 Bxyz[6] = 14.000000 Bxyz[7] = 16.000000 Bxyz[8] = 18.000000 Bxyz[9] = 20.000000 [/codebox]
This version works, and I have no idea why the double version does not. I am using a Tesla C870.
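One way to narrow this down: the code above ignores every CUDA return value, so a kernel that never launched would look the same as a kernel that did nothing. Below is a minimal sketch of the double version with error checks added. It assumes a toolkit that provides cudaDeviceSynchronize. The CHECK macro and the kernel name scale2 are hypothetical names made up for illustration; the runtime calls themselves (cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString, cudaGetDeviceProperties) are standard CUDA runtime API functions.

[codebox]#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define SIZE 10

// Hypothetical helper: print the CUDA error string and exit if a runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical kernel name; same operation as ones(), but on double.
__global__ void scale2(double *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        a[i] *= 2.0;
    }
}

int main(void)
{
    double h[SIZE];
    double *d = NULL;
    for (int i = 0; i < SIZE; i++)
    {
        h[i] = (double)(i + 1);
    }

    // Report which device is in use and its compute capability.
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    CHECK(cudaMalloc((void **)&d, SIZE * sizeof(double)));
    CHECK(cudaMemcpy(d, h, SIZE * sizeof(double), cudaMemcpyHostToDevice));

    scale2<<<1, 256>>>(d, SIZE);
    CHECK(cudaGetLastError());       // reports errors from the launch itself
    CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CHECK(cudaMemcpy(h, d, SIZE * sizeof(double), cudaMemcpyDeviceToHost));
    for (int i = 0; i < SIZE; i++)
    {
        printf("h[%d] = %f\n", i, h[i]);
    }
    CHECK(cudaFree(d));
    return 0;
}[/codebox]

If the launch is failing, the CHECK after the kernel call should print the reason instead of silently returning the original array. The printed compute capability can then be compared against the double-precision requirements in NVIDIA's documentation, and any warnings nvcc prints about double when compiling are worth reading as well.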