Hi,
Below are the code and output (see the attached screenshot) of a program that is meant to print the squares of the first 1000 integers (0 to 999). The desired output is the squares of the numbers 0 to 999, but the program produces correct squares only for 0 to 511; after that the results are garbage values such as -1163005939 (see the attached screenshot). I am running the program with the Visual C++ compiler on Windows XP, in emulation mode. I launch two blocks with 512 threads per block, as can be seen in the code below.
Could you please tell me why I am getting wrong results for integers above 511?
Thanks in advance,
Deepak
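#include <stdio.h>
#include <stdlib.h>
#include <conio.h>          // getch()
#include <cuda_runtime.h>

__global__ void Squar(unsigned int *p)
{
    unsigned int i = threadIdx.x;   // index of the thread within its block
    p[i] = i * i;
}

int main()
{
    unsigned int i, *h, *q;
    const unsigned int p = 10000;
    size_t size = p * sizeof(unsigned int);

    h = (unsigned int *)malloc(size);                 // host buffer (uninitialized)
    cudaMalloc((void **)&q, size);                    // device buffer
    cudaMemcpy(q, h, size, cudaMemcpyHostToDevice);
    Squar<<<2, 512>>>(q);                             // 2 blocks, 512 threads per block
    cudaMemcpy(h, q, size, cudaMemcpyDeviceToHost);

    for (i = 0; i < 1000; i++)
    {
        // %u is the matching specifier for unsigned int; %ld prints the value as signed here
        printf("\n%ld", h[i]);
    }
    getch();
    cudaFree(q);   // q came from cudaMalloc, so it is released with cudaFree, not free
    free(h);
    return 0;
}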
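
A possible explanation, sketched here as plain host-side C++ so that it runs without a GPU: in each of the two blocks threadIdx.x only ranges over 0 to 511, so both blocks write the same elements p[0] to p[511], and elements 512 to 999 keep whatever uninitialized data was copied over from h. The names emulateLaunch, firstUnwritten and kUnwritten are hypothetical and exist only for this illustration; the two loops stand in for the block and thread indices that the hardware would supply. Assuming the intent is one array element per thread, the usual 1-D global index is blockIdx.x * blockDim.x + threadIdx.x.

```cpp
#include <cstddef>
#include <vector>

// Sentinel marking an element no emulated thread has written (hypothetical helper).
// No square computed below can equal it: the largest index is 1023, and 1023^2 = 1046529.
const unsigned int kUnwritten = 0xFFFFFFFFu;

// Emulates a 1-D launch Squar<<<numBlocks, threadsPerBlock>>>(p) on the host.
// useGlobalIndex == false mirrors the posted kernel:  i = threadIdx.x
// useGlobalIndex == true  uses the global index:      i = blockIdx.x * blockDim.x + threadIdx.x
std::vector<unsigned int> emulateLaunch(unsigned int numBlocks,
                                        unsigned int threadsPerBlock,
                                        std::size_t n,
                                        bool useGlobalIndex)
{
    std::vector<unsigned int> p(n, kUnwritten);
    for (unsigned int block = 0; block < numBlocks; ++block) {
        for (unsigned int thread = 0; thread < threadsPerBlock; ++thread) {
            unsigned int i = useGlobalIndex ? block * threadsPerBlock + thread
                                            : thread;
            if (i < n) {            // bounds check: the grid may be larger than the array
                p[i] = i * i;
            }
        }
    }
    return p;
}

// Returns the index of the first element that was never written,
// or p.size() if every element was written.
std::size_t firstUnwritten(const std::vector<unsigned int>& p)
{
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (p[i] == kUnwritten) {
            return i;
        }
    }
    return p.size();
}
```

With the launch configuration from the post, emulateLaunch(2, 512, 1000, false) leaves every element from index 512 onward unwritten, while emulateLaunch(2, 512, 1000, true) covers all 1000 elements. In the kernel itself this would correspond to computing i as blockIdx.x * blockDim.x + threadIdx.x, ideally with a bounds check whenever the grid has more threads than the array has elements.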