I am running a kernel whose computations involve very small values.
My problem is:
“Each thread runs a loop 720 times.
There are two particular values which do not depend on the loop variable.
The GPU output differs depending on whether I compute these values once before the loop or recompute them inside the loop on every iteration.
Though the difference is of the order of 10E-4, I am not sure why this is happening when the calculation is the same and the hardware is the same.
Has anybody experienced a similar problem?”
All data types are float.
[GT640, CUDA 5.0]
Code 1 (values computed once, before the loop):
temVy = (float)((CudaParamsD.VxNum[1]/2 - j) - 0.5) * CudaParamsD.SizeObj[1];
temVx = (float)((i - CudaParamsD.VxNum[0]/2) + 0.5) * CudaParamsD.SizeObj[0];
for(int IndAng=0; IndAng < CudaParamsD.DetNum[0]; IndAng++)
{
    ---
    ---
}
Code 2 (values recomputed inside the loop):
for(int IndAng=0; IndAng < CudaParamsD.DetNum[0]; IndAng++)
{
    temVy = (float)((CudaParamsD.VxNum[1]/2 - j) - 0.5) * CudaParamsD.SizeObj[1];
    temVx = (float)((i - CudaParamsD.VxNum[0]/2) + 0.5) * CudaParamsD.SizeObj[0];
    --------
    --------
}
Here i and j are global thread IDs, not loop variables.
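One possibility that may be worth ruling out is fused multiply-add (FMA) contraction. This is only a guess, since the loop body is not shown: it assumes the elided code multiplies temVx and temVy by something and adds the results. When the multiply by SizeObj sits inside the loop, the compiler is free to fuse it with a following add into a single FMA, which rounds once instead of twice, so the two code arrangements need not produce bit-identical floats. The sketch below is plain host C++, not the kernel, and the names mul_then_add, fused_mul_add, kA and kC are made up for illustration. It only shows that the two evaluation orders can disagree in float.

```cpp
#include <cmath>

// Hypothetical inputs chosen so that a*a lands exactly halfway
// between two neighbouring floats.
constexpr float kA = 1.0f + 1.0f / 4096.0f;     // 1 + 2^-12
constexpr float kC = -(1.0f + 1.0f / 2048.0f);  // -(1 + 2^-11)

// Multiply, round the product to float, then add.
// The volatile store stops a host compiler from fusing the two steps.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a*b + c is computed exactly and rounded once.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With these inputs, mul_then_add(kA, kA, kC) gives 0 while fused_mul_add(kA, kA, kC) gives 2^-24, and a difference of that size per step could plausibly grow over 720 iterations. To check whether this is what is happening in the kernel, both versions could be compiled with nvcc's -fmad=false option, or the multiplies and adds could be written with the __fmul_rn and __fadd_rn intrinsics, which the compiler does not merge into FMAs. If the two outputs then agree, contraction is the likely cause.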