I have a recursive floating-point calculation inside a kernel that is producing wrong results. The code is the same as the CPU version, but the results differ by a relative error of up to 1e-3, which is far too much and produces a lot of noise in the output.
Any idea what may be causing this? I suspect an issue with mixed integer/float operations, but I'm not sure.
Fast math is not enabled.
Here is one of the functions showing the problem (the shorter one). Each thread has its own global memory buffer for the calculation, hence the use of step:
__device__ void genlgp(float theta, int nc, float *pnmllg, size_t step)
{
    float costh = cosf(theta);

    pnmllg[0] = 0.0f;
    pnmllg[step] = 1.0f;
    for (int n = 2; n < nc; n++)
        pnmllg[n*step] = ((2.0f*n - 1.0f)*costh*pnmllg[(n - 1)*step] - n*pnmllg[(n - 2)*step])/(n - 1.0f);
}
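One way to narrow this down might be to run the same recurrence in double precision on the host and check whether the CPU float result or the GPU float result is further from it. The following is only a minimal sketch under stated assumptions: `genlgp_ref` and `max_rel_error` are hypothetical names that are not part of the original code, the host buffers use a step of 1, and the guards for `nc < 2` are an addition.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Same recurrence as genlgp, templated on the floating-point type so that
// a float run can be compared against a double run. (Hypothetical helper.)
template <typename T>
void genlgp_ref(T theta, int nc, T *out, std::size_t step)
{
    const T costh = std::cos(theta);
    if (nc > 0) out[0] = T(0);
    if (nc > 1) out[step] = T(1);
    for (int n = 2; n < nc; n++)
        out[n * step] = ((T(2) * n - T(1)) * costh * out[(n - 1) * step]
                         - n * out[(n - 2) * step]) / (n - T(1));
}

// Largest relative difference between the float run and the double run.
// Entries whose double value is (near) zero are skipped, since a relative
// error is not meaningful there.
double max_rel_error(float theta, int nc)
{
    std::vector<float> f(nc);
    std::vector<double> d(nc);
    genlgp_ref<float>(theta, nc, f.data(), 1);
    genlgp_ref<double>(static_cast<double>(theta), nc, d.data(), 1);

    double worst = 0.0;
    for (int n = 0; n < nc; n++) {
        const double denom = std::fabs(d[n]);
        if (denom < 1e-30)
            continue;
        worst = std::fmax(worst, std::fabs(static_cast<double>(f[n]) - d[n]) / denom);
    }
    return worst;
}
```

Copying the kernel output back to the host and comparing it against the double-precision values in the same way would show whether the 1e-3 discrepancy is specific to the GPU or is already present in the float recurrence itself.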
I'd be glad for any suggestions.