I finally solved a long-standing bug in my kernel, and in the process ran into what looks like a nasty bug in __powf. Here’s the relevant code snippet:
float m0 = r*__powf(__sinf(theta+r),3.0f);
float m1 = r*__powf(__cosf(theta-r),3.0f);
This version produces clearly wrong results, possibly NaNs.
float m0 = __sinf(theta+r);
float m1 = __cosf(theta-r);
m0 = r*m0*m0*m0;
m1 = r*m1*m1*m1;
The moment I replaced the original code with this, everything started working as it was supposed to.
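Actually, come to think of it, this is probably not a bug in __powf at all. The sine and cosine go negative, and as far as I can tell from the CUDA programming guide, __powf(x,y) is computed as exp2f(y*__log2f(x)), so it returns NaN for any negative x, even when y is an integer like 3.0f. Only the full powf special-cases integer exponents; for negative x and non-integer y the true result would be complex anyway, so that case is undefined in either function.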
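For anyone who wants to see the effect without a GPU, here is a minimal sketch. It assumes the exp2f(y*__log2f(x)) recipe from the programming guide and models it on the host with the standard exp2f and log2f, so it reproduces the NaN but not the reduced accuracy of the real intrinsic. The names fast_powf_model, cube and HD are made up for this post, not part of CUDA.

```cpp
#include <math.h>

// Lets the same helpers build as plain C++ or as CUDA host/device code.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical host-side model of the documented __powf recipe:
// exp2f(y * log2f(x)). log2f of a negative x is NaN, and the NaN
// propagates through the multiply and exp2f, whatever y is.
HD inline float fast_powf_model(float x, float y) {
    return exp2f(y * log2f(x));
}

// Hypothetical helper: cube by repeated multiplication.
// No logarithm involved, so negative x is handled correctly.
HD inline float cube(float x) {
    return x * x * x;
}
```

Calling fast_powf_model(-0.5f, 3.0f) gives NaN, while cube(-0.5f) gives -0.125f. In the kernel, r*cube(__sinf(theta+r)) would be the equivalent of the working version, and it should also be cheaper than any pow call for a small fixed integer exponent.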