Efficient calculation of the angle between two vectors in a 3D space

I need to calculate the angle between two vectors in a 3D space as well as its sine and cosine.

For the moment, I’m using the following approach. Let r1 and r2 be the two 3D vectors.

magr1 = invmagnitude(r1);
magr2 = invmagnitude(r2);

costheta = dot(r1,r2)*(magr1*magr2);

theta = acos(costheta1);
sintheta = sin(theta1);

where the dot function computes

r1.x*r2.x + r1.y*r2.y + r1.z*r2.z;

and invmagnitude calculates

rsqrt(dot(vec, vec));

My question is: is there an approach more efficient than what I’m currently doing?

I understand that the answer may depend on the target precision. Then, how the answer to the above question changes with precision?

Thank you very much in advance.

Learn to use nvprof.

Examine cache hits, branch and warp execution efficiencies, etc.

This way, you can pinpoint any flaws in your particular implementation.

I am quite rusty, but as I recall one can compute the sine and cosine components directly:

cos(theta) = dot(a,b) / |a| * |b|
sin(theta) = |cross(a,b)| / |a| * |b|

Then use atan2() to map back to the full circle. Not sure whether that gives you the smaller or larger of two possible angles, and whether it will be properly signed (if that is required), so additional minor corrections may be necessary.

The magnitude computations would want to use rhypot() is you need robustness, otherwise direct use of rsqrt() will give you the best performance:


Thank you very much njuffa.

rhypot() works with two arguments and I have 3D vectors. To use it in my case I should either

  • calculate the plane containing the two vectors (so reducing the dimensionality of the problem to 2D) and then using rhypot() or
  • using something like
temp = 1/rhypot(y*y+z*z);
result = rhypot(x*x + temp*temp);

Am I right?

Sorry, I wasn’t thinking when I recommended rhypot(). That works great for 2D, but for 3D you need something like rhypot (a, hypot(b, c)) which is robust but could be too slow.

My main focus was to get you down from two transcendental function calls to one. I know that for double precision, acos() and atan2() have roughly the same throughput on current Teslas, so that seemed like a good tradeoff.

So computing sine and cosine first gives us a unit vector, and using the atan2() of that should give us the angle (possibly modulo minor corrections as mentioned). If you would not need the sine and cosine, but just the angle itself my approach could be even more efficient since we do not need the two divisions by |a|*|b| in that case (as they cancel) leaving us with:

theta = atan2 (norm(cross(a,b)), dot(a,b))

An interesting side-problem here is how to compute norm(cross(a,b)) with the minimal number of operations, by making maximal use of FMA. Is there a better way than the naïve method, while being numerically stable? I have not looked into that yet.

cos a = (XaXb+YaYb+ZaZb) / sqrt((Xa²+Ya²+Za²)(Xb²+Yb²+Zb² ))

Well, that is the naïve implementation I referenced. The question is, can one do better than this?

In terms of accuracy and robustness use of the functions hypot() and rhypot() will help. As for the cross product, it is made up of terms of the form ab - cd which can suffer from cancellation even when FMA is used in the obvious way. A more accurate evaluation using FMAs (at the cost of doubling the number of required operations) is mentioned in this recent paper:


// compute ab - cd accurately
w = __dmul_rn (c, d);
e = __fma_rn (c, d, -w);
f = __fma_rn (a, b, +w);
r = __dadd_rn (f + e);