Two expressions with the same mathematical semantics give different results

frankxyy · July 4, 2023, 4:23am

#include <cuda_fp16.h>

__global__ void vector_add(float *out, float *a, float *b, float *c, float *d, int n) {
    // Products are rounded to FP16; the subtraction is done in FP32.
    __half tmp1 = (__hmul(__float2half(a[0]), __float2half(b[0])));
    __half tmp2 = (__hmul(__float2half(c[0]), __float2half(d[0])));
    out[0] = (__half2float(tmp1) - __half2float(tmp2));

    // Single-statement version, entirely in FP16:
    // out[0] = __half2float(__hsub(__hmul(__float2half(a[0]), __float2half(b[0])),
    //     __hmul(__float2half(c[0]), __float2half(d[0]))));
}

In the code snippet above, if I use the temporary variables tmp1 and tmp2 to hold the intermediate results, the final result differs from that of the single-statement expression that is commented out in the code.

njuffa · July 4, 2023, 4:47am

This subtraction,

__half2float(tmp1) - __half2float(tmp2)

is a single-precision (FP32) subtraction, whereas this subtraction,

__hsub ( <fp16_expression_1>, <fp16_expression_2> )

is a half-precision (FP16) subtraction. For many values of <fp16_expression_1> , <fp16_expression_2> these are not equivalent. I would expect the first method to deliver more accurate results on average, especially when the magnitudes of the two expressions are quite different.

Beyond this specific issue, be aware that floating-point arithmetic is not associative, and that therefore mathematically equivalent expressions are generally not equivalent when evaluated with finite-precision floating-point arithmetic.

Furthermore, for the second expression the compiler is likely to contract the multiply-subtract sequence into an FP16 fused multiply-add (FMA) operation, leading to further numerical differences. This contraction does not apply to the first example since the products and the subtraction are computed with different precisions, with a type conversion in between. If the FMA is generated (use cuobjdump --dump-sass to find out), the average accuracy of the two variants is likely very similar (I have not thought through all potential cases). However, the results would still not match the first variant bit for bit, since one of the products would be computed exactly (it stays unrounded inside the FMA), while the other would be rounded to FP16.
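To make the three cases above concrete, here is a minimal sketch rather than a definitive test. The kernel name compare_variants, the host helper run_compare, and the input values are my own assumptions, not taken from the original code. The inputs are chosen so that a*b = 4104.00390625 rounds to 4104 in FP16, while c*d = 4104 is exact. Half-precision arithmetic needs compute capability 5.3 or higher, so build with a suitable -arch setting.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Evaluates a*b - c*d in three ways, with all inputs first converted to FP16.
__global__ void compare_variants(float *out, float a, float b, float c, float d) {
    __half ha = __float2half(a), hb = __float2half(b);
    __half hc = __float2half(c), hd = __float2half(d);

    // Variant 1: both products rounded to FP16, subtraction done in FP32.
    __half tmp1 = __hmul(ha, hb);
    __half tmp2 = __hmul(hc, hd);
    out[0] = __half2float(tmp1) - __half2float(tmp2);

    // Variant 2: everything in FP16. The compiler may or may not contract
    // one multiply and the subtraction into an FMA.
    out[1] = __half2float(__hsub(__hmul(ha, hb), __hmul(hc, hd)));

    // Variant 3: explicit FMA. a*b stays unrounded inside the FMA, while
    // c*d is rounded to FP16 first. Assumption: this fuses the first product;
    // which product the compiler would pick in variant 2 is not specified.
    out[2] = __half2float(__hfma(ha, hb, __hneg(__hmul(hc, hd))));
}

// Hypothetical host helper: runs the kernel once and copies back 3 results.
void run_compare(float a, float b, float c, float d, float result[3]) {
    float *d_out = nullptr;
    cudaMalloc(&d_out, 3 * sizeof(float));
    compare_variants<<<1, 1>>>(d_out, a, b, c, d);
    cudaError_t err = cudaMemcpy(result, d_out, 3 * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(d_out);
}

int main() {
    float r[3];
    // 64.0625, 64 and 64.125 are all exactly representable in FP16.
    run_compare(64.0625f, 64.0625f, 64.0f, 64.125f, r);
    printf("FP32 subtraction of FP16 products: %.8f\n", r[0]);
    printf("FP16 __hsub of FP16 products:      %.8f\n", r[1]);
    printf("explicit FP16 __hfma:              %.8f\n", r[2]);
    return 0;
}
```

With these inputs, variant 1 subtracts two identical rounded products, whereas the explicit FMA retains the 2^-8 that rounding a*b to FP16 discards. What variant 2 prints depends on whether and how the compiler contracts it, which is what the SASS dump tells you.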

frankxyy · July 4, 2023, 6:04am

@njuffa Many thanks for your reply. In fact, I made a mistake when copying the code. In the original version of the first example, the subtraction is calculated in FP16 precision; the version in this post is a debugging version in which I tried to reproduce the output of the second example. However, the results of the two examples are not identical regardless of the precision of the subtraction in the first example, since the difference is mainly caused by the difference in multiplication precision between a separate mul and an FMA.

njuffa · July 4, 2023, 6:34am

For debugging purposes, you could turn off FMA merging by using the command line switch -fmad=false with nvcc. However, in general FMA generation should remain enabled for best average accuracy and best performance.
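If a build-wide switch is too coarse, a source-level alternative is the set of _rn intrinsics in cuda_fp16.h (__hmul_rn, __hsub_rn), which are documented as preventing contraction into an FMA. The following is a minimal sketch, assuming a CUDA toolkit recent enough to provide these intrinsics and a GPU of compute capability 5.3 or higher. The names compare_contraction and run_contraction_check and the input values are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Computes a*b - c*d in FP16, once with contractable intrinsics and once
// with the _rn intrinsics that are documented to prevent FMA contraction.
__global__ void compare_contraction(float *out, float a, float b, float c, float d) {
    __half ha = __float2half(a), hb = __float2half(b);
    __half hc = __float2half(c), hd = __float2half(d);

    // May be contracted into an FMA, depending on compiler and flags.
    out[0] = __half2float(__hsub(__hmul(ha, hb), __hmul(hc, hd)));

    // Each operation is rounded to FP16 separately; no contraction.
    out[1] = __half2float(__hsub_rn(__hmul_rn(ha, hb), __hmul_rn(hc, hd)));
}

// Hypothetical host helper: runs the kernel once and copies back 2 results.
void run_contraction_check(float a, float b, float c, float d, float result[2]) {
    float *d_out = nullptr;
    cudaMalloc(&d_out, 2 * sizeof(float));
    compare_contraction<<<1, 1>>>(d_out, a, b, c, d);
    cudaError_t err = cudaMemcpy(result, d_out, 2 * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(d_out);
}

int main() {
    float r[2];
    // a*b = 4104.00390625 rounds to 4104 in FP16; c*d = 4104 exactly.
    run_contraction_check(64.0625f, 64.0625f, 64.0f, 64.125f, r);
    printf("__hsub(__hmul, __hmul):          %.8f\n", r[0]);
    printf("__hsub_rn(__hmul_rn, __hmul_rn): %.8f\n", r[1]);
    return 0;
}
```

Building this once with default settings and once with -fmad=false, then comparing the two printed lines and the SASS, shows whether contraction is what changes the result in a given build.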

frankxyy · July 6, 2023, 4:16am

@njuffa I really learned a lot. Thank you very much!
