Possible bug in CUDA behaviour

I’m working on an algorithm but the output behavior is not as expected so I’m providing a snippet of code which highlights my issue.

__global__ void TEST(void)
{
float4 array[2];
float4 temp;
float k = .1;
float k1=k*(1/sqrtf(2));

array[0].x = 2445;
array[0].y = 2446;
array[0].z = 2447;
array[0].w = 2448;
array[1].x = 2444;
array[1].y = 2445;
array[1].z = 2446;
array[1].w = 2447;

temp.w =     (array[0].w * k1) ;//+ (array[0].w * (1-k));
temp.z =     (array[0].z * k1) + (array[1].w * (1-k1));
temp.y =     (array[0].y * k1) + (array[1].z * (1-k1));
temp.x =     (array[0].x * k1) + (array[1].y * (1-k1));

printf("test %f %f %f %f \n",temp.x,temp.y,temp.z,temp.w);     //test 2445.000244 2446.000000 2447.000000 173.099747 , temp.x should be 2445

}

Why is temp.x 2445.000244 and not 2445? Is it a rounding bug in the hardware? Why is the behaviour not the same for the other two results, temp.y and temp.z?

Kelly

Have you run the same code on the CPU in single precision?
This might also be a floating point precision issue and not a hardware specific issue.

One thing to know about FP32 is that it has approximately 7 decimal digits of precision. 2445.000 already shows 7 significant digits, so any printed digits that follow cannot be represented accurately and should not be relied upon.
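To make the 7-digit limit concrete, here is a small host-side C++ sketch that measures the gap between a float and its next representable neighbour. The helper names `ulp_above` and `print_spacing` are made up for illustration; only `std::nextafter` and `std::printf` come from the standard library.

```cpp
#include <cmath>
#include <cstdio>

// Gap between x and the next representable float above it (one ULP at x).
float ulp_above(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Print x, its upper neighbour, and the gap, with enough digits to tell them apart.
void print_spacing(float x) {
    float next = std::nextafter(x, INFINITY);
    std::printf("%.9f -> %.9f (gap %.9g)\n", x, next, next - x);
}
```

For 2445.0f the gap is 2^-12 = 0.000244140625, so no float exists between 2445.0 and 2445.000244.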

CUDA offers host side definitions of float4, so CPU vs GPU discrepancies should be easy to check using the same code.
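A host-only version of the kernel arithmetic might look like the sketch below. It is an assumption-laden stand-in rather than the original poster's code: `Float4` is a hypothetical replacement for CUDA's `float4` so that no CUDA headers are needed, and `blend_on_host` and `print_result` are names invented here.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical stand-in for CUDA's float4, so no CUDA headers are required.
struct Float4 { float x, y, z, w; };

// Same arithmetic as the kernel body, run on the host in single precision.
Float4 blend_on_host() {
    const Float4 array[2] = {{2445.0f, 2446.0f, 2447.0f, 2448.0f},
                             {2444.0f, 2445.0f, 2446.0f, 2447.0f}};
    const float k  = 0.1f;
    const float k1 = k * (1.0f / std::sqrt(2.0f));  // float overload of sqrt

    Float4 temp;
    temp.w = array[0].w * k1;
    temp.z = array[0].z * k1 + array[1].w * (1.0f - k1);
    temp.y = array[0].y * k1 + array[1].z * (1.0f - k1);
    temp.x = array[0].x * k1 + array[1].y * (1.0f - k1);
    return temp;
}

// Print in the same format as the kernel's printf for side-by-side comparison.
void print_result(const Float4& t) {
    std::printf("test %f %f %f %f\n", t.x, t.y, t.z, t.w);
}
```

Calling `print_result(blend_on_host())` from `main` gives a line to compare against the GPU output. Compiling with a strict floating-point model (for example clang's `-ffp-model=strict`) keeps the compiler from fusing a multiply and an add into one instruction, which may otherwise change the last bit of the result.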

Be aware that the rounding result of sqrtf() might differ between CPU and GPU. All the NVIDIA hardware guarantees is a specific maximum error in ULPs (units in the last place). It does not necessarily guarantee that the result is faithfully or exactly rounded.

The nvcc compiler option -prec-sqrt may have an influence on the precision of sqrt functions.

The CUDA Programming Guide 12.6 says this about the ULP error for sqrtf(x):

Maximum ulp error 0 when compiled with -prec-sqrt=true. Otherwise 1 for compute capability ≥ 5.2 and 3 for older architectures

When posting code on these forums, please format it correctly. One possible method: Edit your post using the pencil icon below it. Select the code. Press the </> button at the top of the edit pane. Save your changes.

Please do that now, thanks.

Nothing untoward appears to be happening. I modified the code into a pure C++ program for the host compiler, and used clang with -ffp-model=strict to compile.

On 64-bit ARM:

test 2445.000244 2446.000000 2447.000000 173.099747

On 64-bit x86:

test 2445.000244 2446.000000 2447.000000 173.099747 

So this is just an example of normal fixed-precision floating-point rounding effects. float in particular is only accurate to 6 to 7 decimal digits, and within that limitation, the expected result matches the observed result. In particular, we have:

k1 = 0.07071068
1-k1 = 0.92928934
array[0].x * k1 = 172.88761902
array[1].y * (1-k1) = 2272.11254883

The sum of the two products is 2445.00016785. The nearest available float encodings have the values 2445.0 and 2445.000244, of which the latter is closer to the sum, so that is chosen for the final result under the round-to-nearest, ties-to-even rule.
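The rounding decision can be checked mechanically. The following C++ sketch takes the two rounded products quoted above, adds them once in double and once in float, and reports the two neighbouring floats. `RoundingCheck`, `check_sum` and `print_check` are illustrative names, not part of any library.

```cpp
#include <cmath>
#include <cstdio>

struct RoundingCheck {
    double exact_sum;  // sum carried in double
    float lower;       // nearest float at or below the exact sum
    float upper;       // nearest float at or above the exact sum
    float rounded;     // result of the single-precision addition
};

RoundingCheck check_sum(float p1, float p2) {
    RoundingCheck r;
    // Two floats of similar magnitude add exactly in double.
    r.exact_sum = static_cast<double>(p1) + static_cast<double>(p2);
    r.rounded = p1 + p2;  // rounded to the nearest float
    r.lower = r.upper = r.rounded;
    if (r.rounded > r.exact_sum) {
        r.lower = std::nextafter(r.rounded, -INFINITY);
    } else if (r.rounded < r.exact_sum) {
        r.upper = std::nextafter(r.rounded, INFINITY);
    }
    return r;
}

// Show how far the exact sum lies from each neighbouring float.
void print_check(const RoundingCheck& r) {
    std::printf("exact %.8f\n", r.exact_sum);
    std::printf("lower %.6f (off by %.8f)\n", r.lower, r.exact_sum - r.lower);
    std::printf("upper %.6f (off by %.8f)\n", r.upper, r.upper - r.exact_sum);
    std::printf("float sum %.6f\n", r.rounded);
}
```

Calling `check_sum(172.88761902f, 2272.11254883f)` should report 2445.0 and 2445.000244 as the neighbours, with the upper one roughly 0.000076 away and the lower one roughly 0.000168 away, which is why the float addition lands on 2445.000244.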