CUDA doesn't represent doubles as accurately as floats

I’m experimenting with precision in CUDA. I know the GPU is optimized for floating point, but it’s strange that I am seeing more accuracy with floats than with doubles.

Here’s my test code:

#include <stdio.h>
#include <stdlib.h>

__global__ void kernel()
{
  const float  S3f = sqrtf(3.0f);   // single-precision sqrt(3)
  const double S3d = sqrt(3.0);     // double-precision sqrt(3)

  float  threef = S3f * S3f;        // square the roots again
  double threed = S3d * S3d;

  if (0 == threadIdx.x)
    printf("F = %0.20f  %0.20f\nD = %0.20lf %0.20lf\n",
           S3f, threef, S3d, threed);
}

int main(int argc, char **argv)
{
  cudaError_t err;
  kernel<<<1,1>>>(); 
  err=cudaThreadSynchronize();
  if (cudaSuccess!=err) {
    printf("CUDA failure \n");
    exit(0);
  }
  return 0;
}

I compile with nvcc -arch=sm_30 -o fp fp.cu and run on my Kepler GPU.

The output:

F = 1.73205077648162841797  3.00000000000000000000
D = 1.73205080756887719318 2.99999999999999955591

Why does CUDA compute the 32-bit float value more precisely than the 64-bit double?

In finite-precision floating-point arithmetic, sqrt(x)*sqrt(x) != x for many values of x, even when the square-root result is correctly rounded according to the IEEE-754 "round to nearest, ties to even" rule. For another double-precision example, try the square root of 2.
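
For instance, here is a minimal host-side C sketch of the sqrt(2) case; it is plain C rather than device code, but the arithmetic is governed by the same IEEE-754 rules, so the device result is the same:

#include <stdio.h>
#include <math.h>

int main(void)
{
  /* sqrt(2.0) is correctly rounded, yet squaring it does not give back
     exactly 2.0 in double precision. */
  double s2  = sqrt(2.0);
  double two = s2 * s2;
  printf("sqrt(2)         = %0.20f\n", s2);
  printf("sqrt(2)*sqrt(2) = %0.20f\n", two);
  printf("equal to 2.0?     %s\n", (two == 2.0) ? "yes" : "no");
  return 0;
}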

Of course, for some combinations of input operand and precision, one gets lucky and the mathematical identity happens to hold. You identified one such case.

The square roots of numbers in the interval [1,4] fall into the interval [1,2]. In common binary floating-point formats, there are twice as many representable numbers in the interval [1,4] as there are in the interval [1,2]. By a pigeonhole argument, the square-root operation must therefore map some pairs of distinct inputs to the same output. From that observation alone it is clear that, in general, we cannot recover the original operand of a floating-point square root by squaring its result.
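
You can see both points with a short host-side C sketch, assuming standard IEEE-754 binary64: the gap between adjacent doubles doubles at 2.0, and scanning a handful of consecutive doubles near 3.0 typically turns up two distinct inputs whose square roots round to the same double (the scan length of 20 is arbitrary):

#include <stdio.h>
#include <math.h>

int main(void)
{
  /* The spacing between adjacent doubles doubles at each power of two,
     so [1,4] contains about twice as many doubles as [1,2]. */
  printf("gap just above 1.0: %g\n", nextafter(1.0, 2.0) - 1.0);  /* 2^-52 */
  printf("gap just above 2.0: %g\n", nextafter(2.0, 4.0) - 2.0);  /* 2^-51 */

  /* Pigeonhole in action: look for two distinct inputs near 3.0 whose
     square roots round to the same double. */
  double x = 3.0;
  for (int i = 0; i < 20; i++) {
    double y = nextafter(x, 4.0);
    if (sqrt(x) == sqrt(y))
      printf("sqrt(%.17g) == sqrt(%.17g) == %.17g\n", x, y, sqrt(x));
    x = y;
  }
  return 0;
}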

In fact, if you compute the square root to higher precision, you will find that the double-precision result above is the more accurate approximation of sqrt(3). Moreover, if you take the single-precision square root but square it in double precision, you get a result that is less accurate than squaring the double-precision square root.
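
A quick way to check this without a multi-precision library is to use long double as the higher-precision reference; this is only a sketch, and it assumes a toolchain where long double is wider than double (e.g., x86-64 gcc/clang):

#include <stdio.h>
#include <math.h>

int main(void)
{
  /* Higher-precision reference for sqrt(3); assumes long double carries
     more precision than double. */
  long double ref = sqrtl(3.0L);

  float  s3f = sqrtf(3.0f);   /* single-precision square root */
  double s3d = sqrt(3.0);     /* double-precision square root  */

  printf("float  sqrt(3) error: %Lg\n", fabsl((long double)s3f - ref));
  printf("double sqrt(3) error: %Lg\n", fabsl((long double)s3d - ref));

  /* Square both roots in double precision: the product built from the
     single-precision root lands much farther from 3. */
  double sq_f = (double)s3f * (double)s3f;
  double sq_d = s3d * s3d;
  printf("float  root squared: %.20f  (error %g)\n", sq_f, fabs(sq_f - 3.0));
  printf("double root squared: %.20f  (error %g)\n", sq_d, fabs(sq_d - 3.0));
  return 0;
}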

I’ve posted a short Python notebook that gives some examples:

https://www.wakari.io/sharing/bundle/seibert/Floating%20Point%20Square-Root