CUDA doesn't represent doubles as accurately as floats

I’m experimenting with precision in CUDA. I know the GPU is optimized for floating point, but it’s strange that I am seeing more accuracy with floats than with doubles.

Here’s my test code:

#include <stdio.h>
#include <stdlib.h>

__global__ void kernel()
{
  const float  S3f = sqrtf(3.0f);   // single-precision sqrt(3)
  const double S3d = sqrt(3.0);     // double-precision sqrt(3)

  float  threef = S3f * S3f;        // square the roots again
  double threed = S3d * S3d;

  if (0 == threadIdx.x)
    printf("F = %0.20f  %0.20f\nD = %0.20lf %0.20lf\n",
           S3f, threef, S3d, threed);
}

int main(int argc, char **argv)
{
  cudaError_t err;
  kernel<<<1,1>>>(); 
  err=cudaThreadSynchronize();
  if (cudaSuccess!=err) {
    printf("CUDA failure \n");
    exit(0);
  }
  return 0;
}

I compile with nvcc -arch=sm_30 -o fp fp.cu and run on my Kepler GPU.

The output:

F = 1.73205077648162841797  3.00000000000000000000
D = 1.73205080756887719318 2.99999999999999955591

Why does CUDA compute the 32-bit float value more precisely than the 64-bit double?

In finite-precision floating-point arithmetic, sqrt(x)*sqrt(x) != x for many values of x, even when the square-root result is correctly rounded according to the IEEE-754 "round to nearest, ties to even" rule. For another double-precision example, try the square root of 2.
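
For instance, here is a minimal host-side C sketch of the sqrt(2) case; it is plain C rather than device code, but the arithmetic is governed by the same IEEE-754 rules, so the device result is the same:

#include <stdio.h>
#include <math.h>

int main(void)
{
  /* sqrt(2.0) is correctly rounded, yet squaring it does not give back
     exactly 2.0 in double precision. */
  double s2  = sqrt(2.0);
  double two = s2 * s2;
  printf("sqrt(2)         = %0.20f\n", s2);
  printf("sqrt(2)*sqrt(2) = %0.20f\n", two);
  printf("equal to 2.0?     %s\n", (two == 2.0) ? "yes" : "no");
  return 0;
}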

Of course, for some combinations of input operand and precision, one gets lucky and the mathematical identity happens to hold. You identified one such case.

The square roots of numbers in the interval [1,4] fall into the interval [1,2]. In common binary floating-point formats, there are twice as many representable numbers in the interval [1,4] as there are in the interval [1,2]. By a pigeonhole argument, the square-root operation must therefore map some pairs of distinct inputs to the same output. From that observation alone it is clear that, in general, we cannot recover the original operand of a floating-point square root by squaring its result.
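
You can see both points with a short host-side C sketch, assuming standard IEEE-754 binary64: the gap between adjacent doubles doubles at 2.0, and scanning a handful of consecutive doubles near 3.0 typically turns up two distinct inputs whose square roots round to the same double (the scan length of 20 is arbitrary):

#include <stdio.h>
#include <math.h>

int main(void)
{
  /* The spacing between adjacent doubles doubles at each power of two,
     so [1,4] contains about twice as many doubles as [1,2]. */
  printf("gap just above 1.0: %g\n", nextafter(1.0, 2.0) - 1.0);  /* 2^-52 */
  printf("gap just above 2.0: %g\n", nextafter(2.0, 4.0) - 2.0);  /* 2^-51 */

  /* Pigeonhole in action: look for two distinct inputs near 3.0 whose
     square roots round to the same double. */
  double x = 3.0;
  for (int i = 0; i < 20; i++) {
    double y = nextafter(x, 4.0);
    if (sqrt(x) == sqrt(y))
      printf("sqrt(%.17g) == sqrt(%.17g) == %.17g\n", x, y, sqrt(x));
    x = y;
  }
  return 0;
}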

In fact, if you compute the square root to higher precision, you will find that the double-precision result above is the more accurate approximation of sqrt(3). Moreover, if you take the single-precision square root but square it in double precision, you get a result that is less accurate than squaring the double-precision square root.
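
A quick way to check this without a multi-precision library is to use long double as the higher-precision reference; this is only a sketch, and it assumes a toolchain where long double is wider than double (e.g., x86-64 gcc/clang):

#include <stdio.h>
#include <math.h>

int main(void)
{
  /* Higher-precision reference for sqrt(3); assumes long double carries
     more precision than double. */
  long double ref = sqrtl(3.0L);

  float  s3f = sqrtf(3.0f);   /* single-precision square root */
  double s3d = sqrt(3.0);     /* double-precision square root  */

  printf("float  sqrt(3) error: %Lg\n", fabsl((long double)s3f - ref));
  printf("double sqrt(3) error: %Lg\n", fabsl((long double)s3d - ref));

  /* Square both roots in double precision: the product built from the
     single-precision root lands much farther from 3. */
  double sq_f = (double)s3f * (double)s3f;
  double sq_d = s3d * s3d;
  printf("float  root squared: %.20f  (error %g)\n", sq_f, fabs(sq_f - 3.0));
  printf("double root squared: %.20f  (error %g)\n", sq_d, fabs(sq_d - 3.0));
  return 0;
}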

I’ve posted a short Python notebook that gives some examples:

https://www.wakari.io/sharing/bundle/seibert/Floating%20Point%20Square-Root