Are double-precision functions in the CUDA Math API just copy-paste versions of the single-precision functions?

Why can't I see any difference between them? Am I doing something wrong?

I tested them with the following code, comparing the results against the value produced by pow() from math.h in C.

__global__ void addKernel(double *dev_c)
{
    dev_c[0] = pow (2.70134219723423422342334134, 2.70134219723423422342334134);
    // powf() takes float arguments, so these double literals are rounded to float first
    dev_c[1] = powf(2.70134219723423422342334134, 2.70134219723423422342334134);
}

#include <stdio.h>

int main()
{
    double c[2] = {0.0, 0.0};

    double *dev_c;

    cudaMalloc((void**)&dev_c, 2 * sizeof(double));

    addKernel <<< 1, 1 >>> (dev_c);

    cudaMemcpy(c, dev_c, 2 * sizeof(double), cudaMemcpyDeviceToHost);

    printf("CUDA math double precision: %.24f \n", c[0]);
    printf("CUDA math single precision: %.24f \n", c[1]);

    getchar();
    return 0;
}

The output is

CUDA math double precision:  14.650218963623047000000000
CUDA math single precision:  14.650218963623047000000000

Compare that with pow() from math.h:

C math.h double precision:  14.650221542435155000000000

Or is there something I failed to do that keeps CUDA double precision from working?

Many thanks in advance.

What command line are you using to compile the code? If you compile for anything less than an sm_13 architecture, double will get demoted to float. If this is happening, the compiler will usually spit out a message to that effect.

When I compile like this:

nvcc -arch=sm_20 -o t449 t449.cu

and run this code on a cc2.0 device:

#include <stdio.h>
#include <math.h>

__global__ void addKernel(double *dev_c)
{
    dev_c[0] = pow (2.70134219723423422342334134, 2.70134219723423422342334134);
    dev_c[1] = powf(2.70134219723423422342334134, 2.70134219723423422342334134);
}

int main()
{
    double c[2] = {0.0, 0.0};

    double *dev_c;

    cudaMalloc((void**)&dev_c, 2 * sizeof(double));

    addKernel <<< 1, 1 >>> (dev_c);

    cudaMemcpy(c, dev_c, 2 * sizeof(double), cudaMemcpyDeviceToHost);

    printf("CUDA math double precision: %.24f \n", c[0]);
    printf("CUDA math single precision: %.24f \n", c[1]);
    printf("CPU                       : %.24f \n", pow(2.70134219723423422342334134, 2.70134219723423422342334134));
    return 0;
}

I get this:

CUDA math double precision: 14.650221542435154731265357
CUDA math single precision: 14.650218009948730468750000
CPU                       : 14.650221542435154731265357

If, OTOH, I compile the same code like this:

nvcc -o t449 t449.cu

I get output like this:

CUDA math double precision: 14.650218009948730468750000
CUDA math single precision: 14.650218009948730468750000
CPU                       : 14.650221542435154731265357

This still isn’t exactly what you have. I suspect the remaining difference is that you are running on a cc1.x GPU and I am running on a cc2.x GPU; in that case there are (I think) library differences in the single-precision functions (only). njuffa will come along at some point and straighten it all out. But it will help if you specify your compile command line and GPU.

There also appear to be some printf differences between yours and mine. I am running on linux, perhaps you are running on windows.

You can easily inspect the source code of the CUDA math library by looking at the header files math_functions.h and math_functions_dbl_ptx3.h that are part of your CUDA installation. You will readily observe that while there are often similarities in the design of the single-precision and double-precision versions of a given math function, they are not direct copies.

Differences in the results of single-precision math functions between sm_1x and later architectures, as noted by txbob, are primarily due to the availability of FMA (IEEE-754 compliant fused multiply-add) on the later architectures, while sm_1x only offers a similar, but numerically inferior, FMAD instruction. For some single-precision math functions, the availability of FMA has also prompted a redesign. You can readily observe this in the source code, where there are code sections #ifdef’ed based on __CUDA_ARCH__.

While the availability of FMA has made some single-precision math functions more accurate on sm_20 and later architectures, the error bounds in the CUDA documentation reflect the higher error bound on sm_1x in such cases; that is, the documentation states the worst-case errors observed across all supported platforms.

Thank you so much. Yes, that is the problem!

I recently moved to Visual Studio 2010 with Nsight 4.0 from GCC, but forgot that it defaults to “compute_10, sm_10”.

I develop code on my laptop (NVS 5400M) and run it on a K20.

Hi, njuffa,

My problem is already fixed with txbob’s help. Thanks for your suggestions. ^_^