discrepancy between CPU and GPU after a division (accuracy issue)

ttart398 · May 21, 2015, 1:53pm

Hello all

I am currently migrating some CPU code on GPU and I am facing accuracy issue. I have this simple code
on GPU

__device__ float Simu_ExpoReduitGPU(float const a,
		                    float const b,
		                    float const p1,
		                    curandState * pState){
  float random, x, C;
  C=p1/(1-expf(-p1*b));
  float test = 1-expf(-p1*b) ;
  float testp1 = p1/test ;
  random=0.5;
  x=-(log(1-random*p1/C))/p1;
  printf("EXPO %f %f %f %f %f %f %f\n",x,p1,b,C,p1*b,test,testp1) ;
  return x;
}

On CPU:

float Simu_ExpoReduit(float const a, 
                      float const b, 
                      float const p1){ 
  float random, x, C;
  C=p1/(1-expf(-p1*b));
  float test = 1-expf(-p1*b) ;
  float testp1 = p1/test ;
  //float random= (float) rand()/RAND_MAX; Generateur C++ [0,1]
  random=0.5;
  x=-(log(1-random*p1/C))/p1;
  printf("EXPO %f %f %f %f %f %f %f\n",x,p1,b,C,p1*b,test,testp1) ;
  return x;
}

This piece of code is supposed to simulate an exponential law.
When I run this code on CPU I got the following result

EXPO 0.249519 0.015359 0.500000 2.007695 0.007679 0.007650 2.007695

on GPU
EXPO 0.249523 0.015359 0.500000 2.007679 0.007679 0.007650 2.007679

the variable testp1 has different values between CPU and GPU. In fact I will have accpeted some differences on the last digit, but in this case the two last digit are different ?
Is it normal ?

I have read
https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf

and

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

but I haven’t been able to decide if this diff is expected or not.

I am compiling with cuda 5.5.

My GPU is a GTX680 (CC 3.0)

My CPU is a Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
My architecture is x86_64
My kernel is : 3.2.68-server

Thank you very much for your help

Robert_Crovella · May 21, 2015, 2:38pm

Its probably better if you provide a complete, simplified test case.

The very first value you calculate in your function is C, and you are printing out C, and it seems to be the same as what you are reporting for testp1, and it is different between host and device.

You should provide a complete executable code, not just a function. Also in addition to the other information you hvae provided, indicate the exact compile command you are using,.

When I compile the following code, on linux, with CUDA 7, and run on a K40c, I get numbers that seem to match between host and device:

$ cat t742.cu
#include <stdio.h>

#define V1 0.79f
#define V2 1.274f

__host__ __device__ float test(float const b, float const p1){

  return p1/(1-expf(-p1*b));
}

__global__ void mykernel(){

  float C = test(V1, V2);
  printf("GPU = %f\n", C);
}

int main(){

  printf("CPU = %f\n", test(V1, V2));
  mykernel<<<1,1>>>();
  cudaDeviceSynchronize();
  return 0;
}
$ nvcc -arch=sm_35 -o t742 t742.cu
$ ./t742
CPU = 2.007914
GPU = 2.007914
$

njuffa · May 24, 2015, 11:06am

In general, simple comparison of CPU and GPU results to establish some sort of correctness measure is poor methodology, as either set of computations includes accumulated error, and thus may deviate from the true mathematical result, with no apparent indication which is closer. The standard adage about this is that “the man with one watch always know what time it is, the man with two watches can never be sure”.

A better way to gauge quality is frequently to compare to a higher precision reference. See also NVIDIA’s floating-point white paper for a discussion of the issues surrounding simple host to device comparisons: [url]https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf[/url]

The first thing you would want to examine is your compilation settings. If you compile the CUDA code with nvcc -arch=sm_30, single-precision division delivers correctly rounded results according to the IEEE-754 floating-point standard. If you specify flags like -prec-div=false, or -use_fast_math, a faster, approximate division will be substituted. Likewise, the optimization flag settings of your host compiler may cause division to be approximate rather than correctly rounded.

The differences in the final results may have nothing to do with the division per se. The host compiler may use a precision other than single-precision to evaluate the expression. Likewise the generation of FMA (fused multipy-add instructions) may improve accuracy on the GPU relative to the CPU.

A very likely source of numerical differences in this computation are the expf() results, as the accuracy of expf() is not governed by C/C++ standards; you can find the error bounds for CUDA math functions documented in an appendix of the CUDA C Programming Guide. The single-precision math functions in CUDA favor performance over accuracy based on typical use cases, nonetheless they typically have a maximum error of a few ulps. Small numerical differences may be amplified through further computation, e.g. through subtractive cancellation.

If you want to try a more accurate but slower expf() implementation than that provided by CUDA, check out the one I posted to Stackoverflow a small time back:

[url]c - Which exponentiation algorithms do CPU/programming languages use? - Stack Overflow

ttart398 · June 10, 2015, 7:17am

Hello

I know it is a little bit late but thank you for your help.
Finally, I have been able to run some few test and it looks like cuda is computing the good result in my case. The problem seems to be related to one of the CPU optimization.

Regards

Topic		Replies	Views
Why accuracy CPU and GPU not equal? CUDA Programming and Performance	6	10927	October 28, 2014
Why are the calculations different between CPU and GPU? CUDA Programming and Performance	2	840	February 7, 2020
CPU and GPU Floating point anomaly CUDA Programming and Performance	10	5622	November 10, 2013
GPU Code and CPU Code output not matching till machine precision (i.e. 13 decimals places) CUDA Programming and Performance	22	761	August 9, 2023
GPU/CPU precision comparison and Kernel instructions question CUDA Programming and Performance	5	676	April 4, 2017
Floats and floats... difference between CPU and GPU? CUDA Programming and Performance	12	14073	February 2, 2010
Is there a difference between GPU double precision and CPU double precision? CUDA Programming and Performance	14	10626	November 26, 2009
exp() gives too much impresision with float CUDA Programming and Performance	11	9526	June 8, 2009
Question about cuda exp() function CUDA Programming and Performance cuda	9	1540	January 6, 2023
CPU and GPU floating point calculations Results are different CUDA Programming and Performance	6	21925	August 7, 2010

discrepancy between CPU and GPU after a division (accuracy issue)

Related topics