Accuracy Issues with Tesla C870

The following two programs produce different results when run on the CPU and on the GPU. Does the NVIDIA Tesla C870 follow the IEEE floating-point standard? If not, what convention does it follow? In other words, what is the reason for the difference?

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - cpu.c - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

#include <stdio.h>

int main(void){
    float n = 1.0;
    int i;
    for(i = 1; i <= 100; i++) printf("%d\t\t%42.41f\n", i, n *= 1.02);
    return 0;
}

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - cpu.c - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - gpu.cu - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

#include <stdio.h>

__global__ void kernel(int i, float *d_n){
    *d_n *= 1.02;
}

int main(void){
    float n = 1.0, *d_n;
    int i;
    cudaMalloc((void **)&d_n, sizeof(float));
    for(i = 1; i <= 100; i++){
        cudaMemcpy(d_n, &n, sizeof(float), cudaMemcpyHostToDevice);
        kernel <<< 1, 1 >>> (i, d_n);
        cudaMemcpy(&n, d_n, sizeof(float), cudaMemcpyDeviceToHost);
        printf("%d\t\t%42.41f\n", i, n);
    }
    cudaFree(d_n);
    return 0;
}

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - gpu.cu - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

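If you rewrite your code to be float-safe, that is, with the constant written as 1.02f instead of 1.02 on the CPU side as well, you will get the same results.

Otherwise you are comparing a computation done entirely in single precision (the GPU) with one that mixes double and single precision (the CPU, where n *= 1.02 is evaluated in double and then rounded back to float).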

#include <stdio.h>

__global__ void kernel(int i, float *d_n){
    /* 1.02 is a double constant; the C870 has no double-precision
       support, so nvcc demotes it to 1.02f */
    *d_n *= 1.02;
}

int main(void){
    float n = 1.0, *d_n;
    float n_ref = 1.0f;   /* CPU reference, kept in single precision */
    int i;
    cudaMalloc((void **)&d_n, sizeof(float));
    for(i = 1; i <= 100; i++){
        cudaMemcpy(d_n, &n, sizeof(float), cudaMemcpyHostToDevice);
        kernel <<< 1, 1 >>> (i, d_n);
        cudaMemcpy(&n, d_n, sizeof(float), cudaMemcpyDeviceToHost);
        printf("%d\t\t%42.41f\t%42.41f\n", i, n, n_ref *= 1.02f);
    }
    cudaFree(d_n);
    return 0;
}

The CUDA Programming Guide documents in detail where the GPU deviates from the IEEE 754 standard.