Strange change in behaviour between float and double

petschge · March 31, 2009, 2:21pm

Hello,

the following test case shows a really strange change in behaviour:

#include <cutil_inline.h>
#include <stdio.h>
#include <stdlib.h>   // malloc, free, exit

typedef float my_t;   // switch to double to reproduce the problem

// Zero every element of the size x size matrix (runs as a single thread).
__global__ void kill(my_t* X, int size) {
    for(int i = 0; i < size; i++) {
        for(int j = 0; j < size; j++) {
            X[i*size + j] = 0;
        }
    }
}

// Copy A to the device, zero it there, and copy it back.
void run(my_t* A, int size) {
    my_t* devA;
    cudaMalloc((void**)&devA, sizeof(my_t) * size * size);
    cudaMemcpy(devA, A, sizeof(my_t) * size * size, cudaMemcpyHostToDevice);
    printf("before %f\n", A[0]);
    kill<<<1, 1>>>(devA, size);
    cudaMemcpy(A, devA, sizeof(my_t) * size * size, cudaMemcpyDeviceToHost);
    cudaFree(devA);
    printf("after %f\n", A[0]);
}

// Fill A with non-zero values so the effect of the kernel is visible.
void init(my_t* A, int size) {
    for(int i = 0; i < size; i++) {
        for(int j = 0; j < size; j++) {
            A[i * size + j] = i + j + 1.0;
        }
    }
}

int main(int argc, char* argv[]) {
    int size = 3;
    my_t* A = (my_t*)malloc(sizeof(my_t)*size*size);
    if(A == NULL) {
        fprintf(stderr, "Can't alloc A\n");
        exit(1);
    }
    // sizeof yields a size_t, so cast it to match the %d format
    printf("Running with %d byte floating point numbers\n", (int)sizeof(my_t));
    init(A, size);
    run(A, size);
    free(A);
    return 0;
}

nvcc --compiler-options -g -I/home/pkilian/cuda/include -I/home/pkilian/NVIDIA_CUDA_SDK/common/inc -o broken.cu.o -c broken.cu

g++ -g -O0 -o broken broken.cu.o -L/home/pkilian/cuda/lib -L/home/pkilian/NVIDIA_CUDA_SDK/lib -L/home/pkilian/NVIDIA_CUDA_SDK/common/lib/linux -lcudart -L/home/pkilian/cuda/lib -lcutil

./broken

Running with 4 byte floating point numbers

before 1.000000

after 0.000000

This is the behaviour I expected. But if I change the typedef from “float” to “double”, the following happens:

nvcc --compiler-options -g -I/home/pkilian/cuda/include -I/home/pkilian/NVIDIA_CUDA_SDK/common/inc -o broken.cu.o -c broken.cu

g++ -g -O0 -o broken broken.cu.o -L/home/pkilian/cuda/lib -L/home/pkilian/NVIDIA_CUDA_SDK/lib -L/home/pkilian/NVIDIA_CUDA_SDK/common/lib/linux -lcudart -L/home/pkilian/cuda/lib -lcutil

./broken

Running with 8 byte floating point numbers

before 1.000000

after 1.000000

A[0] (element [0][0] of the matrix) is no longer set to 0!

This is with CUDA Toolkit 2.1, CUDA SDK 2.1, g++ (GCC) 4.1.3 20080704 (prerelease) (Debian 4.1.2-25) on a quad-core 64-bit machine with an Intel Core2 Quad CPU @ 2.40GHz and 4 GB of RAM. The graphics card is a GeForce GTX 295 from Gainward.

Did I miss a bug in my code? Is this a bug in CUDA? Or what’s going on here? Any pointers?

JustSpiffy · March 31, 2009, 2:30pm

You could throw some CUDA error checks in there to see whether the kernel is actually being launched.
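A minimal sketch of what such checks could look like with the CUDA runtime API. The helper `cuda_ok` and the kernel `zero_all` are hypothetical names used only for illustration, and `cudaDeviceSynchronize` is the name used by current toolkits (the 2.1-era equivalent was `cudaThreadSynchronize`). Whether an error is actually reported depends on what went wrong: a failed launch shows up here, but code that the compiler silently built differently may still run "successfully".

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: print a CUDA runtime error with some context
// and tell the caller whether the call succeeded.
static bool cuda_ok(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Stand-in for the kill() kernel from the first post.
__global__ void zero_all(double* X, int n) {
    for (int i = 0; i < n; i++) X[i] = 0.0;
}

int main() {
    const int n = 9;
    double* devX = NULL;
    if (!cuda_ok(cudaMalloc((void**)&devX, n * sizeof(double)), "cudaMalloc"))
        return 1;

    zero_all<<<1, 1>>>(devX, n);
    // Launch errors (bad configuration, no kernel image for this GPU)
    // are reported by cudaGetLastError() right after the launch.
    bool ok = cuda_ok(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel runs only surface after synchronizing.
    ok = cuda_ok(cudaDeviceSynchronize(), "kernel execution") && ok;

    cudaFree(devX);
    return ok ? 0 : 1;
}
```

Checking after the launch and again after synchronizing separates "the kernel never started" from "the kernel started and then failed".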

mfatica · March 31, 2009, 2:51pm

You need to add the flag “-arch sm_13” when you compile with nvcc to enable double precision.
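One way to turn a forgotten -arch flag into a build error rather than silent misbehaviour is a preprocessor guard in the .cu file. This is a sketch assuming a toolkit that defines `__CUDA_ARCH__` during device compilation (current toolkits do; the 2.1 toolkit from this thread may not). `__CUDA_ARCH__` encodes compute capability X.Y as X*100 + Y*10, so sm_13 corresponds to 130. The kernel `zero_all` and the helper `arch_value` are hypothetical names for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Assumption: __CUDA_ARCH__ is defined while compiling device code.
// Refuse to build device code for targets without native double support.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 130
#error "double needs compute capability 1.3 or higher: compile with -arch sm_13 (or newer)"
#endif

// Host-side mirror of the __CUDA_ARCH__ encoding (illustration only).
static int arch_value(int major, int minor) {
    return major * 100 + minor * 10;
}

// Stand-in for the kill() kernel from the first post, using double.
__global__ void zero_all(double* X, int n) {
    for (int i = 0; i < n; i++) X[i] = 0.0;
}

int main() {
    double A[9];
    for (int i = 0; i < 9; i++) A[i] = i + 1.0;

    double* devA = NULL;
    cudaMalloc((void**)&devA, sizeof(A));
    cudaMemcpy(devA, A, sizeof(A), cudaMemcpyHostToDevice);
    zero_all<<<1, 1>>>(devA, 9);
    cudaMemcpy(A, devA, sizeof(A), cudaMemcpyDeviceToHost);
    cudaFree(devA);

    printf("after %f (device code built for __CUDA_ARCH__ >= %d)\n",
           A[0], arch_value(1, 3));
    return 0;
}
```

Because the guard only fires during the device pass, the host compilation is unaffected; building with a too-old target stops with the message instead of producing a binary that quietly does the wrong thing.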

petschge · April 1, 2009, 7:05am

Hello,

Yes, this indeed fixes this perplexing bug.

But it leaves me with new questions: Why doesn’t the compiler tell me that “double” is only supported by hardware with compute capability 1.3? Or at least generate code that fails at runtime with a clear message about what I did wrong? Even an “I don’t know what you are talking about, let's just call exit” handling by the compiler would be better than the current state of affairs.

Before I continue to rant and start using more colorful language: thanks again for the quick response.
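For the kind of runtime failure asked for above, a program can check both the GPU and the code that was actually loaded for the kernel. Note that a device check alone would not have caught this case, since the GTX 295 is a compute capability 1.3 part; the mismatch was in how the kernel was compiled. This sketch assumes a toolkit recent enough for `cudaFuncAttributes` to report `binaryVersion` (major*10 + minor, e.g. 13 for sm_13); the helper names and the kernel `zero_all` are hypothetical.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Stand-in for the kill() kernel from the first post.
__global__ void zero_all(double* X, int n) {
    for (int i = 0; i < n; i++) X[i] = 0.0;
}

// Native double precision arrived with compute capability 1.3.
static bool device_supports_double(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 3);
}

// binaryVersion is major*10 + minor of the code loaded for the kernel.
static bool binary_supports_double(int binaryVersion) {
    return binaryVersion >= 13;
}

static void fail(const char* what, cudaError_t err) {
    fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
    exit(EXIT_FAILURE);
}

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) fail("cudaGetDeviceProperties", err);
    if (!device_supports_double(prop.major, prop.minor)) {
        fprintf(stderr, "%s is compute capability %d.%d; double needs 1.3 or higher\n",
                prop.name, prop.major, prop.minor);
        return EXIT_FAILURE;
    }

    // A capable GPU is not enough: the kernel itself must have been
    // compiled for sm_13 or newer.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, zero_all);
    if (err != cudaSuccess) fail("cudaFuncGetAttributes", err);
    if (!binary_supports_double(attr.binaryVersion)) {
        fprintf(stderr, "zero_all was built for sm_%d; rebuild with -arch sm_13 or newer\n",
                attr.binaryVersion);
        return EXIT_FAILURE;
    }

    printf("OK: %s (%d.%d), kernel binary sm_%d\n",
           prop.name, prop.major, prop.minor, attr.binaryVersion);
    return EXIT_SUCCESS;
}
```

Running this check once at startup, before any double-precision kernel is launched, gives the clear "you built it wrong" message instead of silently unchanged results.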

E.D_Riedijk · April 1, 2009, 11:25am

I think it is because of backwards compatibility. Early CUDA automatically demoted doubles to floats, so without the switch you get the old behaviour and people do not need to rewrite their code.
As far as I remember, you also needed to add the flag -arch sm_11 to use compute capability 1.1 features when that hardware came out. This way the ‘burden’ is on the people using the new features.

Ojiisan · April 1, 2009, 2:03pm

That must’ve been a really early version of CUDA. I used 1.1 for my research back at the beginning of 2008, when there were no 1.3 devices out yet, so no double support. Even then, running kernels with double did not work, and there was no warning from the compiler.

E.D_Riedijk · April 1, 2009, 6:04pm

Hmm, then my memory serves me badly. Still, I have always been very careful to keep everything in float, because it was stated numerous times that double would only be supported in future hardware and that code using it on current hardware could do strange things.
