Newb question: Different results for dbg, emu and release

OK, I don’t understand the difference in results from the code at the bottom of this post. The results from,

nvcc -I /opt/NVIDIA_CUDA_SDK/common/inc/ Main.cu -deviceemu

are,

host: (2,2)
host: (2,2)
host: (2,2)
host: (2,2)
host: (2,2)
---------------
host: (4,4)
host: (4,4)
host: (4,4)
host: (4,4)
host: (4,4)

as I would expect. The results from

nvcc -I /opt/NVIDIA_CUDA_SDK/common/inc/ Main.cu -g -G

are,

host: (2,2)
host: (2,2)
host: (2,2)
host: (2,2)
host: (2,2)
---------------
host: (2,2)
host: (2,2)
host: (2,2)
host: (2,2)
host: (2,2)

Not as I’d expect. The results from just,

nvcc -I /opt/NVIDIA_CUDA_SDK/common/inc/ Main.cu

are,

host: (2,2)
host: (2,2)
host: (2,2)
host: (2,2)
host: (2,2)
---------------
host: (512,2)
host: (512,2)
host: (512,2)
host: (512,2)
host: (512,2)

So what have I done wrong?

Thanks very much.

#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cutil.h>

__global__ void testDouble2(double2* vaid, int N) {
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int bx = blockIdx.x;
    int by = blockIdx.y;
    //printf("(%i,%i)-(%i,%i)\n", bx, by, tx, ty);
    vaid[tx].x = vaid[tx].x + vaid[tx].x;
    vaid[tx].y = vaid[tx].y + vaid[tx].y;
}

int main(void) {
    double2 host[] = {
        make_double2(2, 2),
        make_double2(2, 2),
        make_double2(2, 2),
        make_double2(2, 2),
        make_double2(2, 2),
    };
    for (int i = 0; i < 5; i++) {
        printf("host: (%g,%g)\n", host[i].x, host[i].y);
    }
    printf("---------------\n");

    double2* device;
    cudaMalloc((void**) &device, sizeof(double2) * 5);
    cudaMemcpy(device, host, sizeof(double2) * 5, cudaMemcpyHostToDevice);

    dim3 threads(5, 1);
    dim3 grid(1, 1);
    testDouble2<<< grid, threads >>>(device, 5);

    cudaMemcpy(host, device, sizeof(double2) * 5, cudaMemcpyDeviceToHost);
    for (int i = 0; i < 5; i++) {
        printf("host: (%g,%g)\n", host[i].x, host[i].y);
    }
}
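
Incidentally, none of the CUDA calls above are checked for errors; in the -g -G build the unchanged output suggests the kernel never actually ran. A minimal sketch of how to surface that, added around the launch in main() (the CHECK macro is my own, not part of the SDK):

#include <stdlib.h>

// Hypothetical helper: print and abort if a CUDA runtime call fails.
#define CHECK(call) \
    do { \
        cudaError_t err = (call); \
        if (err != cudaSuccess) { \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", \
                    __FILE__, __LINE__, cudaGetErrorString(err)); \
            exit(1); \
        } \
    } while (0)

/* ... */
testDouble2<<< grid, threads >>>(device, 5);
CHECK(cudaGetLastError());       // did the launch itself fail?
CHECK(cudaThreadSynchronize());  // did the kernel run to completion?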

Compile with -arch sm_13.

Thanks for taking the time to reply. I tried that and got the same result. I realise now that my card (9500 GT) supports only compute capability 1.1, and double precision wasn’t implemented until 1.3.

I guess I’ll have to stick to single precision for now.

Is there no way to make the GPU treat doubles as floats without needing to change the code? I also tried sm_11, sm_10, compute_10 and compute_11.

Thanks.

No, making the GPU treat doubles as floats is essentially meaningless. When you compile for anything below sm_13, nvcc already demotes device-side doubles to floats (and warns about it), but your host code still reads and writes 8-byte doubles, so the two sides disagree about the memory layout. That mismatch is where garbage like (512,2) comes from.
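
To make that concrete: assuming the demoted kernel ends up reading the two 4-byte halves of each stored 2.0 as the floats 0.0f and 2.0f and writing back their doubled values, the host then reinterprets those 8 bytes as one double. A host-only sketch of that reinterpretation (my own illustration):

#include <stdio.h>
#include <string.h>

int main(void) {
    // What the demoted kernel would write into the first 8 bytes of an
    // element: 2*0.0f and 2*2.0f.
    float written[2] = { 0.0f, 4.0f };
    double seen;
    memcpy(&seen, written, sizeof seen);  // host reads them back as one double
    printf("%g\n", seen);                 // prints 512 on little-endian x86
    return 0;
}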

I guess in some circumstances it might be as bad as treating them as ints.

Thanks for your help clearing this up.
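
For anyone who finds this thread later: the closest thing to switching precision without rewriting the kernels is choosing the type at compile time with a typedef. A minimal sketch (the real2/make_real2 names and the USE_DOUBLE flag are my own, not anything nvcc provides):

#ifdef USE_DOUBLE              // build with: nvcc -arch sm_13 -DUSE_DOUBLE ...
typedef double2 real2;
#define make_real2 make_double2
#else                          // default: single precision, runs on compute 1.1
typedef float2 real2;
#define make_real2 make_float2
#endif

__global__ void testReal2(real2* vaid, int N) {
    int tx = threadIdx.x;
    if (tx < N) {              // guard against extra threads
        vaid[tx].x += vaid[tx].x;
        vaid[tx].y += vaid[tx].y;
    }
}

With this, host and device always agree on the element size, and switching precision is a compiler flag rather than a code edit.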