Problems with doubles on GTX280 Emu works, float works, double on device fails.

tohahn · November 11, 2008, 3:17pm

Hi all,

I am implementing some basic functions for sparse matrices. In order to have one memory transfer for a CSR formatted matrix I use a struct of pointers.

After getting strange results I narrowed the problem down to copying double data.

The following test case works for me in emulation mode (single and double precision) in single precision on the device but not as it is presented here on the device.

#include <stdio.h> 

#define dim 7

typedef struct {   //[int,int,int,dim*double,dim*int]

	void* memory;	

	int* Int1;	  

	int* Int2;	

	int* Int3;	

	double* dblVector;

	int* intVector;	 

} myStruct;

__host__ void setPointers(myStruct* A)

{

	(*A).Int1 = (int*)(*A).memory;

	(*A).Int2   = (*A).Int1+1;

	(*A).Int3   = (*A).Int2+1;

	(*A).dblVector	= (double*) ((*A).Int3+1);

	(*A).intVector	= (int*) ((*A).dblVector+dim);

}

__global__ void spmv_gpu(int* intVector, double* dblVector, double* dblResult, int* intResult)

{

	dblResult[threadIdx.x] = dblVector[threadIdx.x];

	intResult[threadIdx.x] = intVector[threadIdx.x];

}

int main(int argc,char* argv[])

{

	cudaSetDevice(0);

	int i;

//Create a struct on the device

	myStruct A_d;

	cudaMalloc((void**)&(A_d.memory), sizeof(double)*dim+sizeof(int)*(3+dim));

	setPointers(&A_d);

//Create a struct on the host

	myStruct A_h;

	cudaMallocHost((void**)&(A_h.memory), sizeof(double)*dim+sizeof(int)*(3+dim));

	setPointers(&A_h);

	

//...fill it with dummy data

	*A_h.Int1=11;	*A_h.Int2=21;	*A_h.Int3=31;

	for (i=0; i<dim;i++) A_h.dblVector[i]=sqrt(i);

	for ( i=0; i<dim;i++) A_h.intVector[i]=(i+4)*10+1;

	

//print this data

	for(i = 0;i < dim; i++){

	 	   printf("%d: %e\n",A_h.intVector[i], A_h.dblVector[i]);

	  }

	printf("\n");

//copy data to device	

	cudaMemcpy(A_d.memory, A_h.memory, sizeof(double)*dim+sizeof(int)*(3+dim), cudaMemcpyHostToDevice);

//copy contents of dummy vectors to other areas of memory 			

	double* dblResult_d;

	int* intResult_d;

	cudaMalloc((void**)&dblResult_d, sizeof(double)*dim);

	cudaMalloc((void**)&intResult_d, sizeof(int)*dim);

	spmv_gpu<<<1,dim>>>(A_d.intVector, A_d.dblVector, dblResult_d, intResult_d);

//read the results back

	double dblResult[dim]={0.0};

	int intResult[dim]={0};	

	cudaMemcpy(dblResult, dblResult_d, sizeof(double)*dim, cudaMemcpyDeviceToHost);

	cudaMemcpy(intResult, intResult_d, sizeof(int)*dim, cudaMemcpyDeviceToHost);

	

//print the results

 	for(i = 0;i < dim; i++){

			 printf("%d: %e\n", intResult[i], dblResult[i]);

	  }

	cudaFreeHost(A_h.memory);

	cudaFree(A_d.memory);

	cudaFree(dblResult_d);

	cudaFree(intResult_d);

}

The result on both my ASUS ENGTX280 is

41: 0.000000e+00

51: 1.000000e+00

61: 1.414214e+00

71: 1.732051e+00

81: 2.000000e+00

91: 2.236068e+00

101: 2.449490e+00

41: 1.531604e-322

51: 0.000000e+00

61: 5.308593e+185

71: -4.434596e+194

81: 5.303601e-315

91: -9.458696e-176

101: 3.732320e-212

So the integer data is copied correctly, double is not. Am I doing something wrong?

I’m using driver version 177.73 on OpenSuse 10.2 (kernel 2.6.26.5) and am compiling with -arch sm_13 and -m64.

Thanks in advance for any help.

–Tobias

Edits:

PTX looks fine, it loads f64 from the correct address and stores it also to f64.
It works if I leave out one of the three integers, let’s say Int2, set the pointer to Int3 new and transfer 4 bytes less to and from the device. Strange?!

tmurray · November 11, 2008, 5:09pm

did you compile with the -arch sm_13 option?

tohahn · November 11, 2008, 5:21pm

Of course, I did. :)

tohahn · November 11, 2008, 5:33pm

I tested now with 2 and 4 integers preceding the double vector and it worked.
In the version with 3 ints, the first double is at 0xc and produces wrong results, the 0x0 and 0x***8 above work fine.

Could it be, that double values are only accessible if their addresses are divisible by 8?

Edit: I checked 1 and 5 preceding ints meanwhile and got errors again. Only even numbers of preceding ints seem to work.

tmurray · November 11, 2008, 7:50pm

doubles have to be 8-byte aligned, just like in host code.

tohahn · November 11, 2008, 8:09pm

For Linux it’s 4-bytes only, this is why the emulation worked all the time.

Thanks, the problem will now be easy to solve.

Topic		Replies	Views
This is driving me nuts! memory access problem.. CUDA Programming and Performance	5	2703	December 7, 2007
Problems passing doubles to/from kernel - they become 0! CUDA Programming and Performance	2	1658	November 20, 2008
Double precision numbers, emulation, and compute capability < 1.3 CUDA Programming and Performance	5	1773	August 11, 2009
double precision on the GTX 280 CUDA Programming and Performance	2	5226	August 13, 2008
Newb question: Different results for dbg, emu and release CUDA Programming and Performance	4	1404	April 9, 2009
Arrays of structs in device memory CUDA Programming and Performance	5	1632	October 17, 2010
Double precision PRoblem CUDA Programming and Performance	15	12178	January 13, 2009
Doble not working in device code! Is is not supported ? CUDA Programming and Performance	2	1172	February 19, 2009
array of structure problem CUDA Programming and Performance	7	6016	September 22, 2009
Code works with floats but not doubles CUDA Programming and Performance	4	5045	July 15, 2009

Problems with doubles on GTX280 Emu works, float works, double on device fails.

Related topics