using double in kernel Rob Farber's Dr.Dobb's code using double

mmaurier · December 25, 2008, 8:48pm

I’m trying to use the code provided by Rob Farber in the Dr. Dobb’s article “CUDA, Supercomputing for the Masses: Part 3” (May 13, 2008). The code works well with types int and float, but with type double for host/device memory it does not return the same numbers. I’m using Windows XP 64-bit, Microsoft Visual Studio 2005, and CUDA 2.1; my video card is a GeForce 9800 GX2.

[codebox]/* test Dr.Dobb’s reverseArray_multiblock.cu code
   use FLOAT or DOUBLE array,
   submit to nvidia cuda forum
*/

// includes, system
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <assert.h>

// includes, project
#include <cutil_inline.h>

// Simple utility function to check for CUDA runtime errors
void checkCUDAError(const char* msg);

// Part3: implement the kernel
__global__ void reverseArrayBlock(double *d_out, double *d_in)
{
    int inOffset  = blockDim.x * blockIdx.x;
    int outOffset = blockDim.x * (gridDim.x - 1 - blockIdx.x);
    int in  = inOffset + threadIdx.x;
    int out = outOffset + (blockDim.x - 1 - threadIdx.x);
    d_out[out] = d_in[in];
}

int main( int argc, char** argv)
{
    // pointer for host memory and size
    double *h_a;

    // pointer for device memory
    double *d_b, *d_a;

    // define number of elements, grid, and block size
    int dimA = 256 * 1024; // 256K elements (2MB total as double)
    int numThreadsPerBlock = 256;

    // Part 1: compute number of blocks needed based on
    // array size and desired block size
    int numBlocks = dimA / numThreadsPerBlock;

    // allocate host and device memory
    size_t memSize = numBlocks * numThreadsPerBlock * sizeof(double);
    h_a = (double *) malloc(memSize);
    cudaMalloc( (void **) &d_a, memSize );
    cudaMalloc( (void **) &d_b, memSize );

    // Initialize input array on host
    for (int i = 0; i < dimA; i++)
    {
        h_a[i] = rand() / (double)RAND_MAX;
        if (i == 0 || i == dimA-1)
            printf("h_a[%d] %4.4f \n",i,h_a[i]);
    }

    // Copy host array to device array
    cudaMemcpy( d_a, h_a, memSize, cudaMemcpyHostToDevice );

    // launch kernel
    dim3 dimGrid(numBlocks);
    dim3 dimBlock(numThreadsPerBlock);
    reverseArrayBlock<<< dimGrid, dimBlock >>>( d_b, d_a );

    // block until the device has completed
    cudaThreadSynchronize();

    // device to host copy
    cudaMemcpy( h_a, d_b, memSize, cudaMemcpyDeviceToHost );

    // Check for any CUDA errors
    checkCUDAError("memcpy");

    // verify the data returned to the host is correct
    for (int i = 0; i < dimA; i++)
    {
        //assert(h_a[i] == dimA - 1 - i );
        if (i == 0 || i == dimA-1)
            printf("h_a[%d] %4.4f \n",i,h_a[i]);
    }

    // free device memory
    cudaFree(d_a);
    cudaFree(d_b);

    // free host memory
    free(h_a);

    // If the program makes it this far, then the results are
    // correct and there are no run-time errors. Good work!
    printf("Correct!\n");

    cudaThreadExit();
    cutilExit(argc, argv);
    return 0;
}

void checkCUDAError(const char *msg)
{
    cudaError_t err = cudaGetLastError();
    if( cudaSuccess != err)
    {
        fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err) );
        exit(EXIT_FAILURE);
    }
}
[/codebox]

I need to use double precision variables for a research project at school (physics department).

Thank you.

tmurray · December 25, 2008, 8:52pm

Only Compute 1.3-capable cards support double precision (so the GTX 260, 280 and Tesla C1060). The 9800 GX2 supports Compute 1.1, so it won’t work with doubles.
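For anyone who wants to confirm this on their own machine, here is a minimal sketch that queries each device's compute capability through the CUDA runtime and reports whether it meets the 1.3 threshold mentioned above. The helper name `supportsDouble` is made up for illustration; `cudaGetDeviceCount` and `cudaGetDeviceProperties` are the standard runtime calls.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: double precision requires compute capability 1.3 or higher.
static bool supportsDouble(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Report the compute capability of every visible device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        printf("Device %d: %s, compute %d.%d, double precision: %s\n",
               dev, prop.name, prop.major, prop.minor,
               supportsDouble(prop.major, prop.minor) ? "yes" : "no");
    }
    return 0;
}
```

Note that even on a 1.3-capable card, the CUDA 2.x toolchain compiles for compute 1.0 by default and demotes `double` to `float` in device code (with a warning), so the build also needs something like `nvcc -arch sm_13`.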

mmaurier · December 26, 2008, 1:51am

Thank you for your prompt response.

buj · January 3, 2009, 3:27pm

Dear sir,

This is buj. I am very interested in learning CUDA. I know how to allocate memory and copy data from host to device for a one-dimensional array, but I am confused about two-dimensional arrays: how to declare the variables for host and device, how to allocate memory for host and device, and how to copy the data from host to device. Could you please give me some ideas for A[1024][1024], and tell me how to access all of A[1024][1024] with threads at once?

Please help me kindly.
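One common approach to the 2D question (a sketch, not something posted in this thread) is to store A[1024][1024] as a single flat, row-major buffer, allocate and copy it exactly as in the 1D case, and launch a 2D grid of 2D blocks so that each thread handles one element. The kernel `addOne`, the helper `flatIndex`, and the 16x16 block size are all illustrative choices; `float` is used so the example also runs on cards without double support.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 1024

// Hypothetical helper: map (row, col) to an offset in a row-major flat array.
__host__ __device__ inline int flatIndex(int row, int col, int width)
{
    return row * width + col;
}

// Illustrative kernel: each thread increments one element of the matrix.
__global__ void addOne(float *d_a, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < height && col < width)
        d_a[flatIndex(row, col, width)] += 1.0f;
}

int main(void)
{
    size_t memSize = (size_t)N * N * sizeof(float);

    // Host: one contiguous block instead of an array of row pointers.
    float *h_a = (float *)malloc(memSize);
    if (h_a == NULL)
        return 1;
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c)
            h_a[flatIndex(r, c, N)] = (float)(r + c);

    // Device: same size, same layout, so a plain cudaMemcpy is enough.
    float *d_a = NULL;
    if (cudaMalloc((void **)&d_a, memSize) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        free(h_a);
        return 1;
    }
    cudaMemcpy(d_a, h_a, memSize, cudaMemcpyHostToDevice);

    // 16x16 threads per block, enough blocks to cover the whole matrix.
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    addOne<<<dimGrid, dimBlock>>>(d_a, N, N);

    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_a, d_a, memSize, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    printf("h_a[1][2] = %.1f\n", h_a[flatIndex(1, 2, N)]);

    cudaFree(d_a);
    free(h_a);
    return err == cudaSuccess ? 0 : 1;
}
```

The runtime also provides `cudaMallocPitch` and `cudaMemcpy2D`, which pad each row for alignment; the flat layout above is simply the smallest step from the 1D case.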
