Double precision in CUDA 2.3

Hi to everyone,

I am new to CUDA computing and I would like to ask the more experienced members a question about double precision in the CUDA 2.3 environment. First of all, I have a GTX 295 card which, according to the CUDA programming guide, supports compute capability 1.3, meaning that it supports double-precision numbers. I am using the nvcc compiler through Microsoft Visual Studio 2008 (the code is actually built using one of the SDK sample programs as a template).

So I was experimenting with a simple code, which can be found here ( [url="My first CUDA program! | /// Parallel Panorama ///"]http://llpanorama.wordpress.com/2008/05/21...t-cuda-program/[/url] ). I have modified it a bit as follows (in short, I changed the way the array a_h is allocated and which array elements are calculated):

#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main(void)
{

float *a_h = new float[60]; // Pointer to host array
float *a_d; // Pointer to device array
const int N = 60; // Number of elements in arrays
size_t size = N * sizeof(float);
// a_h = (float *)malloc(size); // Allocate array on host
cudaMalloc((void **) &a_d, size); // Allocate array on device
// Initialize host array and copy it to CUDA device
for (int i=0; i<N; i++) a_h[i] = (float)i*2;
cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
// Do calculation on device:
int nblocksize=500;
int Nblocks=N/nblocksize+ (N%nblocksize == 0 ? 0 : 1);
square_array <<< Nblocks, nblocksize >>> (a_d, N);
// Retrieve result from device and store it in host array
cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost);
// Print results
for (int i=0; i<N; i++) printf("%d %f\n", i, a_h[i]);
// Cleanup
delete [] a_h; cudaFree(a_d);
}

When I run the executable, the program works as it should and gives the correct results. But when I replace all float declarations with double, I get a warning:

warning : Double is not supported. Demoting to float

Also, the results are wrong (to be precise, it is as if the array elements were never processed by the device, since what is printed is just the elements initialized by the host). Why do I get the warning and the incorrect results? Am I missing anything here?

Thanks in advance.

compile with -arch sm_13
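To make this concrete, here is a minimal sketch (assumed file name, illustrative only) of a double-precision kernel; without the sm_13 target, nvcc 2.3 silently demotes the doubles and emits the warning you saw:

```cuda
// Build with the 1.3 target so doubles stay doubles:
//   nvcc -arch sm_13 square_double.cu -o square_double
// Without -arch sm_13, nvcc 2.3 prints:
//   warning : Double is not supported. Demoting to float
__global__ void square_array(double *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)                    // guard against extra threads
        a[idx] = a[idx] * a[idx];
}
```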

You are absolutely right, I forgot to enable the 1.3 compute architecture in the compiler options. Thank you very much for your quick response!


I get the same error.

I specifically bought a device with support for doubles, a GTX 260.

In my common.mk, the versions are included:

# Add new SM Versions here as devices with new Compute Capability are released

SM_VERSIONS := sm_10 sm_11 sm_12 sm_13

Is this perhaps a different reference to the versions?

There is a clear error in your __global__ function.

__global__ void square_array(float *a, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
a[idx] = a[idx] * a[idx];
}

The second parameter, N, is never used, so if the number of threads launched is not equal to N, your result is of course wrong.
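The usual fix for that mismatch is a bounds check, so that threads whose index falls past the end of the array simply do nothing. A sketch:

```cuda
__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)                    // threads with idx >= N fall through
        a[idx] = a[idx] * a[idx];
}
```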

#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;

float temp = 0.0f;
for (int i = idx; i < N; i += blockDim.x * gridDim.x)
{
	temp = a[i];
	a[i] = temp * temp;
}

}

// main routine that executes on the host
int main(void)
{
const int N = 21; // Number of elements in arrays
float *a_h=new float[N];
float *a_d; // Pointer to host & device arrays
size_t size = N * sizeof(float);

// a_h = (float *)malloc(size); // Allocate array on host
cudaMalloc((void **) &a_d, size); // Allocate array on device
// Initialize host array and copy it to CUDA device
for (int i=0; i<N; i++)
    a_h[i] = (float)i*2;
cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
// Do calculation on device:
int nblocksize=512; //I edit here. or 512
int Nblocks=(N - 1)/nblocksize + 1; //I also edit here.
square_array <<< Nblocks, nblocksize >>> (a_d, N);
// Retrieve result from device and store it in host array
cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost);
// Print results
for (int i=0; i<N; i++)
    printf("%d %f\n", i, a_h[i]);
// Cleanup
delete [] a_h;
cudaFree(a_d);

}