Hi everyone,
I am new to CUDA computing and would like to ask the more experienced members a question about double precision in the CUDA 2.3 environment. I have a GTX295 card which, according to the CUDA programming guide, has compute capability 1.3, meaning that it supports double-precision numbers. I am using the nvcc compiler through Microsoft Visual Studio 2008 (the project is built using one of the SDK sample programs as a template).
I was experimenting with a simple program, which can be found here ("My first CUDA program!" on Parallel Panorama: http://llpanorama.wordpress.com/2008/05/21...t-cuda-program/). I have modified it slightly as follows (in short, I changed the way the array a_h is allocated and the array elements that are calculated):
#include <stdio.h>
#include <cuda.h>
// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < N) a[idx] = a[idx] * a[idx]; // Guard: more threads are launched than there are elements
}
// main routine that executes on the host
int main(void)
{
float *a_h = new float[60];
float *a_d; // Pointer to host & device arrays
const int N = 60; // Number of elements in arrays
size_t size = N * sizeof(float);
// a_h = (float *)malloc(size); // Allocate array on host
cudaMalloc((void **) &a_d, size); // Allocate array on device
// Initialize host array and copy it to CUDA device
for (int i=0; i<N; i++) a_h[i] = (float)i*2;
cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
// Do calculation on device:
int nblocksize=500;
int Nblocks=N/nblocksize+ (N%nblocksize == 0 ? 0 : 1);
square_array <<< Nblocks, nblocksize >>> (a_d, N);
// Retrieve result from device and store it in host array
cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost);
// Print results
for (int i=0; i<N; i++) printf("%d %f\n", i, a_h[i]);
// Cleanup
delete[] a_h; cudaFree(a_d); // delete[] matches the new[] allocation
}
When I run the executable, the program works as it should and gives the correct results. But when I replace all float declarations with double, I get this warning:
warning : Double is not supported. Demoting to float
The results are also wrong: it looks as if the array elements were never processed by the device, since the output is just the values initialized by the host. Why do I get the warning and the wrong results? Am I missing something here?
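For reference, here is a minimal sketch of what the double version described above might look like, with error checks added so that a failed launch or copy is reported instead of passing silently. The CHECK macro and the blocks_needed helper are hypothetical names that are not in the original program, and the build flag mentioned in the comments is an assumption worth verifying against the nvcc documentation rather than a confirmed fix.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Assumption to verify: the build may need to target compute capability 1.3
// explicitly, e.g. "nvcc -arch=sm_13 file.cu". If the compiler targets a
// lower capability, doubles in device code are demoted to float.

// Hypothetical helper (not in the original program): abort with a message
// if a CUDA runtime call returns an error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at line %d: %s\n", __LINE__,  \
                    cudaGetErrorString(err));                         \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: number of blocks needed to cover n elements.
static int blocks_needed(int n, int block_size)
{
    return (n + block_size - 1) / block_size;
}

// Same kernel as above with double instead of float, plus a bounds guard.
__global__ void square_array(double *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

int main(void)
{
    const int N = 60;                      // Number of elements in arrays
    size_t size = N * sizeof(double);
    double *a_h = new double[N];           // Host array
    double *a_d = NULL;                    // Device array

    CHECK(cudaMalloc((void **)&a_d, size));
    for (int i = 0; i < N; i++) a_h[i] = (double)i * 2;
    CHECK(cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice));

    int nblocksize = 500;
    square_array<<<blocks_needed(N, nblocksize), nblocksize>>>(a_d, N);
    CHECK(cudaGetLastError());             // Reports launch failures

    // The blocking copy also surfaces errors from the kernel execution.
    CHECK(cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost));

    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);

    delete[] a_h;
    CHECK(cudaFree(a_d));
    return 0;
}
```

If the kernel never actually runs, the cudaGetLastError check or the copy back to the host should print an error message, which would help tell a launch failure apart from a precision problem.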
Thanks in advance.