Using a sinusoid input

Hello all,

I’m trying to use a sinusoid as an input to a CUDA program I wrote but am having problems with the data types.

[codebox]

cuFloatComplex *alldata;
cudaMallocHost((void **)&alldata, inputLength * sizeof(cuFloatComplex));

for(int i = 0; i < inputLength; i++) { // CREATE ALL DATA, pad_length of zeros
    n = (n+rand()/RAND_MAX-1/2)/2;
    alldata[i].x = cos(2*3.1415926*(100/1000)*i);
    alldata[i].y = sin(2*3.1415926*(100/1000)*i);
}

cuFloatComplex* totalOutput_d;
CUDA_SAFE_CALL(cudaMalloc((void**)&totalOutput_d, inputLength * sizeof(cuFloatComplex)));
cudaMemcpy(totalOutput_d, alldata, input_data_size*sizeof(cuFloatComplex), cudaMemcpyHostToDevice);

cufftComplex* output_h;
CUDA_SAFE_CALL(cudaMallocHost((void **)&output_h, input_data_size * sizeof(cufftComplex)));
cudaMemcpy(output_h, totalOutput_d, input_data_size * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

for(int j=0; j<input_data_size; j++){
    printf("%d: %f\n",j,output_h[j].x);
}

[/codebox]

Unfortunately, my output is a series of 1’s, input_data_size elements long. The values should obviously be fractional, so some kind of truncation is taking place.

I’m assuming it’s due to the data type cuFloatComplex… any suggestions?

Thanks in advance! :)

For anyone wondering about the answer, I figured it out. 100/1000 is integer division and evaluates to 0, so every sample was cos(0) = 1.

alldata[i].x = cos(2*3.1415926*(100/1000)*i)

needs to be

alldata[i].x = cos(2*3.1415926*((float)100/1000)*(float)i)

A suggestion - you might be better off if you write

cosf(2.0f*3.1415926f*(0.1f)*i);

instead.

Note the use of cosf (cos is the double-precision version), the “f” suffix on literals (to mark them as single precision), and 0.1f in place of 100/1000. IIRC cuFloatComplex is just a struct with two floats, so there is no need for double-precision arithmetic.

The compiler might be smart enough to evaluate expressions of hard-coded literals at compile time, but I wouldn’t rely on it, especially with an expensive division. Frankly, I’m not sure whether compilers are allowed to replace floating-point computations on literals with precomputed values, because the exact result may depend on the FPU used. Could anyone comment on this?

By the way, is this a toy program or something that’s supposed to be used in a real app? Because, you know, you can generate those cosines on the device, skipping host array initialization, hostToDevice copy and some global memory reads.